dataproofer / Dataproofer

A proofreader for your data

Test: Name consistency #17

Open newsroomdev opened 8 years ago

newsroomdev commented 8 years ago

Please read how to create a new test if you're interested in writing this test.

Does your data have Middle Eastern or East Asian names in it? Are you sure the surnames are always in the same place? Is it possible anyone in your dataset uses a mononym? These are the sorts of things that data creators habitually get wrong. If you're working with a list of ethnically diverse names—which is any list of names—then you should do at least a cursory review before assuming that joining the first_name and last_name columns will give you something that is appropriate to publish. -Quartz Bad Data Guide

If a column is designated as a name column, either automatically or by the user, provide a brief description of why missing cells in a name column are potentially bad.
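
As a rough illustration of what the automatable part of this check might look like, here is a minimal sketch in plain Python. It does not use Dataproofer's test API; the function name `check_name_column`, the reason labels, and the sample rows are all hypothetical. It assumes rows are dictionaries keyed by column name and only flags cells for manual review: missing values, single-token names (possible mononyms), and names containing a comma (possible "Surname, Given" ordering mixed in with other formats).

```python
def check_name_column(rows, column):
    """Return (row_index, reason) pairs for name cells worth a manual look.

    Hypothetical heuristic, not part of Dataproofer: it cannot tell which
    token is the surname, it only surfaces cells that break the common
    "Given Surname" shape.
    """
    flagged = []
    for index, row in enumerate(rows):
        value = row.get(column)
        text = "" if value is None else str(value).strip()
        if not text:
            # Empty string, whitespace only, or no value at all.
            flagged.append((index, "missing"))
        elif "," in text:
            # Often indicates "Surname, Given" ordering.
            flagged.append((index, "comma"))
        elif len(text.split()) == 1:
            # A single token may be a mononym or a truncated name.
            flagged.append((index, "mononym"))
    return flagged


# Made-up sample data for illustration only.
rows = [
    {"name": "Ada Lovelace"},
    {"name": "Lovelace, Ada"},
    {"name": "Sukarno"},
    {"name": ""},
    {"name": None},
]

print(check_name_column(rows, "name"))
# [(1, 'comma'), (2, 'mononym'), (3, 'missing'), (4, 'missing')]
```

Note the limit of this approach: a surname-first name such as "Mao Zedong" passes untouched, because nothing in the string itself says which token is the family name. That part of the check still needs a human reviewer.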

ejfox commented 8 years ago

I think this one might be hard to automate. @geraldarthur, are you okay to kill it for now?

newsroomdev commented 8 years ago

Let's save this one for a rainy day or maybe a hackathon. It's an interesting data smell that's definitely testable, but not one we have the time to really work on. Happy to take pull requests from anyone on this.