MapofLife / vernacular-names

A framework for managing vernacular names in a database, along with scripts to make sense of the data

Standardize case #2

Open gaurav opened 10 years ago

gaurav commented 10 years ago

Every common name should have its case standardized. Some names may need to be fixed manually.

gaurav commented 10 years ago

Additional complexities: names like "Coqueiro da Bahia", where we don't want to touch capitalization. But there's no real way to differentiate it from "Common Spadefoot". Maybe if we just keep adding sources the "best" name will bubble to the top? Or maybe we can select names with a minimum of uppercase letters?
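A minimal sketch of the "minimum of uppercase letters" idea, assuming it means: among spellings that differ only in case, keep the one with the fewest capitals. The function names (`count_upper`, `pick_least_capitalized`) are hypothetical and not part of this repository.

```python
from collections import defaultdict


def count_upper(name):
    """Count the uppercase characters in a name."""
    return sum(1 for ch in name if ch.isupper())


def pick_least_capitalized(names):
    """Group names that differ only in case; keep the variant with the fewest capitals.

    Ties are broken alphabetically so the result is deterministic.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return {
        key: min(variants, key=lambda v: (count_upper(v), v))
        for key, variants in groups.items()
    }


variants = ["Common Spadefoot", "Common spadefoot", "COMMON SPADEFOOT"]
print(pick_least_capitalized(variants))
```

Note that this only helps when several sources supply the same name with different casing, and an all-lowercase variant would always win if one exists.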

gaurav commented 10 years ago

Another possible solution: check for title case (every word starts with a capital letter) and lowercase every word after the first in those names. This still produces "Northern european toad" but avoids messing up "Coqueiro da Bahia".
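A rough sketch of that check, assuming "title case" means every whitespace-separated word starts with an uppercase letter. The helper names (`is_title_case`, `to_sentence_case`) are hypothetical.

```python
def is_title_case(name):
    """True if the name is non-empty and every word starts with an uppercase letter."""
    words = name.split()
    return bool(words) and all(w[0].isupper() for w in words)


def to_sentence_case(name):
    """Lowercase every word after the first, but only for title-cased names."""
    if not is_title_case(name):
        return name  # mixed-case names such as "Coqueiro da Bahia" are left alone
    first, *rest = name.split()
    return " ".join([first] + [w.lower() for w in rest])


for example in ["Common Spadefoot", "Coqueiro da Bahia", "Northern European Toad"]:
    print(example, "->", to_sentence_case(example))
```

The third example shows the known weakness: the proper adjective "European" loses its capital.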

gaurav commented 10 years ago

Do it by dataset? (Some datasets may already case their names in a consistent way.)

gaurav commented 10 years ago

The "gadm*" tables on CartoDB have English names for a large number of administrative divisions, which we could use to identify country names.

gaurav commented 9 years ago

The current plan is to title-case everything, then fix anything that shouldn't be title-cased manually (#47).
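A minimal sketch of that plan, assuming the manual fixes are kept as a lookup table keyed on the case-folded name. `MANUAL_OVERRIDES`, `title_case`, and `standardize` are hypothetical names, not the project's actual code. Python's built-in `str.title()` is avoided here because it capitalizes the letter after an apostrophe ("Darwin'S Frog").

```python
# Hypothetical table of manually curated names that must not be title-cased.
MANUAL_OVERRIDES = {
    "coqueiro da bahia": "Coqueiro da Bahia",
}


def title_case(name):
    """Capitalize the first letter of each whitespace-separated word, lowercase the rest."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in name.split())


def standardize(name, overrides=MANUAL_OVERRIDES):
    """Return the manual override if one exists, otherwise the title-cased name."""
    return overrides.get(name.casefold(), title_case(name))


for example in ["common spadefoot", "COQUEIRO DA BAHIA", "Darwin's frog"]:
    print(example, "->", standardize(example))
```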

gaurav commented 9 years ago

We should be able to get some ongoing statistics on this once #29 is done.