Open gaurav opened 10 years ago
Additional complexity: names like "Coqueiro da Bahia", where we don't want to touch the capitalization. But there's no real way to differentiate it from "Common Spadefoot". Maybe if we just keep adding sources, the "best" name will bubble to the top? Or maybe we can select the names with the fewest uppercase letters?
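A rough sketch of the "fewest uppercase letters" idea, assuming we already have the case variants of one name collected from different sources (the function names here are made up for illustration):

```python
def count_upper(name: str) -> int:
    """Count the uppercase characters in a name."""
    return sum(1 for ch in name if ch.isupper())


def pick_least_capitalized(variants: list[str]) -> str:
    """Return the variant with the fewest uppercase letters.

    On a tie, min() keeps the first variant in the list.
    """
    return min(variants, key=count_upper)


print(pick_least_capitalized(["Common Spadefoot", "Common spadefoot", "COMMON SPADEFOOT"]))
print(pick_least_capitalized(["Coqueiro Da Bahia", "Coqueiro da Bahia"]))
```

One catch: if any source supplies an all-lowercase variant ("coqueiro da bahia"), that one wins, so this only helps when at least one source gets the proper nouns right and none drop them.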
Another possible solution: check for title case (every word starts with a capital letter) and lowercase those. This still causes "Northern european toad" but avoids messing up "Coqueiro da Bahia".
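Something like this could work as a first pass. It is only a sketch: the function names are hypothetical, and it treats a name as title case when it has more than one word and every word starts with an uppercase letter.

```python
def is_title_case(name: str) -> bool:
    """True if the name has several words and each starts with an uppercase letter."""
    words = name.split()
    return len(words) > 1 and all(w[0].isupper() for w in words)


def lowercase_if_title_case(name: str) -> str:
    """Lowercase every word after the first, but only for title-cased names."""
    if not is_title_case(name):
        return name  # e.g. "Coqueiro da Bahia" is left untouched
    first, *rest = name.split()
    # Note: runs of whitespace collapse to single spaces here.
    return " ".join([first] + [w.lower() for w in rest])


for name in ["Common Spadefoot", "Northern European Toad", "Coqueiro da Bahia"]:
    print(name, "->", lowercase_if_title_case(name))
```

"Coqueiro da Bahia" passes through unchanged because "da" is lowercase, while "Northern European Toad" still loses the capital on "European".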
Do it by dataset? (Some datasets may already case names in a particular way.)
The "gadm*" tables on CartoDB have English names for a bunch of administrative divisions, which we could use to identify country names.
The current plan is to title-case everything, then fix anything that shouldn't be title-cased manually (#47).
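A minimal sketch of that plan, assuming the manual fixes live in a simple lookup table (the `MANUAL_FIXES` dict and `standardize_case` name are placeholders, not anything that exists in the project). It uses `string.capwords` rather than `str.title()`, since `title()` would turn "blanford's" into "Blanford'S".

```python
import string

# Hypothetical table of manual corrections, keyed by the title-cased form.
MANUAL_FIXES = {
    "Coqueiro Da Bahia": "Coqueiro da Bahia",
}


def standardize_case(name: str) -> str:
    """Title-case a common name, then apply any manual correction for it."""
    # capwords capitalizes the first letter of each whitespace-separated word
    # and lowercases the rest of the word.
    title_cased = string.capwords(name)
    return MANUAL_FIXES.get(title_cased, title_cased)


for name in ["common spadefoot", "COQUEIRO DA BAHIA", "blanford's fox"]:
    print(name, "->", standardize_case(name))
```

Keying the fixes on the title-cased form means one entry covers every input casing of the same name.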
We should be able to get some ongoing statistics on this once #29 is done.
Every common name should have its case standardized. Some names may need to be fixed manually.