MapofLife / vernacular-names

A framework for managing vernacular names in a database, along with scripts to make sense of the data

Titlecase messes up Spanish names containing 'ñ' #98

Open gaurav opened 9 years ago

gaurav commented 9 years ago

For example, "Espada SureñA" -> http://nomdb.map-of-life.appspot.com/taxonomy/names/search?search=Xiphophorus+maculatus&dataset=#lang-es

gaurav commented 9 years ago

Titlecase is splitting words correctly (i.e. "Espada", "sureña"), but then gets confused when trying to capitalize the text, and I'm not sure why. I suspect it has something to do with the [a-zA-Z] character classes all over its regular expressions, which don't match 'ñ'. Python's built-in re module doesn't support Unicode-aware classes (such as \p{L}) that could replace them.
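To illustrate the suspicion above, here is a minimal sketch (assuming Python 3 and the standard re module; it does not reproduce titlecase's actual code) showing how an ASCII-only class treats the letter after 'ñ' as the start of a new run of letters:

```python
import re

# "Espada sureña", with 'ñ' written as an escape so it is a single code point
name = "Espada sure\u00f1a"

# An ASCII-only class stops at 'ñ', so the final 'a' looks like its own run of letters.
ascii_runs = re.findall(r"[a-zA-Z]+", name)

# In Python 3, \w is Unicode-aware for str patterns, so [^\W\d_] matches any letter.
unicode_runs = re.findall(r"[^\W\d_]+", name)

print(ascii_runs)    # ['Espada', 'sure', 'a']
print(unicode_runs)  # ['Espada', 'sureña']
```

If something in titlecase capitalizes the first letter of each such run, the stray 'a' would come out as 'A', which would match the "SureñA" symptom. That is a guess at the mechanism, not a confirmed diagnosis.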

I'm still trying to figure out if there's a fix we can submit to titlecase that'll sort this out for everyone, but our other options are:

  1. Fork titlecase to use regex.
  2. Use titlecase's callback function to detect strings with non-ASCII characters in them, and sort them out at our end. This means we'd lose all the fancy in-string-punctuation stuff that's
  3. Somehow fix the strings after they've returned from titlecase: maybe look for non-ASCII characters, then for "[:lower:][:upper:]" sequences (a lowercase letter immediately followed by an uppercase one), and fix them using regex.
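
Option 3 could look roughly like the sketch below. It assumes Python 3, and fix_inner_caps is a hypothetical helper name, not part of titlecase. Because the built-in re module has no [:lower:] or [:upper:] classes, the case checks use str.islower() and str.isupper() instead of a regex; re is only used to walk the words.

```python
import re


def fix_inner_caps(text):
    """Lowercase an uppercase letter that directly follows a lowercase one,
    but only in words that contain non-ASCII characters (hypothetical helper)."""

    def fix_word(match):
        word = match.group(0)
        # Leave ASCII-only words alone so names like "McDonald" keep their casing.
        if not any(ord(c) > 127 for c in word):
            return word
        chars = list(word)
        for i in range(1, len(chars)):
            # Because chars is updated in place, a run such as "ñAB" becomes "ñab".
            if chars[i - 1].islower() and chars[i].isupper():
                chars[i] = chars[i].lower()
        return "".join(chars)

    # Apply the fix to each whitespace-delimited word, keeping the spacing intact.
    return re.sub(r"\S+", fix_word, text)


print(fix_inner_caps("Espada Sure\u00f1A"))  # Espada Sureña
```

The catch is that this would also flatten intentional inner capitals in any word that happens to contain a non-ASCII character, so it is a workaround rather than a real fix.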

gaurav commented 9 years ago

This doesn't affect all characters: http://localhost:9001/taxonomy/names/search?lookup=Centropomus%20nigrescens&search=Centropomus%20nigrescens&open_lang=fr -- I'm going to file it as a bug on titlecase and see what happens.