davidbgk opened 6 years ago
Hi,
The same problem arises with the town names in geozones-france-2019-0-json.tar.xz.
I corrected my imported data with:
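from typing import Optional


def utf8_recode(value: Optional[str], src: str = "latin-1") -> Optional[str]:
    """Re-decode UTF-8 text that was wrongly decoded as `src`.

    >>> utf8_recode('ChÃ¢teau-Thierry')
    'Château-Thierry'
    >>> utf8_recode('Château-Thierry')
    'Château-Thierry'
    >>> utf8_recode(None)
    """
    if value in (None, ""):
        return None
    # "\xc3" ("Ã") is the lead byte of UTF-8 accented letters once they are read as Latin-1.
    if "\xc3" in value:
        return value.encode(src).decode("utf-8")
    return value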
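To show what the function undoes, here is a small self-contained sketch (it repeats utf8_recode so it runs on its own). It assumes the corruption comes from UTF-8 bytes being decoded as Latin-1, which is what the "\xc3" check implies: encoding a correct name as UTF-8 and decoding it as Latin-1 reproduces the garbled form, and the recode reverses it.

```python
from typing import Optional


def utf8_recode(value: Optional[str], src: str = "latin-1") -> Optional[str]:
    # Same logic as above: undo UTF-8 bytes that were decoded as `src`.
    if value in (None, ""):
        return None
    if "\xc3" in value:
        return value.encode(src).decode("utf-8")
    return value


# Reproduce the corruption: UTF-8 bytes wrongly decoded as Latin-1.
original = "Château-Thierry"
garbled = original.encode("utf-8").decode("latin-1")
print(repr(garbled))  # 'ChÃ¢teau-Thierry'

# utf8_recode reverses it and leaves already-correct strings untouched.
print(utf8_recode(garbled))   # Château-Thierry
print(utf8_recode(original))  # Château-Thierry
```

The "\xc3" test is a heuristic: every character from U+00C0 to U+00FF (which covers French accented letters such as é, è, â, ç) starts with byte C3 in UTF-8, so its Latin-1 mojibake always contains "Ã". A name that legitimately contains "Ã" would be recoded wrongly, and characters outside that range (for example "œ") are not detected.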
See http://www.data.gouv.fr/fr/datasets/geozones/#discussion-5a4e3ae6c751df376c39672a for details.
This is probably due to a missing conversion in extract_french_district, on the line 'wikipedia': props['wikipedia'],
Proposal: