etalab / notebooks

🤓 📓 📊

Data hacks: clean puiss_max and code_insee #6

Closed: AntoineAugusti closed this 4 years ago

AntoineAugusti commented 4 years ago

A common error made by producers is publishing badly formatted data. The script already has a handful of hacks in place to clean it. After looking at a failed validation report, I fixed more common mistakes in puiss_max and code_insee.

Disclaimer: I didn't run the script locally; you may want to do so after reviewing the changes.
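The PR diff is not reproduced here, so the exact fixes are unknown. As a hedged illustration only, the cleaners below show the kind of normalization the title suggests. The function names are hypothetical, and the error patterns they handle (decimal commas or trailing units in `puiss_max`, dropped leading zeros or a float suffix in `code_insee`) are assumptions about typical producer mistakes, not the actual changes in this PR.

```python
import re


def clean_puiss_max(value):
    """Hypothetical cleaner for a puiss_max (max power) value.

    Assumes producers sometimes use a decimal comma ("22,08") or append
    a unit ("22 kW"). Returns the first dot-decimal number found, or the
    input unchanged if no number can be found.
    """
    normalized = value.strip().replace(',', '.')
    match = re.search(r'\d+(?:\.\d+)?', normalized)
    return match.group(0) if match else value


def clean_code_insee(value):
    """Hypothetical cleaner for a code_insee (French commune code) value.

    INSEE commune codes are 5 characters long. Spreadsheet exports often
    drop the leading zero ("1004" instead of "01004") or turn the code
    into a float ("1004.0"). Corsican codes such as "2A004" contain a
    letter, so they are left untouched by the zero-padding step.
    """
    code = value.strip()
    if code.endswith('.0'):
        code = code[:-2]
    if code.isdigit() and len(code) < 5:
        code = code.zfill(5)
    return code
```

For example, `clean_puiss_max("22,08")` gives `"22.08"` and `clean_code_insee("1004")` gives `"01004"`. Padding only all-digit codes keeps alphanumeric Corsican codes intact.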

abulte commented 4 years ago

The script crashes ;-) Those columns are not always defined; cf. what I've done for lat/long:

    unique['Xlongitude'] = unique['Xlongitude'].replace(',', '.') if unique['Xlongitude'] else ''

AntoineAugusti commented 4 years ago

Oh, so these conditions matter! Right, updating now.
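One way to apply the same guard to several columns is a small helper, sketched below under stated assumptions. It assumes `unique` is a plain dict for one row, as the lat/long snippet suggests. `clean_optional` is a hypothetical name. Unlike the original one-liner, it reads the value with `dict.get`, so a column missing from the row is treated like an empty value instead of raising `KeyError`.

```python
def clean_optional(row, column, cleaner):
    """Apply cleaner to row[column] only when the value is present and non-empty.

    Mirrors the guard used for Xlongitude: a None or empty value becomes ''.
    Because it uses dict.get, a column absent from the row is also set to ''.
    """
    value = row.get(column)
    row[column] = cleaner(value) if value else ''
    return row


# Hypothetical row: column names come from the thread, values are made up.
unique = {'Xlongitude': '2,3522', 'puiss_max': None}
clean_optional(unique, 'Xlongitude', lambda v: v.replace(',', '.'))
clean_optional(unique, 'puiss_max', lambda v: v.replace(',', '.'))
clean_optional(unique, 'code_insee', str.strip)
```

After these calls, `unique` is `{'Xlongitude': '2.3522', 'puiss_max': '', 'code_insee': ''}`. Passing the cleaner as a function keeps the "skip when empty" condition in one place rather than repeating it on every line.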

abulte commented 4 years ago