Closed Asphahyre closed 7 years ago
Need to update the mk_data.py script.
Work in progress.
Please do not merge for now.
I took the liberty of renaming the original CODGEO, LIBGEO and LIBCOM columns to IRIS, LIB_IRIS and LIB_COM respectively, since the data/equip-* files already contain those columns as is, and the new names are more explicit.
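For readers following along, the rename described above could be sketched as follows. This is only an illustration: the sample rows and the `census` variable name are made up, and the actual code in mk_data.py may do this differently.

```python
import pandas as pd

# Hypothetical sample with the original column names.
census = pd.DataFrame({
    "CODGEO": ["010010000", "010020000"],
    "LIBGEO": ["Centre", "Nord"],
    "LIBCOM": ["Commune A", "Commune A"],
})

# Map the original names to the ones used in the data/equip-* files.
census = census.rename(columns={
    "CODGEO": "IRIS",
    "LIBGEO": "LIB_IRIS",
    "LIBCOM": "LIB_COM",
})

print(list(census.columns))
```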
I use this line to remove duplicate IRIS entries (in case an IRIS was changed or renamed):
data.drop_duplicates(subset='IRIS', keep='first', inplace=True)
I don't understand how it worked before without this line, since I needed it, but there may be a much cleaner way to do this.
Thanks to @armgilles for his explanations of mk_data.py.
Using this PR to give some details.
Do not use data.drop_duplicates(subset='IRIS', keep='first', inplace=True): we lose data this way.
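A minimal sketch of why this call drops information, using made-up rows (the IRIS codes, labels and the `value` column are hypothetical, not taken from the real files). With `keep='first'`, every later row sharing an IRIS code is discarded, even when its other columns differ:

```python
import pandas as pd

# Hypothetical rows: the same IRIS code appears twice with different labels.
data = pd.DataFrame({
    "IRIS": ["010010000", "010010000", "010020000"],
    "LIB_IRIS": ["Old name", "New name", "Other"],
    "value": [1, 2, 3],
})

# Non-inplace variant of the call discussed above, so both frames can be compared.
deduped = data.drop_duplicates(subset="IRIS", keep="first")

# Rows present in the original but missing after deduplication.
lost = data.loc[~data.index.isin(deduped.index)]

print(len(data), len(deduped))
print(lost)
```

Listing the `lost` rows before dropping anything is one way to check whether the duplicates are really redundant.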
The problem comes from the Census files and the other files.
The key originally used to merge the files with each other was:
key = ['IRIS', 'LIB_IRIS', 'COM', 'LIB_COM', 'REG', 'DEP']
But we have duplicate IRIS values:
Using a new merge key such as ['IRIS', 'REG', 'DEP'],
we merge these lines with no loss of information (on the right or the left side).
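To illustrate the difference between the two keys, here is a small sketch under stated assumptions: the two frames, their values, and the `population` and `nb_equip` columns are invented, the long key is shortened to four columns for brevity, and a label mismatch is assumed to be the cause of the unmatched rows. It is not the actual mk_data.py logic.

```python
import pandas as pd

# Hypothetical Census-like frame.
census = pd.DataFrame({
    "IRIS": ["010010000", "010020000"],
    "LIB_IRIS": ["Centre", "Nord"],
    "REG": ["84", "84"],
    "DEP": ["01", "01"],
    "population": [1200, 800],
})

# Hypothetical second frame where one IRIS label differs (e.g. after a rename).
equip = pd.DataFrame({
    "IRIS": ["010010000", "010020000"],
    "LIB_IRIS": ["Centre", "Nord (renamed)"],
    "REG": ["84", "84"],
    "DEP": ["01", "01"],
    "nb_equip": [5, 3],
})

long_key = ["IRIS", "LIB_IRIS", "REG", "DEP"]   # shortened stand-in for the original key
short_key = ["IRIS", "REG", "DEP"]              # key proposed in this PR

# With the label in the key, the renamed IRIS no longer matches itself:
# the outer merge yields one left-only and one right-only row.
on_long = census.merge(equip, on=long_key, how="outer")

# With the short key, each IRIS lines up; the label is kept from the left frame only.
on_short = census.merge(equip.drop(columns=["LIB_IRIS"]), on=short_key, how="outer")

print(on_long)
print(on_short)
```

Passing `indicator=True` to `merge` adds a `_merge` column, which can help spot rows that only exist on one side.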
Doing automatic check for :
Corrected dead links
Added data from previous years when available