anthill / open-moulinette

Scripts to clean Open-Data.
MIT License

Missing IRIS #15

gowithefloww closed 9 years ago

gowithefloww commented 9 years ago

Hi there,

I'm working on the Oise department (60) and I realised that 19 IRIS (i.e. 7 communes) are missing from the output. I compared it with the 2013 reference table (here: http://www.insee.fr/fr/methodes/default.asp?page=zonages/iris.htm), excluding the 'Iris = commune' case.
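A minimal sketch of that comparison, assuming the reference table and the output are both loaded as pandas DataFrames. The sample rows are made up, and treating `TYP_IRIS == "Z"` as the 'Iris = commune' case is an assumption about the reference table's coding:

```python
import pandas as pd

# Hypothetical stand-in for the 2013 reference table (DCOMIRIS = 9-character IRIS code).
reference = pd.DataFrame({
    "DCOMIRIS": ["600570101", "600570102", "601590000", "601750101"],
    "DEP": ["60", "60", "60", "60"],
    "TYP_IRIS": ["H", "H", "Z", "H"],  # assumption: "Z" marks a commune not split into IRIS
})

# Hypothetical stand-in for the final output of the scripts, keyed by CODGEO.
output = pd.DataFrame({"CODGEO": ["600570101", "601590000"]})

# Drop the 'Iris = commune' rows, then keep the codes absent from the output.
candidates = reference[reference["TYP_IRIS"] != "Z"]
missing_iris = sorted(set(candidates["DCOMIRIS"]) - set(output["CODGEO"]))
print(missing_iris)
```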

Here is the list:

[image: missing_iris]

Any idea where this comes from? I could not figure it out.

Many thanks

armgilles commented 9 years ago

Hi FloSugar,

The data comes from IGN and is based on the 2011 population census.

"ZZZZZ" means "Mise à jour de code IRIS suite à un événement antérieur" (IRIS code updated following an earlier event).

[screenshot, 2015-06-02 18:15:07]

Sadly, there is no mapping from the 2011 IRIS to the 2013 IRIS...

Hope it helps!

armgilles commented 9 years ago

FloSugar, I looked deeper in the data.

I found 799 iris for DEP = '60' in our file iris.json from insee-iris.

I loaded the "Table de référence des IRIS 2013" from your link and also found 799 iris for DEP = '60'.

Looking for "DCOMIRIS", I found it in:

[screenshot, 2015-06-02 19:16:37]

gowithefloww commented 9 years ago

Thank you for your reply!

Right, they are in the JSON, but can you still see them after running the Makefile? Unless I'm mistaken, they disappear during the aggregation of all the IRIS features (see the final output). They might be dropped somewhere, but I could not find where or why.

armgilles commented 9 years ago

OK, I have something.

It seems some IRIS are missing from the non-census files ('commerce', 'sport', etc.). In INSEE's documentation I found that "Seuls les IRIS proposant au moins un équipement sont retenus." (only IRIS with at least one facility are kept), in the "Doc_géographique" sheet of the .xls file.

So I have only 118 IRIS in the non-census files.

But if we look for your 19 IRIS, we can find them in the census files ('population', 'activite', etc.).

[screenshot, 2015-06-02 21:05:26]

In fact there are more than 50k IRIS in the census files. But when we merged them, we lost all the geo info (DEP, LIBCOM...) for the IRIS that only appear there. You can still find them by their IRIS key, as I did above.

I have to fix it! :)
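A rough way to see which rows lose their geo info is pandas' `indicator=True` option on the merge. This is only a sketch on made-up rows; `nb_commerce` and `P11_POP` are hypothetical feature columns:

```python
import pandas as pd

# Hypothetical non-census table: only IRIS with at least one facility appear.
non_census = pd.DataFrame({
    "CODGEO": ["600570101"],
    "DEP": ["60"],
    "nb_commerce": [3],
})

# Hypothetical census table: every IRIS appears.
census = pd.DataFrame({
    "CODGEO": ["600570101", "600570102"],
    "DEP": ["60", "60"],
    "P11_POP": [2100, 1850],
})

# Merge the census features only, as the scripts do, and flag each row's origin.
merged = pd.merge(non_census, census[["CODGEO", "P11_POP"]],
                  on="CODGEO", how="outer", indicator=True)

# Rows found only in the census file have no DEP value after the merge.
census_only = merged[merged["_merge"] == "right_only"]
print(census_only[["CODGEO", "DEP", "P11_POP"]])
```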

gowithefloww commented 9 years ago

For some reason the census files were missing from my directory. Now that I have updated it, I can indeed find the missing IRIS in the census files: you're right, we lose them after the merge.

gowithefloww commented 9 years ago

OK, I found the problem.

First you merge all the 'non-census' files, which don't contain the "missing_iris"; the key is CODGEO.

When it comes to the 'census' data (which contains all the IRIS), you merge on the same key but drop all the secondary keys (DEP, COM...) by doing:

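# Keep every column of the census header except the geographic ones
features = [x for x in header if x not in ['IRIS','REG','DEP','UU2010','COM','LIBCOM','TRIRIS',
                                           'GRD_QUART','LIBIRIS','TYP_IRIS', 'MODIF_IRIS', 'LAB_IRIS']]
features.append('CODGEO')
# Outer merge on CODGEO: rows that exist only in the census file get no DEP, COM, ... values
data = pd.merge(data, population[features], on='CODGEO', how='outer')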

Since these CODGEO values are not in the previous data table, the outer merge adds them as new rows with empty geo columns, which explains why you lose all the geo info :-)
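One possible way around this, sketched on made-up rows, is to keep the geo columns from the census file during the merge and use them to fill the gaps. This is not necessarily how the repository fixed it; `nb_commerce`, `P11_POP` and the sample values are hypothetical:

```python
import pandas as pd

# Hypothetical result of the non-census merges: one IRIS, with its geo info.
data = pd.DataFrame({
    "CODGEO": ["600570101"],
    "DEP": ["60"],
    "LIBCOM": ["Beauvais"],
    "nb_commerce": [3],
})

# Hypothetical census table: all IRIS, with geo info and one feature.
population = pd.DataFrame({
    "CODGEO": ["600570101", "600570102"],
    "DEP": ["60", "60"],
    "LIBCOM": ["Beauvais", "Beauvais"],
    "P11_POP": [2100, 1850],
})

geo_cols = ["DEP", "LIBCOM"]

# Keep the census geo columns under a suffix instead of dropping them.
merged = pd.merge(data, population, on="CODGEO", how="outer",
                  suffixes=("", "_census"))

# Fill missing geo info from the census copy, then drop the helper columns.
for col in geo_cols:
    merged[col] = merged[col].fillna(merged[col + "_census"])
merged = merged.drop(columns=[col + "_census" for col in geo_cols])

print(merged)
```

With this approach the IRIS that exist only in the census file keep their DEP and LIBCOM values, at the cost of carrying duplicate columns through the merge.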

armgilles commented 9 years ago

Yeah, thanks!

A fix should land today :)

armgilles commented 9 years ago

This should be fixed by #16 now.

By the way, there are now 811 IRIS for Oise; 12 of them are "ZZZZ" (example: 60057ZZZZ).
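Counts like these can be checked with a string filter on the IRIS code. A small sketch on made-up codes, assuming the codes are stored as strings:

```python
import pandas as pd

# Hypothetical IRIS codes; the first two characters are the department for Oise.
codes = pd.Series(["600570101", "60057ZZZZ", "601590000", "750560101"], name="CODGEO")

# All IRIS of the Oise department, then those carrying the "ZZZZ" suffix.
oise = codes[codes.str.startswith("60")]
placeholder = oise[oise.str.endswith("ZZZZ")]
print(len(oise), len(placeholder))
```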

Feel free to reopen if there is a problem :)