Open FranckCo opened 2 years ago
Mail from Paolo confirms that it's not possible to automate the process for Census data
OK, but we can still do better than mail: have a documented manual procedure to produce the data and a fixed URL where they can be obtained.
Here's the permanent census data source. You can browse the data manually, customize your query and export data using the provided toolbar. I also found a contact page where it's possible to request data on demand, but it's still a manual task.
I would like every data provider to offer a direct connection to its datasets, but in reality each provider has its own quirks and habits in how it publishes data. I think it would be good to build a protocol with specifications on how to GET data from data providers, or to let them POST data to our repository. Here are some examples:
1) Auto-browsing: when data are presented through an interface that is easy to browse, such as simple HTML or a similarly simple format, data extraction can be automated with a spider (just like Google's).
2) Human interaction: the Istat census site is well made, but alas it is made for HUMAN interaction. This is another case to study: does it make sense to devise a spider that browses such interfaces automatically? I don't even know whether such tools exist at all.
3) Data provider initiative: we could build a module in our system that makes it easier for data source referral officers to post data on our site in a standardized way.
But we can never expect complete M2M compliance from data providers, at least for the foreseeable future.
I repeat, the procedure should (even if it is manual):
Unfortunately, the interface is not machine-readable. It cannot generate a URL to get the file. So no.
The only solution is to try to get machine-readable data from another source. What about the SEP from Eurostat?
----- Original message ----- From: "Franck Cotton" @.> To: "INTERSTAT/Statistics-Contextualized" @.> Cc: "Paolo Francescangeli" @.>, "Comment" @.> Sent: Friday, 12 November 2021 13:49:35 Subject: Re: [INTERSTAT/Statistics-Contextualized] SEP data workflow: Italian census data (Issue #9)
I repeat, the procedure should (even if it is manual):
Italian census data retrieval: main steps and workflow

Census Data extraction
Step 1: Download from Istat source website
Step 2: Browse CENSUS OF POPULATION AND HOUSING from the left toolbar
Step 3: Select Population/Demographic characteristics and citizenship/Age structure – municipalities
Step 4: Select Customise/Table options from the top toolbar and a pop-up window appears
Step 5: Select from panel Dimension Member Labels/All dimensions/Use codes and then View data
Step 6: Select Customise/selection/Select time from the top toolbar and a pop-up window appears
Step 7: Select date range (2018-2018) and then View data
Step 8: Select Export/csv from the top toolbar and then select Download from the pop-up window
The downloaded file is not compliant with the required DSD.
Data transformation
The downloaded file has the following data structure:
ITTER107,"Territory","TIPO_DATO_CENS_POP","Datatype","SEXISTAT1","Gender","ETA1","Age class","TIME","Select time","Value","Flag Codes","Flags"
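To make the transformation step more concrete, here is a minimal Python sketch that reads an export with the header above and keeps only the code columns and the value. The target column names in `COLUMN_MAP` are placeholders (the required DSD is not spelled out in this thread), the `transform` helper is hypothetical, and the sample row is invented, not real census data. It is not the actual transformation script.

```python
import csv
import io

# Hypothetical mapping from Istat export columns to target column names.
# The real DSD is not given in this thread; these names are placeholders.
COLUMN_MAP = {
    "ITTER107": "territory",
    "SEXISTAT1": "sex",
    "ETA1": "age",
    "TIME": "time",
    "Value": "value",
}


def transform(source, target):
    """Copy only the mapped columns from source to target, renaming them."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(
        target, fieldnames=list(COLUMN_MAP.values()), lineterminator="\n"
    )
    writer.writeheader()
    count = 0
    for row in reader:
        writer.writerow({new: row[old] for old, new in COLUMN_MAP.items()})
        count += 1
    return count


# Made-up sample row, for illustration only (not real census data).
SAMPLE = (
    'ITTER107,"Territory","TIPO_DATO_CENS_POP","Datatype","SEXISTAT1","Gender",'
    '"ETA1","Age class","TIME","Select time","Value","Flag Codes","Flags"\n'
    '"000000","Example municipality","XXX","example datatype","1","males",'
    '"Y_UN4","0-4 years","2018","2018","10","",""\n'
)

out = io.StringIO()
n_rows = transform(io.StringIO(SAMPLE), out)
result = out.getvalue()
print(result)
```

Dropping the label columns ("Territory", "Gender", "Age class", ...) keeps only the codes, which is consistent with the "Use codes" option selected during extraction; whether the flag columns are needed downstream is left open here.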
Data Load
The transformed file was uploaded into the INTERSTAT GraphDB. GraphDB allows direct links to resources through a GET permalink, but the raw data need a little reworking to be accessed directly. They can be downloaded by rewriting the POST URL using the ID in the permalink.
Transformation script in R language: Pilot A - census data processing.txt
The source file from the Italian census has been uploaded to the FTP area of the project. As requested, metadata files for conversion from ISTAT territorial codes to LAU and NUTS3 have been uploaded to GitHub. In addition to the metadata, the Italian NUTS3 has been uploaded to GitHub as well.
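As an illustration of how such conversion tables might be applied, the Python sketch below looks up each row's territorial code in two dictionaries and sets aside the codes it cannot match. All codes shown are invented placeholders, and the `territory`, `lau` and `nuts3` field names and the `convert_codes` helper are assumptions; the real tables are the metadata files mentioned above.

```python
# Placeholder lookup tables: these codes are invented for illustration.
# The real tables are the metadata files uploaded to GitHub.
ISTAT_TO_LAU = {"A0001": "LAU_0001", "A0002": "LAU_0002"}
ISTAT_TO_NUTS3 = {"A0001": "NUTS3_X", "A0002": "NUTS3_X"}


def convert_codes(rows):
    """Add LAU and NUTS3 codes to each row; collect codes with no match."""
    converted, unmatched = [], []
    for row in rows:
        code = row["territory"]
        if code in ISTAT_TO_LAU and code in ISTAT_TO_NUTS3:
            converted.append(
                {**row, "lau": ISTAT_TO_LAU[code], "nuts3": ISTAT_TO_NUTS3[code]}
            )
        else:
            unmatched.append(code)
    return converted, unmatched


# Made-up rows, for illustration only.
rows = [
    {"territory": "A0001", "value": "10"},
    {"territory": "ZZZZZ", "value": "5"},
]
converted, unmatched = convert_codes(rows)
print(converted)
print(unmatched)
```

Reporting unmatched codes instead of silently dropping them makes gaps in the lookup tables visible before the data are loaded.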
Census data pipeline for Italy now fully implemented (f94cf3b8a84f22adb68152e1832ddf02feeec4dd), except conversion to NGSI-LD.
Hi, we were reviewing the census output contained in the FTP repository and noticed several details that don't seem to add up. Can you check, please?
1) It seems that the age class was not translated correctly for the Italian data. It always says "Y_LT_5" or "Y_UN4" for all IT rows.
2) French population values seem to be floats while the Italian ones are integers. Is this mismatch expected?
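Both points could be checked automatically on the output file. The following Python sketch is only an assumption about how such a check might look: the `country`, `age` and `value` field names and the `check_rows` helper are hypothetical, and the sample rows are invented. For each country it reports the distinct age classes found and the number of non-integer population values.

```python
from collections import defaultdict


def check_rows(rows):
    """Per country: distinct age classes and count of non-integer values."""
    ages_by_country = defaultdict(set)
    non_integer = defaultdict(int)
    for row in rows:
        country = row["country"]
        ages_by_country[country].add(row["age"])
        if not float(row["value"]).is_integer():
            non_integer[country] += 1
    return {
        country: {
            "age_classes": sorted(ages),
            "non_integer_values": non_integer[country],
        }
        for country, ages in ages_by_country.items()
    }


# Made-up rows, for illustration only (not real census figures).
rows = [
    {"country": "IT", "age": "Y_UN4", "value": "10"},
    {"country": "IT", "age": "Y_UN4", "value": "12"},
    {"country": "FR", "age": "Y5-9", "value": "10.5"},
]
report = check_rows(rows)
print(report)
```

A country with a single age class across all rows, or with a non-zero count of non-integer values, would point to the two symptoms described above.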
Italian census data is currently produced manually. Explore possibilities of automation.