Open umbe1987 opened 1 year ago
Hi @umbe1987 - executor.py is aimed at standardising the datasets from various sources, i.e. generating the data found in processed.
The main reason to run this would be to (a) update the source data, e.g. more recent datasets, or (b) include additional datasets. If you are updating the source data, you need to provide the raw datasets yourself. If you are including additional data, you can pass your own configuration, similar to configuration.json.
The source/raw data files have not been packaged with the application.
Thanks for your feedback @rahulnair23
My aim was to update the datasets. Let me check whether I have understood correctly.
This project does not let me download new data, only standardise new raw data. Is that correct?
If so, is there a document listing the links for downloading the source (raw) files, i.e. those used to generate the files in the processed folder that I would need to update?
Also, I guess I am supposed to create a folder prm-datasets/indicators and place there the various raw data with the exact names I see in configuration.json, is that correct?
Thanks in advance for clarifying.
(with apologies for the late response).
This project does not let me download new data, only standardise new raw data. Is that correct?
The reason is that we can't redistribute the source data without adequate permissions. While most datasets are openly available (e.g. UNHCR, WHO, World Bank), some others are not (e.g. EMDAT).
If so, is there a document listing the links for downloading the source (raw) files, i.e. those used to generate the files in the processed folder that I would need to update?
Have a look at configuration.json, which is probably the most descriptive record of the (raw) sources. Unfortunately, we do not have direct URL links. Data publishing in this sector is typically ad hoc and may not be consistent from year to year (with exceptions).
Also, I guess I am supposed to create a folder prm-datasets/indicators and place there the various raw data with the exact names I see in configuration.json, is that correct?
Yes. You can edit the configuration to remove any sources you can't find or don't want, or to change the base path of the source files. You can also include additional sources and specify any custom transformers.
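To work out which raw files still need to be collected, something like the following may help. It is a minimal sketch that makes no assumption about the actual schema of configuration.json: it walks the whole JSON structure and treats any string value ending in a common data-file extension as a candidate path. The function names, the extension list and the `base_path` argument are illustrative and are not part of the repository.

```python
import json
import os

# Extensions treated as "looks like a raw data file".
# This list is an assumption; adjust it to the files you actually use.
DATA_EXTENSIONS = (".csv", ".xls", ".xlsx", ".json", ".zip")


def iter_strings(node):
    """Yield every string value found anywhere in a nested JSON structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def find_missing_files(config, base_path="."):
    """Return the sorted file-like strings in `config` that do not exist under `base_path`."""
    candidates = {
        s for s in iter_strings(config) if s.lower().endswith(DATA_EXTENSIONS)
    }
    return sorted(
        p for p in candidates if not os.path.exists(os.path.join(base_path, p))
    )


def load_config(path):
    """Read a JSON configuration file into Python objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `find_missing_files(load_config("configuration.json"))` from the directory that the paths are relative to would list the entries to download or to remove from the configuration. Because the match is based only on file extensions, the result should be read as a list of hints rather than a definitive inventory.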
https://github.com/IBM/mixed-migration-forecasting/blob/5047c748b60b3f7c3621e0174200007865cc2933/server/executor.py#L42
Hello and thanks for sharing this repository first of all :)
This line of code in executor.py throws an error because it tries to read a file that does not exist ('prm-datasets/indicators/worldbank/WDI/WDIData.csv'). This is set as the source file, is that correct? The folder prm-datasets/indicators does not exist in the project when you download it, whereas the prm-datasets/processed folder does. Maybe it should be the other way round (just a guess)?
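A failure like this is easier to diagnose with an explicit check before the file is read. The following is a minimal sketch and not the code in executor.py: `require_source_file` is a hypothetical helper, and the error message simply restates that the raw datasets are not packaged with the application.

```python
import os


def require_source_file(path):
    """Return `path` unchanged if it is an existing file, otherwise raise a descriptive error."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Raw source file not found: {path!r}. "
            "The raw datasets are not packaged with the application and "
            "must be obtained separately."
        )
    return path
```

A reader could wrap the path in this helper before passing it to whatever loads the file, for example `require_source_file("prm-datasets/indicators/worldbank/WDI/WDIData.csv")`.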