Open jrspringman opened 1 month ago
I've created a data_update.R script that adds the civic space counts data. I did this by piggybacking off the ML4P-Civic-Space-Forecasting infrastructure, so updating this data requires first running the data update for that repo and then running data_update.R for this one.
I pulled in the shock detection data by hand. We need to add something to data_update.R that will pull this from the shock repo automatically (or add something to the shock repo that publishes shock data to our collective Dropbox folder, and then pull from there).
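For the second option, here is a minimal sketch of what the publishing step could look like on the shock repo side. It assumes the results sit in date-named subfolders (see Shock Detection Results below) and that the folder names parse as YYYY-MM-DD; that format, the `latest` folder name, and both function names are hypothetical, so adjust them to whatever the repo actually uses.

```python
import shutil
from datetime import datetime
from pathlib import Path


def latest_dated_subfolder(data_dir, date_format="%Y-%m-%d"):
    """Return the subfolder whose name parses as the most recent date, or None."""
    dated = []
    for child in Path(data_dir).iterdir():
        if not child.is_dir():
            continue
        try:
            dated.append((datetime.strptime(child.name, date_format), child))
        except ValueError:
            continue  # skip folders that are not named as dates
    if not dated:
        return None
    return max(dated, key=lambda pair: pair[0])[1]


def publish_latest(data_dir, shared_dir):
    """Copy the files from the newest dated subfolder into a fixed 'latest' folder."""
    source = latest_dated_subfolder(data_dir)
    if source is None:
        raise FileNotFoundError(f"no dated subfolders found in {data_dir}")
    target = Path(shared_dir) / "latest"  # hypothetical stable location
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(source.iterdir()):
        if item.is_file():
            shutil.copy2(item, target / item.name)
            copied.append(item.name)
    return copied
```

With something like this in place, data_update.R would only ever need to read from the one fixed folder rather than work out which dated subfolder is current.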
We need the following data, as well as scripts that pull this data from the respective folders/repositories. With the exception of the shock detection results, this should all be in the final, processed data that is ingested by forecast-surges-pipeline. We would just need to delete the TE data.
For the civic space and RAI data, there's an easy approach and a harder approach.
Civic Space Data
Easy Approach
An adaptation of this script should work:
It includes raw, normalized, article total, and source entry flags. It also includes the TE data, which needs to be removed, so you'll need to figure out a simple way to scrub that from the country datasets as you bind them together. The only thing you cannot get from this is the RAI variables that disaggregate each specific indicator for Russia/China.
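A rough sketch of the scrub-while-binding step, written in Python purely for illustration (data_update.R is R, where dplyr's bind_rows() followed by select(-starts_with(...)) would do the same job). It assumes each country dataset is a CSV and that the TE columns share a common name prefix; the `te_` prefix and both function names are hypothetical placeholders, not the real naming.

```python
import csv

TE_PREFIX = "te_"  # hypothetical: replace with the real TE column pattern


def bind_without_te(csv_paths, te_prefix=TE_PREFIX):
    """Stack per-country CSVs into one list of rows, dropping TE columns."""
    rows = []
    columns = []  # union of kept column names, in first-seen order
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            for record in csv.DictReader(handle):
                # Keep every named column that does not carry the TE prefix.
                kept = {
                    name: value
                    for name, value in record.items()
                    if name and not name.startswith(te_prefix)
                }
                for name in kept:
                    if name not in columns:
                        columns.append(name)
                rows.append(kept)
    return columns, rows


def write_bound(columns, rows, out_path):
    """Write the bound rows; columns missing for a country are left blank."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

Dropping the columns per file before binding means a country that has extra TE columns never widens the combined dataset.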
Harder but better approach
Adding a few lines to the ML4P-Civic-Space-Forecasting processing would be better. You could just find the part of the code that writes out to the ml4p.forecasting/2-model-data Dropbox folder and write out a slightly different version without the TE data.
Shock Detection Results
This data is stored in dated subfolders within forecast-surges-pipeline/data. I think the best method will be to add a line of code in the Python script that outputs to a subfolder in the ml4p.forecasting Dropbox folder. This way, you can avoid the dated subfolders and just pull from there.
Disaggregate by domestic/regional+international
Do you have code that does this? For my Remedios project, I have a repo that takes a modified version of the core functions for the ml4p.forecast package and writes out source-level data. That might be a helpful starting point. Let me know if you want me to add you to the repo. You'll be looking at code/sample_frame.R and code/mlp_functions.R.
RAI Data
At some point, I need to modify the rai.atari package to output the disaggregated Russia/China results.