RMI-PACTA / pacta.data.preparation

The goal of {pacta.data.preparation} is to prepare and format all input datasets required to run the PACTA for investors tools.
https://rmi-pacta.github.io/pacta.data.preparation/

consider removing rows where scenario data is not available #10

Open cjyetman opened 1 year ago

cjyetman commented 1 year ago

https://github.com/RMI-PACTA/pacta.data.preparation/blob/ba0f8b8518afb2d00bfe5d9bff1a935418eaa5dd/R/dataprep_abcd_scen_connection.R#L267-L303

When the scenario data is left-joined onto the ABCD data, it is possible (even likely) that some rows of the ABCD data will not match any rows in the scenario data on by = c("scenario_geography", "year", "ald_sector", "technology"). For those rows, the columns added from the scenario data (scenario_source, scenario, units, direction, fair_share_perc) will be filled with NA. Are these rows useful at all after this point?
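To make the behaviour concrete, here is a minimal sketch with toy data. The join keys and the scenario column names come from the code linked above; the `id` column and every value are invented for illustration and are not taken from the real datasets.

```r
library(dplyr)

# Toy stand-in for the ABCD data (the `id` column and all values are hypothetical)
abcd <- data.frame(
  id = c(1, 2, 3),
  scenario_geography = c("Global", "Global", "NonOECD"),
  year = c(2025, 2025, 2025),
  ald_sector = c("Power", "Power", "Power"),
  technology = c("CoalCap", "RenewablesCap", "CoalCap")
)

# Toy stand-in for the scenario data: it has no "NonOECD" rows
scenario <- data.frame(
  scenario_geography = c("Global", "Global"),
  year = c(2025, 2025),
  ald_sector = c("Power", "Power"),
  technology = c("CoalCap", "RenewablesCap"),
  scenario_source = "EXAMPLE2023",
  scenario = "EX_SCEN",
  units = "MW",
  direction = c("declining", "increasing"),
  fair_share_perc = c(-0.1, 0.2)
)

join_keys <- c("scenario_geography", "year", "ald_sector", "technology")

# left_join() keeps every ABCD row; unmatched rows get NA in the scenario columns
joined <- left_join(abcd, scenario, by = join_keys)

# Count the ABCD rows that found no scenario match
n_unmatched <- sum(is.na(joined$scenario))
```

Running the same count on the real output would show how much of the data consists of rows with no scenario information.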

I think we should carefully consider whether these rows with no scenario data are meaningful for any reason. If not, we should filter them out, which could reduce the size of the data substantially. @jacobvjk @jdhoffa @AlexAxthelm
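If we do decide to drop them, a rough sketch of two ways it could look is below, again on invented toy data (the `id` column and all values are hypothetical). It assumes `scenario` is never NA in the scenario data itself, so that an NA after the join can only mean "no match".

```r
library(dplyr)

# Toy data; the `id` column and all values are hypothetical
abcd <- data.frame(
  id = c(1, 2, 3),
  scenario_geography = c("Global", "Global", "NonOECD"),
  year = c(2025, 2025, 2025),
  ald_sector = c("Power", "Power", "Power"),
  technology = c("CoalCap", "RenewablesCap", "CoalCap")
)

scenario <- data.frame(
  scenario_geography = c("Global", "Global"),
  year = c(2025, 2025),
  ald_sector = c("Power", "Power"),
  technology = c("CoalCap", "RenewablesCap"),
  scenario = "EX_SCEN",
  fair_share_perc = c(-0.1, 0.2)
)

join_keys <- c("scenario_geography", "year", "ald_sector", "technology")

# Option 1: keep the existing left_join() and filter out unmatched rows afterwards
filtered <- left_join(abcd, scenario, by = join_keys) %>%
  filter(!is.na(scenario))

# Option 2: use inner_join(), which never creates the unmatched rows
inner <- inner_join(abcd, scenario, by = join_keys)

# anti_join() lists the ABCD rows that would be lost, for review before committing
dropped <- anti_join(abcd, scenario, by = join_keys)
```

Option 2 avoids materialising the NA rows at all, which matters if the intermediate data is large; `dropped` makes it easy to check whether anything important would be lost.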

It's possible we do want at least one row of the ABCD data to be left in place even if no scenario data matches it, in which case we'll need something more sophisticated. A complication may be that the scenario_geography and equity_market columns make multiple rows distinct even when the rest of the data is duplicated.
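One possible shape for that "more sophisticated" filter, as a sketch only: group by the columns that identify an ABCD record while leaving out scenario_geography and equity_market, keep every matched row, and keep a single placeholder row for groups with no match at all. The `id` column, the choice of grouping columns, and all values are assumptions made for illustration.

```r
library(dplyr)

# Toy data; the `id` column and all values are hypothetical
abcd <- data.frame(
  id = c(1, 1, 2, 2),
  scenario_geography = c("Global", "NonOECD", "Global", "NonOECD"),
  equity_market = "GlobalMarket",
  year = 2025,
  ald_sector = "Power",
  technology = c("CoalCap", "CoalCap", "NuclearCap", "NuclearCap")
)

# Scenario data only covers CoalCap in the Global geography
scenario <- data.frame(
  scenario_geography = "Global",
  year = 2025,
  ald_sector = "Power",
  technology = "CoalCap",
  scenario = "EX_SCEN",
  fair_share_perc = -0.1
)

join_keys <- c("scenario_geography", "year", "ald_sector", "technology")

joined <- left_join(abcd, scenario, by = join_keys)

kept <- joined %>%
  # Group by what identifies an ABCD record, ignoring the columns that only
  # multiply rows (scenario_geography, equity_market)
  group_by(id, year, ald_sector, technology) %>%
  # Keep matched rows; if a group has no match at all, keep its first row
  filter(!is.na(scenario) | (all(is.na(scenario)) & row_number() == 1)) %>%
  ungroup()
```

In this toy case `id` 1 keeps only its matched Global row, while `id` 2 keeps one unmatched row. The scenario_geography and equity_market values on that placeholder row are arbitrary (whichever came first), so they may be better set to NA if we go this way.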

related RMI-PACTA/pacta.data.preparation#7