RMI-PACTA / workflow.data.preparation

The goal of `workflow.data.preparation` is to prepare all of the necessary data inputs for the Transition Monitor web application.
Other
2 stars 0 forks source link

avoid unnecessary intermediate scenario files #124

Closed cjyetman closed 8 months ago

cjyetman commented 8 months ago

I believe we originally created these scenario CSVs to faithfully replicate the setup for the former data.prep process, but as far as I know these CSVs are not useful for anything once the current data.prep is completely. In this PR, I move that process further down and rather than export those intermediate files just process the in memory data objects.

@jdhoffa hope you can review this thoroughly and agree

jdhoffa commented 8 months ago

This is currently the case, however given the parallel discussion about superseding pacta.scenario.preparation with the public pacta.scenario.data.preparation, I would really prefer if you didn't do this.

I will just need to re-introduce the .csv file again later to have an appropriate API for the prepared scenario data.

I would recommend closing this PR in favour of https://github.com/RMI-PACTA/workflow.scenario.preparation/issues/5

cjyetman commented 8 months ago

While I would be happy to see https://github.com/RMI-PACTA/workflow.scenario.preparation/issues/5 completed and leveraged here, I don't see why these are mutually exclusive. This PR doesn't change the process really, just when (later in the workflow) and where (in-memory rather than writing a temporary file) it happens. Do you see something in this PR that would prevent you from doing https://github.com/RMI-PACTA/workflow.scenario.preparation/issues/5 in the future?

jdhoffa commented 8 months ago

Because doing things in memory would remove my entry-point into the workflow. I aim to prepare the scenario files in workflow.scenario.preparation and read the scenario files from disk.

Anyway not a huge issue, do as you wish. I will just need to re-introduce reading the .csv again relatively soon.

cjyetman commented 8 months ago

Because doing things in memory would remove my entry-point into the workflow. Not a huge issue, I will just need to re-introduce reading the .csv again relatively soon.

In that case, I would actually strongly prefer merging this, because https://github.com/RMI-PACTA/workflow.scenario.preparation/issues/5 will require a refactoring/removal of the scenario processing code in https://github.com/RMI-PACTA/workflow.data.preparation/blob/main/run_pacta_data_preparation.R (which will be somewhat easier after this condenses it into a single block) and it will likely require adding a new file path to the config file etc.

e.g. it sounds like https://github.com/RMI-PACTA/workflow.scenario.preparation/issues/5 will change the API of workflow.data.preparation which is a much larger change, whereas this PR simply rearranges and optimizes things a bit