RMI-PACTA / pacta.data.preparation

The goal of {pacta.data.preparation} is to prepare and format all input datasets required to run the PACTA for investors tools.
https://rmi-pacta.github.io/pacta.data.preparation/
Other
1 stars 0 forks source link

create a list of important characteristics to be auto checked after data.prep runs #6

Open cjyetman opened 1 year ago

cjyetman commented 1 year ago

It would be nice to start compiling a list of data characteristics that should be checked once data.prep is done, and eventually write some code that will automatically check/report on them after every run.

I'll start with a few, and please add to it if you think of any...

cjyetman commented 1 year ago

case in point RMI-PACTA/archive.pacta.data.preparation#210

cjyetman commented 5 months ago

I think this should be transferred to workflow.transition.monitor? @jdhoffa @AlexAxthelm

AlexAxthelm commented 5 months ago

I would say its better in https://github.com/RMI-PACTA/workflow.pacta.data.qa

jdhoffa commented 5 months ago

Based on the actual check list that @cjyetman wrote, I think I agree that it should live in https://github.com/RMI-PACTA/workflow.pacta.data.qa

Maybe we need to discuss and focus our overall QA strategy a bit?

My first proposal would be that:

cjyetman commented 5 months ago

I don't like the idea of workflow.transition.monitor silently exporting bogus data, so I intend to implement some version of this there. Not opposed to similar checks living in the QA repo.

jdhoffa commented 5 months ago

Yeah totally fair.

I think mainly the point:

seems like it's not really appropriate for https://github.com/RMI-PACTA/workflow.transition.monitor

A priori, how should https://github.com/RMI-PACTA/workflow.transition.monitor know how many funds should be expected?

But the other basic self-consistency checks totally make sense.

Maybe just close this issue, and make two more focused issues in https://github.com/RMI-PACTA/workflow.transition.monitor and https://github.com/RMI-PACTA/workflow.pacta.data.qa

cjyetman commented 5 months ago
  • [ ] number of unique funds in the fund database

seems like it's not really appropriate for https://github.com/RMI-PACTA/workflow.transition.monitor

A priori, how should https://github.com/RMI-PACTA/workflow.transition.monitor know how many funds should be expected?

I think the original intent behind "number of funds" was for the "reporting" part of it versus the "checking" part of it, e.g. potentially adding that to the manifest so it's super easy for someone to inspect what's in the data without having to load it.

I think @AlexAxthelm did something similar with workflow.factset recording data object stats in the manifest.

jdhoffa commented 5 months ago

Ah that's a cool idea and makes a lot of sense!

Ok then as you were!! XD

AlexAxthelm commented 5 months ago

I think @AlexAxthelm did something similar with workflow.factset recording data object stats in the manifest.

NB: this is also included in the new pacta.workflow.utils manifest export that I'm working on. https://github.com/RMI-PACTA/pacta.workflow.utils/blob/ba9b91444cd3a2fe35f3acb94b9458108a992353/R/get_file_metadata.R#L69-L75

AlexAxthelm commented 5 months ago

I don't like the idea of workflow.transition.monitor silently exporting bogus data, so I intend to implement some version of this there. Not opposed to similar checks living in the QA repo.

I think that it might be a bit simpler to maintain if we add the QA repo to Suggests, and call the checks with an if (require(workflow.pacta.data.qa)){run_data_prep_checks()}

jdhoffa commented 5 months ago

I don't like the idea of workflow.transition.monitor silently exporting bogus data, so I intend to implement some version of this there. Not opposed to similar checks living in the QA repo.

I think that it might be a bit simpler to maintain if we add the QA repo to Suggests, and call the checks with an if (require(workflow.pacta.data.qa)){run_data_prep_checks()}

Just want to bare in mind that we are talking about large hypotheticals there since there is no way to run https://github.com/RMI-PACTA/workflow.pacta.data.qa non-interactively at this stage (the mainoutputs are .pdfs)