Closed jdhoffa closed 7 months ago
@cjyetman not to play hot potato with this issue, but I think it is something that would be handled directly in https://github.com/RMI-PACTA/workflow.data.preparation? That seems like the place where "deciding if we include something in the pacta-data
output folder" would be relevant.
Shall we move this issue there? Or should we just close it?
I'm having a hard time reconstructing the context of this. At the moment, workflow.data.preparation/run_pacta_data_preparation.R doesn't do anything with indices.
There has long been a confusing circular dependency with data.prep and indices (and peer group results). One needs the output of data.prep to run the indices' portfolios through PACTA to get the indices' results which need to be combined with the data.prep outputs. I still don't have any idea how to break that circle. I think one conceptual step we took towards that was to recognize and use a naming scheme to differentiate "PACTA analysis inputs" (which do not need indices or peer results) and "PACTA TM inputs" (which do need both).
I thought that there was a workflow specifically for collecting and processing, e.g. running through PACTA, the indices, and I would guess this issues is actually most relevant there?
I guess the intention would be that https://github.com/RMI-PACTA/workflow.pacta and https://github.com/RMI-PACTA/workflow.pacta.report would deal with this.
Something like this:
workflow.pacta
gets built workflow.pata.report
calls the workflow.pacta
image: both to run the "input portfolio" and to run the "indices portfolios" (noting that "running the indices portfolios" is only something that's necessary to do when making a report)That way we remove the circular dependency, and ensure the same worfklow.pacta
image is used to run all relevant portfolio-like inputs.
cc: @AlexAxthelm
Migrated to https://github.com/RMI-PACTA/workflow.prepare.pacta.indices as per @cjyetman suggestion
workflow.pata.report
calls theworkflow.pacta
image: both to run the "input portfolio" and to run the "indices portfolios" (noting that "running the indices portfolios" is only something that's necessary to do when making a report)
I'm not 100% sure making workflow.pata.report
run the input portfolio and multiple index portfolios all the way "through PACTA" on-the-fly, e.g. every time a portfolio is input, is an ideal solution. At least not in the context of the Webapp where time efficiency is desired. @AlexAxthelm?
edit: @jdhoffa maybe you meant specifically what workflow.prepare.pacta.indices does, not the Webapp? But in that case there would not be an "input portfolio", right?
Time efficiency could be preserved by caching the results? (not something we do or have ever tried to do, but could be possible as a feature in workflow.pacta.report
?)
The problem we are trying to solve is ensure that the EXACT same build is used to run both "input portfolio" and "indices", which I think is an important problem to solve.
Including the results of PACTA-processed indices in a "TM/Webapp analysis inputs" data directory has been our form of caching in the past. The new Webapp may have some capability of "cache the indices results somewhere the first time a portfolio is input", but TM will almost certainly not ever have that capability.
Yup!
To clarify, all of my points above are towards a new workflow.* paradigm for the WebApp (and after)
Absolutely none of this should be considered before:
If we "move then improve" this discussion lands in the "improve" bucket.
(not something we do or have ever tried to do, but could be possible as a feature in
workflow.pacta.report
?)
🤷🏻 maybe, but we have to keep in mind that in its current form workflow.pacta.report
is intended to run as an ephemeral Docker instance, so wherever this cache would be, it would need to be somewhere outside of the Docker image, which implies the server running it has some resource available for that. An obvious candidate for that would be the data input storage, but I suspect/hope that would be a read-only resource.
I will create a GH label that identifies that. You may create similar tags to the TM important PROD repos you maintain, if you wish.
I will close this in favor of https://github.com/RMI-PACTA/workflow.pacta.report/issues/3
I don't see any point in working on this, on this repository in it's current state, while it serves the TM use-case.
The portfolios we download from iShares have never been included in the "PACTA_analysis inputs" folders before, and they are not used as inputs to PACTA_analysis process directly, so I'm not totally convinced that they should be included in the outputs of data.prep. What has been included in the outputs previously was the output of PACTA_analysis using those as the input portfolios. Maybe it's convenient to have the portfolios there too 🤷🏻? Not really sure. But clearly there is no purpose for having them included in the pacta-data repo, and conceptually (to me anyway) this repo is intended to build the folder for a given quarter that should be a drop-in to the pacta-data repo.
Thanks @cjyetman