consider whether `ishares_indices_*.rds` files should be included in output

jdhoffa commented 7 months ago

The portfolios we download from iShares have never been included in the "PACTA_analysis inputs" folders before, and they are not used as inputs to PACTA_analysis process directly, so I'm not totally convinced that they should be included in the outputs of data.prep. What has been included in the outputs previously was the output of PACTA_analysis using those as the input portfolios. Maybe it's convenient to have the portfolios there too 🤷🏻? Not really sure. But clearly there is no purpose for having them included in the pacta-data repo, and conceptually (to me anyway) this repo is intended to build the folder for a given quarter that should be a drop-in to the pacta-data repo.

Thanks @cjyetman

jdhoffa commented 7 months ago

@cjyetman not to play hot potato with this issue, but I think it is something that would be handled directly in https://github.com/RMI-PACTA/workflow.data.preparation? That seems like the place where "deciding if we include something in the pacta-data output folder" would be relevant.

Shall we move this issue there? Or should we just close it?

cjyetman commented 7 months ago

I'm having a hard time reconstructing the context of this. At the moment, workflow.data.preparation/run_pacta_data_preparation.R doesn't do anything with indices.

There has long been a confusing circular dependency with data.prep and indices (and peer group results). One needs the output of data.prep to run the indices' portfolios through PACTA to get the indices' results which need to be combined with the data.prep outputs. I still don't have any idea how to break that circle. I think one conceptual step we took towards that was to recognize and use a naming scheme to differentiate "PACTA analysis inputs" (which do not need indices or peer results) and "PACTA TM inputs" (which do need both).

I thought that there was a workflow specifically for collecting and processing, e.g. running through PACTA, the indices, and I would guess this issues is actually most relevant there?

jdhoffa commented 7 months ago

I guess the intention would be that https://github.com/RMI-PACTA/workflow.pacta and https://github.com/RMI-PACTA/workflow.pacta.report would deal with this.

Something like this:

Data prep gets run and timestamped/ versioned/ whatever
Indices get scraped and "prepared" as PACTA input
workflow.pacta gets built
workflow.pata.report calls the workflow.pacta image: both to run the "input portfolio" and to run the "indices portfolios" (noting that "running the indices portfolios" is only something that's necessary to do when making a report)

That way we remove the circular dependency, and ensure the same worfklow.pacta image is used to run all relevant portfolio-like inputs.

cc: @AlexAxthelm

jdhoffa commented 7 months ago

Migrated to https://github.com/RMI-PACTA/workflow.prepare.pacta.indices as per @cjyetman suggestion

cjyetman commented 7 months ago

workflow.pata.report calls the workflow.pacta image: both to run the "input portfolio" and to run the "indices portfolios" (noting that "running the indices portfolios" is only something that's necessary to do when making a report)

I'm not 100% sure making workflow.pata.report run the input portfolio and multiple index portfolios all the way "through PACTA" on-the-fly, e.g. every time a portfolio is input, is an ideal solution. At least not in the context of the Webapp where time efficiency is desired. @AlexAxthelm?

edit: @jdhoffa maybe you meant specifically what workflow.prepare.pacta.indices does, not the Webapp? But in that case there would not be an "input portfolio", right?

jdhoffa commented 7 months ago

Time efficiency could be preserved by caching the results? (not something we do or have ever tried to do, but could be possible as a feature in workflow.pacta.report?)

The problem we are trying to solve is ensure that the EXACT same build is used to run both "input portfolio" and "indices", which I think is an important problem to solve.

cjyetman commented 7 months ago

Including the results of PACTA-processed indices in a "TM/Webapp analysis inputs" data directory has been our form of caching in the past. The new Webapp may have some capability of "cache the indices results somewhere the first time a portfolio is input", but TM will almost certainly not ever have that capability.

jdhoffa commented 7 months ago

Yup!

To clarify, all of my points above are towards a new workflow.* paradigm for the WebApp (and after)

Absolutely none of this should be considered before:

PACTA COP CH 2024 is delivered successfully
WebApp is proven to function with existing functionality

If we "move then improve" this discussion lands in the "improve" bucket.

cjyetman commented 7 months ago

(not something we do or have ever tried to do, but could be possible as a feature in workflow.pacta.report?)

🤷🏻 maybe, but we have to keep in mind that in its current form workflow.pacta.report is intended to run as an ephemeral Docker instance, so wherever this cache would be, it would need to be somewhere outside of the Docker image, which implies the server running it has some resource available for that. An obvious candidate for that would be the data input storage, but I suspect/hope that would be a read-only resource.

jdhoffa commented 7 months ago

I will create a GH label that identifies that. You may create similar tags to the TM important PROD repos you maintain, if you wish.

jdhoffa commented 7 months ago

I will close this in favor of https://github.com/RMI-PACTA/workflow.pacta.report/issues/3

I don't see any point in working on this, on this repository in it's current state, while it serves the TM use-case.

RMI-PACTA / workflow.prepare.pacta.indices

consider whether `ishares_indices_*.rds` files should be included in output #46