AlexAxthelm opened this issue 2 weeks ago
@AlexAxthelm there are benchmark inputs that are scraped on the fly using the `pacta.data.scraping` package, e.g. https://github.com/RMI-PACTA/pacta.data.scraping/blob/main/R/get_ishares_index_data.R
I guess the hash would also need to depend on the result of that scraping to be complete?
Conceptually, it makes sense to me, though...
Scratch all that... I forgot how this repo is being used. I'll have to think about that once my memory has improved.
- When, and how often, would the same TM Docker image be used with the same data and the same benchmark portfolios? Possibly on a re-run of the actions on a PR when no changes have been made?
This happens a lot. We push a lot of changes to `workflow.transition.monitor` that don't change any of the processing code or require a rebuild of the Docker image (or rather, the entire image can be rebuilt from cache).
Running index prep is a long process that's part of the current build process for `workflow.transition.monitor`, and will probably be part of the build for `workflow.pacta.webapp` as well. Given that the indices don't actually change much, it would make sense to check whether the process actually needs to run, or whether we can bypass it and return the previous results instead.
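A minimal sketch of the "check before running" step, written in Python purely for illustration. It assumes cached results live in a directory named by a versioning key; a local directory stands in for the blob store/AFS, and `run_or_reuse`, `run_index_prep`, and the `_COMPLETE` marker file are hypothetical names, not part of any PACTA repo.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_or_reuse(key: str, cache_root: Path, run_index_prep: Callable[[Path], None]) -> Path:
    """Return the results directory for `key`, running the expensive step only on a cache miss."""
    # `cache_root` is a local stand-in for the blob store/AFS (assumption).
    out_dir = cache_root / key
    # Hypothetical marker written only after a successful run, so a partial
    # (crashed or cancelled) run is never mistaken for cached results.
    marker = out_dir / "_COMPLETE"
    if marker.exists():
        return out_dir  # cache hit: skip index prep and reuse previous results
    out_dir.mkdir(parents=True, exist_ok=True)
    run_index_prep(out_dir)  # cache miss: do the long-running work
    marker.write_text("ok")
    return out_dir


# Usage with a stand-in for index prep that only records how often it ran.
calls = []


def fake_index_prep(out_dir: Path) -> None:
    calls.append(out_dir)
    (out_dir / "indices.json").write_text(json.dumps({"index": "placeholder"}))


with tempfile.TemporaryDirectory() as tmp:
    first = run_or_reuse("abc123", Path(tmp), fake_index_prep)
    second = run_or_reuse("abc123", Path(tmp), fake_index_prep)
    print(first == second, len(calls))  # True 1
```

Writing the marker last is one simple way to make "the files exist" mean "the previous run finished", which matters if a CI job is cancelled partway through.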
My general thinking is to construct a hash based on the `pacta-data` and `benchmark_inputs` files, use that as a versioning key, and then check whether the appropriate files already exist in the blob store/AFS.
So it would look something like:
cc @jdhoffa @cjyetman: do I have the hash keys right, or am I missing something?
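One possible way to build the versioning key, sketched in Python for illustration only. It assumes `pacta-data` and `benchmark_inputs` are local directories; `hash_inputs` and its `extra` parameter (for something like a Docker image digest) are hypothetical. Scraped benchmark inputs would only be covered if the scraping results are saved into one of the hashed directories before hashing.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Mapping, Sequence


def hash_inputs(inputs: Mapping[str, Path], extra: Sequence[str] = ()) -> str:
    """Deterministic SHA-256 key over labelled input directories (hypothetical helper)."""
    digest = hashlib.sha256()
    for label in sorted(inputs):  # fixed order, independent of dict insertion order
        root = inputs[label]
        digest.update(f"dir\0{label}\0".encode("utf-8"))
        files = sorted(
            (p for p in root.rglob("*") if p.is_file()),
            key=lambda p: p.relative_to(root).as_posix(),
        )
        for path in files:
            rel = path.relative_to(root).as_posix()
            # Path and size go in first, so renames and added/removed files change the key.
            digest.update(f"file\0{rel}\0{path.stat().st_size}\0".encode("utf-8"))
            with path.open("rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):  # stream large files
                    digest.update(chunk)
    for item in extra:  # e.g. an image digest (assumption, not from the issue)
        digest.update(f"extra\0{item}\0".encode("utf-8"))
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp, "pacta-data")
    bench = Path(tmp, "benchmark_inputs")
    data.mkdir()
    bench.mkdir()
    (data / "a.csv").write_text("x,1\n")
    (bench / "b.csv").write_text("y,2\n")
    inputs = {"pacta-data": data, "benchmark_inputs": bench}
    key_before = hash_inputs(inputs)
    key_again = hash_inputs(inputs)
    (bench / "b.csv").write_text("y,3\n")
    key_after = hash_inputs(inputs)

print(key_before == key_again, key_before == key_after)  # True False
```

The resulting hex string could then serve as the lookup prefix in the blob store/AFS (for example, a hypothetical `index-prep/<key>/` path), so an unchanged set of inputs maps to the same stored results.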