Open cjyetman opened 4 months ago
Reopening because #87 reveals this is still not adequately fixed. If you have R_CONFIG_ACTIVE=PA2024CH
in your .env
, that will activate the PA2024CH
config, but it will also set the path that the PACTA data is mounted to as /pacta-data/PA2024CH/
, based on target: /pacta-data/${R_CONFIG_ACTIVE}
in the docker-compose.yml
Hmm not ideal. I will open a hotfix to get it working again, and open an issue to track progress on designing a more appropriate solution.
Ah so it turns out that this issue is already the "follow-up issue" i was going to open haha. Totally agree with the strategy proposed here.
(I think @AlexAxthelm closed this issue before I had a chance to read it XD)
Hmm. I remember thinking when I saw that R_CONFIG_ACTIVE
in a path that it would probably give us grief at some point. wasn't expecting it within a week.
😂 for real I don't think such a big deal to solve though tbh
I think the logic is:
target
path inside the image needs to be parameterized in the docker-compose.yml to whatever is the relevant quarter.env
is necessary, e.g. PACTA_DATA_TIMESTAMP
, so that it can be distinguished from R_CONFIG_ACTIVE
(which could = e.g. 2023Q4
, but could also = e.g. PA2024CH
)The issue there is that it conflicts with the assumption that config.yml
should set that information somehow
(e.g. it would be possible in that situation that PACTA_DATA_TIMESTAMP
point to 2023Q4
, R_CONFIG_ACTIVE
points to PA2024CH
BUT in config.yml
that PA2024CH
points to 2024Q4
or something like that)
seems like an unnecessary maintenance burden, so I am going to think it through more cautiously.
yeah, I guess that's what I meant... the PACTA data timestamp is necessary in the docker-compose step, and since that takes input from env vars not the config, the PACTA data timestamp kinda needs to be defined in the env vars not the config
I think one of the tricky parts is that workflow.transition.monitor
expects a directory called 2023Q4
but pacta-data directories are now named e.g. 2023Q4_20240229T200630Z
So already, I need to mount in the contents of a directory into a newly named directory.
This is why I can't just mount 2023Q4_20240229T200630Z
into pacta-data
which I think would be the most preferable approach.
Thoughts?
I think one of the tricky parts is that
workflow.transition.monitor
expects a directory called2023Q4
but pacta-data directories are now named e.g.2023Q4_20240229T200630Z
yes, agree 💯, that is the crux of it
Though the additional thing that plays a role is, you may need the mounted directory to look like pacta-data/2022Q4
or pacta-data/2021Q4
depending on what quarter you run. That's a holdover from the olden days when we allowed a TM Docker image to be able to run multiple timestamps/quarters. Generally speaking, we're not doing that anymore, but the capability is still there. I think in the future, e.g. with workflow.pacta, that might not be an issue anymore, because I designed it to have the data outside the Docker image, so it can be mounted where/however you want and the input config determines where the code should look for it, e.g there's no strong restriction like pacta-data/2023Q4
, it could be /2023Q4_20240229T200630Z
as long as the config has "data_dir": "/2023Q4_20240229T200630Z"
Yup totally.
I think all of this points to prioritizing superseding it with workflow.pacta
(i know you both agree, just saying it aloud).
I am tempted to not work much on this issue until after that is done/ after CH COP is delivered tbh (I don't want to jump through a bunch of hoops with an odd architecture if we have the opportunity to just update the architecture instead)
The current
docker-compose.yml
expects that you point to a localpacta-data/
directory which should have a sub-directory named for the desired timestamp/quarter, e.g.pacta-data/2021Q4
. Given our current strategy of building PACTA data directories, I think it might be more natural to point directly to that sub-directory, since one might have a directory with numerous sub-directories each containing different builds/versions of PACTA data, but you really only want/need one, and it could be confusing which one is being used.So instead of this in the
docker-compose.yml
https://github.com/RMI-PACTA/workflow.prepare.pacta.indices/blob/d93b71c58659f4ebdd894bf358fb3c5b110d01dc/docker-compose.yml#L8-L9and this in the
.env
we might have this is the
docker-compose.yml
and this in the
.env
The only problem is that the
/pacta-data/2022Q4
in thedocker-compose.yml
would need to adapt to whatever timestamp/quarter you're targeting, and I'm not sure how best to resolve that.AB#10133