mikesmit closed this 2 weeks ago
From Slack chat:
Nikhil Woodruff (Today at 4:43 AM): Ah right, ok: so what’s happening here is the API uses our UK microdata for UK impacts. We’re not allowed to share outside the UK org, and the GH actions and live server use a token to download it. I recently changed the API to downloading it at startup time rather than whenever the first UK impact is received
Nikhil Woodruff (Today at 4:44 AM): We should probably add either a try except clause or env variable to not do this when testing non-UK parts
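For reference, here is a minimal sketch of what the "try except clause or env variable" suggestion could look like. It is not the API's actual code: the `SKIP_UK_MICRODATA` variable name and the `download` callable are made up for illustration.

```python
import logging
import os

logger = logging.getLogger(__name__)


def should_preload_uk_microdata(env=None):
    """Return True unless the (hypothetical) SKIP_UK_MICRODATA flag is set."""
    env = os.environ if env is None else env
    return env.get("SKIP_UK_MICRODATA", "").lower() not in ("1", "true", "yes")


def preload_uk_microdata(download, env=None):
    """Try to fetch the UK microdata at startup without ever aborting startup.

    `download` is any zero-argument callable that fetches the dataset
    (a stand-in for whatever the API really calls).
    Returns True if the download ran and succeeded, False otherwise.
    """
    if not should_preload_uk_microdata(env):
        logger.info("Skipping UK microdata preload (SKIP_UK_MICRODATA is set).")
        return False
    try:
        download()
    except Exception:
        # Missing token, no network, etc. Non-UK tests can still run.
        logger.warning("Could not download UK microdata.", exc_info=True)
        return False
    return True
```

With something like this, running the tests with `SKIP_UK_MICRODATA=1` would skip the download entirely, and a missing token would only log a warning instead of failing at import or startup time.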
Looking to see if I can locate "I recently changed the API to downloading it at startup time rather than whenever the first UK impact is received"...
Looks like it's related to "Split simulations into chunks" (#1938).
Specifically, the bit that adds download_microdata.py.
Chatted with @nikhilwoodruff this morning. TLDR: we want to load the UK data on demand, but without causing a race condition.
So when you run the tests, the API immediately attempts to load the UK data, even though the tests don't need it, and fails because you can't access that data without special permission.
Preferred solution: We're loading this into memory anyway. Instead of caching it on disk, we could just load it directly into memory once per worker.
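A rough sketch of the "load once per worker" idea, assuming a hypothetical `loader` callable that fetches the dataset straight into memory (none of these names come from the API or core):

```python
import threading

_lock = threading.Lock()
_cache = {}


def get_dataset(name, loader):
    """Return the dataset for `name`, calling `loader` at most once per process.

    `loader` is a hypothetical zero-argument callable that returns the
    dataset as an in-memory object (bytes, a DataFrame, etc.).
    """
    # Fast path: already loaded in this worker.
    if name in _cache:
        return _cache[name]
    with _lock:
        # Re-check under the lock so concurrent threads don't load twice.
        if name not in _cache:
            _cache[name] = loader()
        return _cache[name]
```

The lock only covers threads inside one worker process. Separate workers would each load their own copy, but since nothing is written to disk there is no shared file for them to race on.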
Pros:
Option 2: Download to a randomly named temp file and then move it to the name you want (a move within the same disk partition is an atomic rename, not a copy).
Then the file is either there or not. If two processes do this at the same time, they will both download it, but one will "win".
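To illustrate the temp-file-plus-move approach, here is a minimal sketch using only the standard library. It is an illustration of the technique, not the code from any PolicyEngine package; `atomic_write` is a made-up helper name.

```python
import os
import tempfile


def atomic_write(dest_path, data):
    """Write `data` (bytes) to `dest_path` so readers never see a partial file."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temp file next to the destination so the final rename
    # stays on one filesystem (a cross-device move would be a copy).
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace overwrites the destination in a single rename step.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Don't leave a stray temp file behind if anything failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If two processes call this for the same `dest_path`, each writes its own uniquely named temp file and the last rename wins. Either way, a reader sees a complete file or no file at all.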
Pros:
Looking a bit more closely... I'm not 100% confident that I know all the places file_path is used. I think I'll just solve this one problem of making the file download atomic (option 2) in this one case.
I put in a pull request to core here: https://github.com/PolicyEngine/policyengine-core/pull/307 that will make the download/write atomic.
This should let us back out the preloading of the microdata, which in turn means we won't get errors from trying to download the UK microdata.
When running make debug-test on the policyengine-api package I get the following error: