alan-turing-institute / ARC-LoCoMoSeT

Low-Cost Model Selection for Transformers
MIT License

Setting default temporary directory for `tempfile` #93

Closed jack89roberts closed 11 months ago

jack89roberts commented 11 months ago

This is only relevant when caching preprocessed data with HuggingFace datasets while caching is disabled and `keep_in_memory` is set to `False`. (Disabling caching means the data is cached to a temporary directory instead of the normal HuggingFace cache, rather than not cached at all.) In practice, this only matters where the dataset/model combination makes the preprocessed training dataset huge (e.g. >500 GB local scratch space in Baskerville jobs).
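
Since those cache files end up in whatever directory `tempfile` reports, here is a minimal standard-library sketch of how that directory behaves once it has been resolved. The `scratch_` directory is a hypothetical stand-in for a real scratch location, and the snippet only illustrates the lookup; it is not code from this repository.

```python
import os
import tempfile

# The first call resolves the directory (TMPDIR, TEMP, TMP, then OS defaults)
# and caches the result for the rest of the process.
before = tempfile.gettempdir()

# Hypothetical scratch location, created here only for the demonstration.
custom_tmp = tempfile.mkdtemp(prefix="scratch_")

# Remember the original value so the environment can be restored afterwards.
original_tmpdir = os.environ.get("TMPDIR")

# Changing the environment variable now does not affect the cached value.
os.environ["TMPDIR"] = custom_tmp
after = tempfile.gettempdir()

print("before:", before)
print("after: ", after)
print("unchanged:", before == after)

# Clean up: restore the environment and remove the demo directory.
if original_tmpdir is None:
    del os.environ["TMPDIR"]
else:
    os.environ["TMPDIR"] = original_tmpdir
os.rmdir(custom_tmp)
```

Running this in a fresh interpreter should show the same path before and after the `os.environ` assignment.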

`tempfile.gettempdir()` is meant to check the value of some environment variables (`TMPDIR`, `TEMP`, `TMP`) before falling back to a default OS-dependent location for temporary files if none of them is set. However, its value is cached and seems to be set early after launching Python, which means setting the environment variable with `os.environ` in a script doesn't change it. The possible workarounds are: