flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[Docs] Changing flyte local cache directory #5613

Open behzadk opened 3 months ago

behzadk commented 3 months ago

Description

The docs currently describe where the cache is stored: ~/.flyte/local-cache/

I haven't been able to find how this default cache can be changed. Is this currently possible? Happy to update the documentation if someone can point me in the right direction!


welcome[bot] commented 3 months ago

Thank you for opening your first issue here! 🛠

runllm[bot] commented 3 months ago

Changing the default Flyte local cache directory is not explicitly documented in the provided sources. The default cache directory is ~/.flyte/local-cache/ as mentioned in the Flyte documentation.

Would you like more details on how to potentially configure this, or any other related information?


davidmirror-ops commented 3 months ago

@behzadk This is not configurable at the moment. Could you share more about your use case and why you need to change cache location?

behzadk commented 3 months ago

Hi David, a couple of reasons:

  1. HPC environments sometimes require that all data be written to scratch space.
  2. Access to sensitive data may be restricted to certain folders or workspaces. This can be mitigated by caching only references to sensitive data rather than the data itself, but to be on the safe side it would be preferable to cache within the permitted spaces.

behzadk commented 2 months ago

@davidmirror-ops

Do you know if there's a workaround for this? I can't find where the Flyte cache directory is set. I was wondering whether there is an environment variable I could change, or whether I could symbolically link the default cache folder to a folder on scratch space.

davidmirror-ops commented 2 months ago

@behzadk I haven't tried it, but the symlink idea may work. Would you be open to trying it and reporting your findings?

behzadk commented 2 months ago

Thanks @davidmirror-ops, the linked folder works, so this is fine as a workaround.
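
The symlink workaround confirmed above can be scripted. This is a minimal sketch that assumes the default location ~/.flyte/local-cache/ from the docs and a hypothetical, site-specific scratch directory supplied by the caller; the function name `relocate_cache` is illustrative and not part of flytekit. It should be run while no Flyte process is using the cache.

```python
import shutil
from pathlib import Path
from typing import Optional


def relocate_cache(scratch_dir: Path, cache_dir: Optional[Path] = None) -> Path:
    """Move the Flyte local cache into scratch_dir and leave a symlink behind.

    Hypothetical helper: assumes the documented default ~/.flyte/local-cache/
    unless cache_dir is given explicitly.
    """
    if cache_dir is None:
        cache_dir = Path("~/.flyte/local-cache").expanduser()
    target = scratch_dir / "flyte-local-cache"
    target.mkdir(parents=True, exist_ok=True)

    if cache_dir.is_symlink():
        # Already relocated on a previous run; report where it points.
        return cache_dir.resolve()

    if cache_dir.exists():
        # Carry over existing cache entries so nothing is lost.
        for entry in cache_dir.iterdir():
            shutil.move(str(entry), str(target / entry.name))
        cache_dir.rmdir()

    # Replace the default directory with a link into scratch space.
    cache_dir.parent.mkdir(parents=True, exist_ok=True)
    cache_dir.symlink_to(target, target_is_directory=True)
    return target
```

Calling it a second time is harmless: once the default path is already a symlink, the function just returns the resolved target.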

I think it would still be useful to be able to specify a cache directory, and I'm also curious where in the repo the current directory is actually defined. But I'm happy for you to close this if we aren't proceeding any further.

davidmirror-ops commented 2 months ago

@behzadk the default cache location is defined (i.e., hardcoded) here: https://github.com/flyteorg/flytekit/blob/bcdfca1dec867bef88cb0c311788a014b71d195c/flytekit/core/local_cache.py#L10-L12

I think making it configurable is straightforward. Is this an area where you'd be available to contribute?
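
As an illustration of what "configurable" could look like, here is a minimal sketch that resolves the cache directory from an environment variable and falls back to the documented default. `FLYTE_LOCAL_CACHE_DIR` and `resolve_cache_dir` are hypothetical names chosen for this example. As stated earlier in the thread, flytekit does not currently expose such a setting, and a real change might instead go through flytekit's existing configuration system.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Documented default location of the Flyte local cache.
DEFAULT_CACHE_DIR = "~/.flyte/local-cache"
# Hypothetical override variable; flytekit does not read this today.
CACHE_DIR_ENV_VAR = "FLYTE_LOCAL_CACHE_DIR"


def resolve_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the cache directory, preferring a non-blank env override."""
    env = os.environ if env is None else env
    override = env.get(CACHE_DIR_ENV_VAR, "").strip()
    raw = override or DEFAULT_CACHE_DIR
    # Expand "~" so the result can be passed directly to a disk-backed cache.
    return Path(raw).expanduser()
```

Passing `env` explicitly keeps the function easy to test; in normal use it reads `os.environ`, so an HPC user could point the cache at scratch space without touching the home directory.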

Adityamalik123 commented 1 month ago

@davidmirror-ops Could this be assigned to me if no one else is currently working on it?

davidmirror-ops commented 1 month ago

@Adityamalik123 sure! Please let us know any questions you may have during the process