bede / hostile

Precise host read removal
MIT License
74 stars 5 forks source link

What's the best way to override the index download directory? #32

Closed bdklahn closed 5 months ago

bdklahn commented 6 months ago

I'm working on integrating this into a service. If I want to cache the index files to a non-user-specific shared location (e.g.), would the best way to do that be to maybe override the XDG_DATA_DIR?

https://github.com/bede/hostile/blob/82b3a5db22880f5ba11254a15ea846f0f5287427/src/hostile/util.py#L32

I was tracing through the code, and it looks like this isn't exposed as a parameter, like the output dir is. But am I missing something?

Thanks!

bede commented 6 months ago

Hi Brian, Thanks for reporting this, which definitely should be documented.

If you set the environment variable XDG_DATA_HOME to point to your custom shared location, user_data_dir() should notice and use it instead of the OS-specific default path. A subdir named hostile will be created containing downloaded indexes. I've just verified this, and it seems to work on Linux, but not MacOS (?!), meaning I might need to report an issue to the platformdirs maintainer. But unless using MacOS, this should work. Please let me know how you get on.

The other option is to download the indexes and directly reference them using paths supplied to --index, which is more clunky.

bdklahn commented 6 months ago

Hi Brian, Thanks for reporting this, which definitely should be documented.

If you set the environment variable XDG_DATA_HOME to point to your custom shared location, user_data_dir() should notice and use it instead of the OS-specific default path. A subdir named hostile will be created containing downloaded indexes. I've just verified this, and it seems to work on Linux, but not MacOS (?!), meaning I might need to report an issue to the platformdirs maintainer. But unless using MacOS, this should work. Please let me know how you get on.

The other option is to download the indexes and directly reference them using paths supplied to --index, which is more clunky.

Hi Bede,

Thanks for the response. My first thought was to maybe set via os.environ (or export . . . before running). But then I thought, since I've been testing/trying via the Python API (since our tools wrapper is written in Python), maybe I can just import and set via the hostile.utils namespace(?). If possible, that ought to bypass any platformutils calls, right? It doesn't really matter for me, since our server is Linux (not OSX), but . . .

I can play with it in a conda env, and report back. Thanks.

bede commented 6 months ago

Ah, you're using the Python API, nice. In that case I would suggest setting the environment variable in Python:

import os
os.environ["XDG_DATA_HOME"] = "/data"

If for some reason this doesn't work, I can implement a direct override using e.g. a function argument.

bdklahn commented 6 months ago

Ah, you're using the Python API, nice. In that case I would suggest setting the environment variable in Python:

import os
os.environ["XDG_DATA_HOME"] = "/data"

If for some reason this doesn't work, I can implement a direct override using e.g. a function argument.

Heh. That was my first thought (and what I meant, by "override the XDG_DATA_DIR"), but I seemed to talk myself out of it, thinking that was unnecessarily going out into the calling environment, only to allow loading back in. But maybe that ends up being identical (as part of any memory the Python instance already has loaded). At least it should be simple. Thanks.

bede commented 6 months ago

I know where you're coming from! Environment variables are not especially pretty. If for some reason this doesn't work properly, let me know and we can sort it.

bede commented 6 months ago

In the next release, the Right Way to do with will be to use the new HOSTILE_CACHE_DIR environment variable which should work cross-platform. On Linux, XDG_DATA_HOME will still be respected so long as HOSTILE_CACHE_DIR is unset.

https://github.com/bede/hostile/commit/57be277e89204ef8e63ceeaa704d13a82da82dec

bdklahn commented 6 months ago

In the next release, the Right Way to do with will be to use the new HOSTILE_CACHE_DIR environment variable which should work cross-platform. On Linux, XDG_DATA_HOME will still be respected so long as HOSTILE_CACHE_DIR is unset.

57be277

Nice! Thanks!

bdklahn commented 6 months ago

BTW, I think os.environ is just a dictionary. So, I think, you should just be able to just use the second argument of get to set the default fallback, rather than using if. setdefault is another useful one.

bdklahn commented 6 months ago

It doesn't look like setting . . .

os.environ['XDG_DATA_HOME']=. . .

. . . works for the Python API, for me.

My script reports the correct (overridden) value of . . .

f"os.environ['XDG_DATA_HOME']=}"

. . . but . . . 15:18:25 INFO: Saved human index (~/.local/share/hostile/human-t2t-hla)

Anyway, just thought I'd follow-up and let you know that.

Thanks.

bede commented 6 months ago

Thank you for this and sorry for delay, I've been travelling this week. Which OS and Python version are you using? I'll look into it.

On Thu, 14 Mar 2024 at 20:22, Brian Klahn @.***> wrote:

It doesn't look like setting . . .

os.environ['XDG_DATA_HOME']=. . .

. . . works for the Python API, for me.

My script reports the correct (overridden) value of . . .

f"os.environ['XDG_DATA_HOME']=}"

. . . but . . . 15:18:25 INFO: Saved human index (~/.local/share/hostile/human-t2t-hla)

Anyway, just thought I'd follow-up and let you know that.

Thanks.

— Reply to this email directly, view it on GitHub https://github.com/bede/hostile/issues/32#issuecomment-1998386004, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHWAABIEHB5HWIDBREI5FLYYIBIZAVCNFSM6AAAAABEF6IJUWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOJYGM4DMMBQGQ . You are receiving this because you commented.Message ID: @.***>

bdklahn commented 6 months ago

Thank you for this and sorry for delay, I've been travelling this week. Which OS and Python version are you using? I'll look into it.

No worries. I've been traveling some, lately, as well.

GNU/Linux with Python 3.10 We're in a Singularity container, within a Conda env. Looks like that is CentOS 6.10 (on a CentOS 6 host, I believe). The Conda env appears to be running Python 3.10.13. Thanks.

bede commented 5 months ago

Hi Brian, Thank you for your patience with this, I'm home from a second trip now – an unusual month for me. I'm afraid that I am not able to replicate your issue with an Ubuntu 22 VM, and testing the latest unreleased commit mentioned earlier allows the cache location to be overriden either using the legacy now-understood-to-be-linux-only $XDG_DATA_HOME, or the replacement and cross-platform $HOSTILE_CACHE_DIR

If it's possible for you to follow the instructions in the readme to create a development build with the latest changes, I would be very interested to know if things are working for you. Nevertheless I will probably create a new release soon.

bede commented 5 months ago

This change has been released in 1.1.0

bdklahn commented 5 months ago

This change has been released in 1.1.0

I'll just try it out after our service container is updated to the new package. You can close this, if you want. I'll just open another issue, if necessary.

Also, I don't think its necessary/appropriate to transfer the minor version, when bumping the major.

https://semver.org

Also, I don't know if you knew a Python dict get method has a default parameter. No need to implement if else logic manually. https://docs.python.org/3.11/library/stdtypes.html#dict.get

Thanks for the package update!

bede commented 5 months ago

Also, I don't think its necessary/appropriate to transfer the minor version, when bumping the major.

https://semver.org

You are mistaken – I neither bumped the major version nor 'transferred' the minor version in this release.

Also, I don't know if you knew a Python dict get method has a default parameter. No need to implement if else logic manually. https://docs.python.org/3.11/library/stdtypes.html#dict.get

Yes, and Hostile relies on it in places, feel free to submit a PR if you can simplify it without failing tests.

Thanks for the package update!

A pleasure! Feel free to reopen etc. if the problem persists.

bdklahn commented 5 months ago

You are mistaken – I neither bumped the major version nor 'transferred' the minor version in this release.

I guess I must be. For some reason I had the version pinned to 0.1.0 in our environment.yml file (which I thought you had bumped to 1.1.0). Maybe typo on my part, or I had come across that on an old conda web page. Sorry. Thanks.

bede commented 5 months ago

No problem. Sometimes conda solvers install an ancient package version in trying to satisfy pinned dependency versions, so perhaps something along these lines happened. Do let me know if you spot this happening again.