InseeFrLab / pynsee

pynsee package contains tools to easily search and download french data from INSEE and IGN APIs
https://pynsee.readthedocs.io/en/latest/
MIT License
67 stars 8 forks source link

pynsee doesn't (always) clean tempfiles #205

Open tgrandje opened 1 month ago

tgrandje commented 1 month ago

When running the tests for my actual PR, i've remarked pynsee is not (always ?) cleaning temporary files after it's finished. As a matter of fact, this has just filled the available space I still got on my windows/WSL partition.

For instance, that tmpfile is never cleaned afterwards.

tfardet commented 1 month ago

@hadrilec I think your in detail knowledge of INSEE-specific data is required to understand how long this tmpfiles actually need to last, please see my comment on the associated PR #206

hadrilec commented 1 month ago
tfardet commented 1 month ago

Great, thanks!

functions using the save_df folder should delete the files

so for these, TemporaryFile, NamedTemporaryFile, or TemporaryDirectory should be used in a context manager

data from _get_full_list_wfs should be kept for one python session

so we need an atexit function (EDIT: no we don't, see below)

hadrilec commented 1 month ago

well, actually for _get_full_list_wfs it is the same as for the others, as it is only called by get_geodata_list which is using save_df

tfardet commented 1 month ago

great, no atexit, then!

tgrandje commented 1 month ago

I'll settle for removing the unnecessary tempfiles first :-) I've started it : it might (almost) be easier and (surely) cleaner than tracking the tempfiles which are opened in nested functions (and could therefore not really be handled through context managers).

tgrandje commented 1 month ago

I'm not sure about _create_insee_folder. As far as I can see, this is to prevent an exception when permissions are currently too low to create an appdata folder. In that case, I'd say the tempdir should stay, and the lru_cache on _get_temp_dir() prevents creating multiple tempdir during a session.

What about cleaning that?

tfardet commented 1 month ago

Doesn't platformdirs basically guarantee that we'll have the rights to create the folder?

I wonder whether everything could not be replaced by platformdirs.user_cache_dir("pynsee", ensure_exists=True)

BTW: is there a reason why it's using a "pynsee" forlder inside the initial "pynsee" folder or should we clean that up?

tgrandje commented 1 month ago

Doesn't platformdirs basically guarantee that we'll have the rights to create the folder?

I sure hope so, but I'm no expert on that. If there wasn't that comment, I would already have gone for the platformdirs.user_cache_dir("pynsee", ensure_exists=True)...

@hadrilec does that reminds you of anything?

EDIT: If we move to rely strictly on platformdirs, we can also get rid of the _get_temp_dir.py file: that call is the last one (I got rid of all the other to favor in-memory files instead of tempfiles).

hadrilec commented 1 month ago

if platformdirs works as expected _get_temp_dir will not be used. As it is safe this way, I would prefer not to remove it because I dont want to debbug in case platformsdirs fails.

tgrandje commented 1 month ago

BTW: is there a reason why it's using a "pynsee" forlder inside the initial "pynsee" folder or should we clean that up?

I suspect that's a typo. But if we don't want to break (too much) cache when upgrading pynsee, we should keep it that way.

hadrilec commented 1 month ago

this is not a typo, back then this folder was in different path location it was more tidy this way. I would prefer not to change this feature.