InseeFrLab / pynsee

pynsee package contains tools to easily search and download french data from INSEE and IGN APIs
https://pynsee.readthedocs.io/en/latest/
MIT License
70 stars 10 forks source link

pynsee doesn't (always) clean tempfiles #205

Open tgrandje opened 3 months ago

tgrandje commented 3 months ago

When running the tests for my actual PR, i've remarked pynsee is not (always ?) cleaning temporary files after it's finished. As a matter of fact, this has just filled the available space I still got on my windows/WSL partition.

For instance, that tmpfile is never cleaned afterwards.

tfardet commented 3 months ago

@hadrilec I think your in detail knowledge of INSEE-specific data is required to understand how long this tmpfiles actually need to last, please see my comment on the associated PR #206

hadrilec commented 3 months ago
tfardet commented 3 months ago

Great, thanks!

functions using the save_df folder should delete the files

so for these, TemporaryFile, NamedTemporaryFile, or TemporaryDirectory should be used in a context manager

data from _get_full_list_wfs should be kept for one python session

so we need an atexit function (EDIT: no we don't, see below)

hadrilec commented 3 months ago

well, actually for _get_full_list_wfs it is the same as for the others, as it is only called by get_geodata_list which is using save_df

tfardet commented 3 months ago

great, no atexit, then!

tgrandje commented 3 months ago

I'll settle for removing the unnecessary tempfiles first :-) I've started it : it might (almost) be easier and (surely) cleaner than tracking the tempfiles which are opened in nested functions (and could therefore not really be handled through context managers).

tgrandje commented 3 months ago

I'm not sure about _create_insee_folder. As far as I can see, this is to prevent an exception when permissions are currently too low to create an appdata folder. In that case, I'd say the tempdir should stay, and the lru_cache on _get_temp_dir() prevents creating multiple tempdir during a session.

What about cleaning that?

tfardet commented 3 months ago

Doesn't platformdirs basically guarantee that we'll have the rights to create the folder?

I wonder whether everything could not be replaced by platformdirs.user_cache_dir("pynsee", ensure_exists=True)

BTW: is there a reason why it's using a "pynsee" forlder inside the initial "pynsee" folder or should we clean that up?

tgrandje commented 3 months ago

Doesn't platformdirs basically guarantee that we'll have the rights to create the folder?

I sure hope so, but I'm no expert on that. If there wasn't that comment, I would already have gone for the platformdirs.user_cache_dir("pynsee", ensure_exists=True)...

@hadrilec does that reminds you of anything?

EDIT: If we move to rely strictly on platformdirs, we can also get rid of the _get_temp_dir.py file: that call is the last one (I got rid of all the other to favor in-memory files instead of tempfiles).

hadrilec commented 3 months ago

if platformdirs works as expected _get_temp_dir will not be used. As it is safe this way, I would prefer not to remove it because I dont want to debbug in case platformsdirs fails.

tgrandje commented 3 months ago

BTW: is there a reason why it's using a "pynsee" forlder inside the initial "pynsee" folder or should we clean that up?

I suspect that's a typo. But if we don't want to break (too much) cache when upgrading pynsee, we should keep it that way.

hadrilec commented 3 months ago

this is not a typo, back then this folder was in different path location it was more tidy this way. I would prefer not to change this feature.