InseeFrLab / pynsee

The pynsee package contains tools to easily search and download French data from INSEE and IGN APIs
https://pynsee.readthedocs.io/en/latest/
MIT License

Multiprocessing sometimes freezes get_geodata #203

Open tgrandje opened 1 month ago

tgrandje commented 1 month ago

This is something I have already encountered on a Linux machine. It seems to be linked to multiprocessing freeze support not being properly handled, but the user might not even be aware of it without heavy debugging.

It happened to me again while debugging this issue, and it is definitely not easy to diagnose...

As multiprocessing does not seem to be of any value here (compared to multithreading), I propose to switch to pebble ThreadPool which has a close enough API.
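To illustrate why threads can stand in for processes on network-bound work, here is a minimal sketch. It uses the standard library's `multiprocessing.pool.ThreadPool` rather than pebble, and `fetch_chunk` is a hypothetical stand-in for a single request, not actual pynsee code.

```python
import time
from multiprocessing.pool import ThreadPool


def fetch_chunk(chunk_id):
    """Hypothetical stand-in for one network request (not pynsee code)."""
    time.sleep(0.05)  # simulate waiting on the network
    return {"chunk": chunk_id, "rows": chunk_id * 10}


def fetch_all(chunk_ids, workers=4):
    # Threads live inside the calling process: arguments and results are
    # not pickled and no child process is started, so the script needs
    # neither a __main__ guard nor freeze_support().
    with ThreadPool(processes=workers) as pool:
        # map() blocks until every task is done and keeps the input order.
        return pool.map(fetch_chunk, chunk_ids)
```

Since the workers spend most of their time waiting on I/O, the GIL should not be a bottleneck here; that is an assumption about the workload, not a measurement.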

tfardet commented 1 month ago

Why go for pebble rather than multithreading?

tgrandje commented 1 month ago

To be honest? Mostly laziness... 😊

And to be more thorough: I like pebble. It has a unified API for both multiprocessing and multithreading, which makes switching from one to the other really simple. I've had a go at concurrent.futures in the past, and I don't have very good memories of it (it is not straightforward, as you need to submit the function for each argument, then collect the results as they complete...).

Do you object to pebble? (Btw: I've never encountered a single trouble installing it.)
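For comparison, here is a small sketch of the two `concurrent.futures` patterns: the submit-then-collect flow described above, and `Executor.map`, which hides that bookkeeping. The `square` task and both helper names are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(x):
    """Hypothetical placeholder task."""
    return x * x


def with_submit(values, workers=4):
    # One submit() call per argument, then gather futures as they finish.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = {executor.submit(square, v): v for v in values}
        # Completion order is arbitrary, so results are keyed by input.
        return {futures[f]: f.result() for f in as_completed(futures)}


def with_map(values, workers=4):
    # Executor.map() submits every call itself and yields results
    # in input order.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(square, values))
```

The `map` form is close in spirit to a pool's `map`, though it offers no per-task timeout handling; whether that matters for pynsee is left open here.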

tgrandje commented 1 month ago

I just saw the two hanging PRs. Are they not linked to that same problem? https://github.com/InseeFrLab/pynsee/pull/182 https://github.com/InseeFrLab/pynsee/pull/179

tfardet commented 1 month ago

> Do you object to pebble

I don't know anything about pebble, I just wanted to know whether there was a strong reason to add new dependencies.

> I just saw the two hanging PRs. Are they not linked to that same problem?

Honestly, I don't remember, but there was definitely an issue with that code, so rewriting it is a good idea!