L0laapk3 / FactorioMaps

L0laapk3's FactorioMaps mod
https://mods.factorio.com/mod/L0laapk3_FactorioMaps

Nondeterministic hang or race on posix systems because auto.py waits for wrong PID to exit #102

Open saulrh opened 2 years ago

saulrh commented 2 years ago

From #37:

The problem is that on posix the PID of the main process is no longer the PID it had when it was created. I don't know how to solve this for now. One solution could be to just kill all Factorio instances, but I want to avoid that.

If we can't kill by PID because the recorded PID is inaccurate, the loop on line 367 that waits for that PID to exit can go wrong in three ways: it may complete before Factorio has exited (if the PID is currently unused), hang for an arbitrary amount of time (if the PID has been reallocated to an unrelated process), or deadlock (if the PID has been reallocated to one of the worker subprocesses that auto.py spins up for handling images).

https://github.com/L0laapk3/FactorioMaps/blob/3479b2ac1285658d4dbfbc2e4b4a99b81d113544/auto.py#L367-L368
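For comparison, here is a minimal sketch of waiting on a process handle instead of polling a bare PID. It is not taken from auto.py: `wait_for_exit` is a hypothetical helper, and the timeout and kill-on-timeout policy are assumptions. It also only helps if the handle belongs to the process that actually needs to exit, which is the open question in the quote from #37.

```python
import subprocess
import sys


def wait_for_exit(proc: subprocess.Popen, timeout: float = 30.0) -> int:
    """Block until the child we spawned exits, then return its exit code.

    Hypothetical helper. Waiting on the Popen handle avoids polling a bare
    PID: on posix the child's PID stays reserved until the parent reaps it,
    so the number cannot be handed to an unrelated process in the meantime.
    """
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Assumed policy: give up and kill the child after the timeout.
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    # Stand-in child process; the real script would launch Factorio here.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(0.2)"]
    )
    print("exit code:", wait_for_exit(child))
```

If the launched process hands off to a different main process, as described for posix above, the handle's PID is no more useful than the stale one, and some other way of identifying the real process would be needed.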

This likely also explains #89. I'm not entirely sure what a fix would look like; will think about it.

This problem is about half-theoretical. I'm currently trying to export an hourly timelapse of my Space Exploration victory, all 550 hours and 15 planets of it. I have the computational horsepower and disk space to actually pull it off, but I have a feeling I'm going to run into all sorts of low-occurrence stability problems while attempting this, and this is the first big one: I'm hitting a nondeterministic hang at this step that might be explained by this bit of code working the way I think it does.

saulrh commented 2 years ago

Oh, you're actually in the forum topic about set_wait_for_screenshots_to_finish and it doesn't sound like you ever got an answer for your last question. Hrrrrrm. If you could force the game to process the screenshot queue all at once that'd be ideal, but otherwise ick.