Closed OvermindDL1 closed 5 years ago
The 'crop' step still takes only a fraction of a second, even with over a thousand chunks in later save files, so it might be good to just leave it without multiprocessing. There's something screwy there. I'm wondering if multiprocessing workers are being left unclosed elsewhere: they need to be closed before they are joined on, or they become zombie processes that later multiprocessing usage can pick up and fail on, since it re-uses existing processes. I've not looked deep enough into the code to see all the usages yet.
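For reference, a minimal sketch of the close-then-join order for a pool, not the project's actual code. The names map_and_shut_down and square are made up for illustration, and ThreadPool is used only so the demo runs without spawning processes; it shares the close()/join() contract of multiprocessing.Pool.

```python
from multiprocessing.pool import ThreadPool


def square(n):
    # Hypothetical stand-in for the real per-chunk work function.
    return n * n


def map_and_shut_down(pool, func, items):
    """Run func over items, then always close and join the pool."""
    try:
        return pool.map(func, items)
    finally:
        pool.close()  # signal that no more tasks will be submitted
        pool.join()   # wait for workers to exit; ValueError if close() was skipped


# ThreadPool follows the same close()/join() rules as multiprocessing.Pool.
results = map_and_shut_down(ThreadPool(2), square, [1, 2, 3])
print(results)
```

Passing a multiprocessing.Pool instead of the ThreadPool should behave the same way, as long as the work function is defined at module level so it can be pickled.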
I've actually not had this problem myself, but I did have more or less the exact same problem while working on the code: when it came across an exception during cropping it couldn't exit, and Ctrl+C would then show the same behaviour. If it freezes at mp.Queue.get(True) then it could be that the queue is somehow not being emptied fast enough and it ends up in some sort of deadlock waiting for space to free up in the queue. I'll definitely look into it.
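One way to make a hang like that visible is to pass a timeout to get() instead of blocking forever. A rough sketch, assuming the progress queue receives one message per finished item; wait_for_progress is a hypothetical helper, not something from the project. Both queue.Queue and multiprocessing.Queue raise queue.Empty when the timeout expires.

```python
import queue


def wait_for_progress(progress_queue, expected, timeout=5.0):
    """Collect `expected` progress messages, giving up if the queue stalls."""
    received = []
    for _ in range(expected):
        try:
            # Block for at most `timeout` seconds instead of waiting forever.
            received.append(progress_queue.get(True, timeout))
        except queue.Empty:
            raise TimeoutError(
                f"only {len(received)} of {expected} progress messages arrived"
            ) from None
    return received
```

With this, a stalled worker turns into an exception that says how far things got, rather than a silent freeze.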
As for making it not multithreaded, the cropping step scales heavily with the amount of light sources/power poles in your world, it does actually take a considerable amount of time in my worlds, so I'm not really keen on removing the multithreading on it.
Ah, there's not a ton of power poles in this world yet (lots of Factorissimo usage on this server, as the other person playing on it is new to the mod so they are going a bit hog wild with it, much more so than I usually do. ^.^;).
Feel free to see the maps we've genned so far:
First, when Ctrl+c was hit it tried running taskkill /im factorio.exe, which is a Windows command (the heck?! Python should not need those.)...
Small note on this: it's merely a fallback for when Steam is being used on Windows, since Steam stops me from managing the factorio process properly through Python. I could technically do it in Python, but it'd be exactly the same as the taskkill command. I don't know if Steam also does that on other OSes; I hope it doesn't, because it's quite silly. If it gives an error trying to run that, it's not a problem, as it's really a last-resort thing (especially on Ctrl+C) and shouldn't impact the flow of the program.
Ah, you mean the forking. Yeah, with the Steam version, when you run the factorio application binary, 'that' then sends a command to Steam to launch factorio and then it itself dies to let Steam bring up the full copy. It does that on all OSes, yes. The command to kill all factorio processes on Linux (and Mac?) is killall factorio (or killall -9 factorio to kill it ungracefully, but factorio closes properly so this shouldn't be used).
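A last-resort fallback that picks the command per platform could look roughly like this. It is only a sketch: kill_command and kill_factorio are hypothetical names, and the commands are just the ones mentioned in this thread, not what the project actually runs.

```python
import platform
import subprocess


def kill_command(system=None):
    """Return the last-resort kill command for the given (or current) OS."""
    system = system or platform.system()
    if system == "Windows":
        return ["taskkill", "/im", "factorio.exe"]
    # Linux and macOS ("Darwin") both ship killall.
    return ["killall", "factorio"]


def kill_factorio():
    """Best effort only: a failure here should not affect the program flow."""
    try:
        subprocess.run(kill_command(), check=False)
    except FileNotFoundError:
        # The kill tool is not installed; nothing more to do.
        pass
```

kill_command("Windows") gives the taskkill form and anything else gives killall, so the Windows-only command never gets run on Linux.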
Lol, I've been half tempted to rewrite the Python part in bash instead; it can call all the necessary ImageMagick things directly to do the work entirely in parallel, and it would be easier to reason about. ^.^;
I've had another look at this. I really have no clue what's causing it, and since it never happens on my machine, it's nearly impossible to find the problem.
Does setting maxthreads = 1 somehow fix it for you, or do you really need to strip out all the multithreading logic and do all the work on the main thread before it actually works?
That's actually how I fix it. Force-overriding maxthreads = 1 is what I've been doing to fix most of it, but I don't know if that's just hiding the problem or not.
I've introduced some flags to control the number of threads used, so at least you can get it to work with --cropthreads=1. Unfortunately, since I still have no clue what's causing it to not work for you, I can't do anything more for now. Feel free to reopen if you have any additional information :P
Is getting stuck like this caused by this issue?
@slikts not sure. Are you positive you waited long enough? It can take quite some time to round up all the images. If this is the case, can you try running with --verbose, and after waiting long enough, doing Ctrl C to stop the program, and post the entire log?
Yeah I've not seen it stop there before so I don't "think" it's related?
Thank you! I noticed the same issue when I used a debugger to find out why my script was hanging overnight. --cropthreads=1 solved the issue.
workthread.join() is still freezing on occasion. I'm unsure what the workthread is doing or why it's freezing, as nothing is being logged. It is the waitKill workthread, and factorio did already get killed and did fully die (no remnant of its process is still running). I'm not entirely sure of the purpose of that workthread, as it is spawned and then immediately joined on.
When I canceled it after leaving it at that point for 20 minutes it printed this when I Ctrl+c'd it:
Interestingly I only hit Ctrl+c once; I'm guessing it's printing KeyboardInterrupt for each subprocess. At this point, however, it was frozen again with no further output, so after waiting a while I hit Ctrl+c again and it printed this and finally died:
A few interesting things to notice:
First, when Ctrl+c was hit it tried running taskkill /im factorio.exe, which is a Windows command (the heck?! Python should not need those.)...
It seems the subprocesses were all stuck waiting on messages, the only one different was:
For note, mp.cpu_count() is returning 16 (the number of cores in this system).
Some testing shows that in crop.py this section:
I put a print(repr(files)) just after the for _ in range(len(files)): line and it was only called once, printing out:
Upon which it seems to permanently freeze at progressQueue.get(True).
It only seems to be happening on 'night' as well, this is the log up to that point:
At which point it freezes. Interesting that the file list doesn't output for the day crop...
Replacing the work function with just return False does not change anything; adding some output shows that it looks like it's not even being called... interesting...
Tried calling map_async and even map in a variety of different ways and no callback executes; everything is getting locked on a semaphore and never being released.
Just to get it 'working' for now, I commented out all the multiprocess stuff in crop.py (haven't run into any issues with it in zoom.py yet for whatever reason) and am just calling work directly in a loop on files. That works fine, and only takes a second with about a hundred chunks (zoom is the one that takes a few seconds).
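The workaround above can be sketched as a serial fallback that only touches multiprocessing when more than one worker is requested. This is an illustration and not the actual crop.py: run_all, its threads parameter and the timeout value are assumptions, and work stands for whatever per-file function is being mapped.

```python
from multiprocessing import Pool


def run_all(work, files, threads=1, timeout=600):
    """Apply work to every file, serially unless threads > 1."""
    if threads <= 1:
        # Same as calling work directly in a loop on files.
        return [work(f) for f in files]
    # work must be a module-level function so it can be pickled.
    with Pool(processes=threads) as pool:
        async_result = pool.map_async(work, files)
        # get() with a timeout raises multiprocessing.TimeoutError
        # instead of blocking forever if the workers never report back.
        return async_result.get(timeout)
```

With threads=1 no worker processes are created at all, which matches what commenting out the multiprocess code achieves.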