MorvanZhou / pytorch-A3C

Simple A3C implementation with pytorch + multiprocessing
https://mofanpy.com
MIT License
621 stars 142 forks

Process hangs at res_queue.get() in Linux #22

Open rons613 opened 2 years ago

rons613 commented 2 years ago

In discrete_A3C.py, the res_queue.get() call in the main function hangs for a very long time (possibly forever) on Linux, but the same code works perfectly fine on Windows.

workers = [Worker(gnet, opt, global_ep, global_ep_r, res_queue, i) for i in range(mp.cpu_count())]
[w.start() for w in workers]
res = []
while True:
    print('Last printed checkpoint. Printed only during first iteration of while loop')
    r = res_queue.get()
    print('This line is never printed.')
    if r is not None:
        res.append(r)
    else:
        break
[w.join() for w in workers]

No errors are thrown, so presumably PyTorch is installed correctly and working. Inserting print() statements at various checkpoints in the code snippet above (and in the Worker class constructor) shows that the code never moves past the very first call to res_queue.get(). Is anyone else having this same problem on Linux?
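
One way to narrow down a hang like this is to replace the blocking res_queue.get() with a timed get that also reports which workers are still alive. The following is only a diagnostic sketch, not the repository's code: drain_results and poll_seconds are hypothetical names, and it assumes the workers signal completion by putting None on the queue, as the loop above implies. It relies only on the standard queue interface (get(timeout=...) raising queue.Empty) and on worker objects exposing is_alive() and name, which multiprocessing.Process provides.

```python
import queue


def drain_results(res_queue, workers, poll_seconds=5.0):
    """Collect results until a None sentinel arrives, without blocking forever.

    Hypothetical helper: res_queue is any queue whose get() accepts a timeout
    and raises queue.Empty (multiprocessing.Queue does); workers is a list of
    objects with is_alive() and name (multiprocessing.Process instances do).
    """
    results = []
    while True:
        try:
            r = res_queue.get(timeout=poll_seconds)
        except queue.Empty:
            # Nothing arrived within poll_seconds: check whether anyone can still send.
            alive = [w.name for w in workers if w.is_alive()]
            if not alive:
                raise RuntimeError(
                    "all workers exited without sending the None sentinel"
                )
            print(f"no result for {poll_seconds}s; workers still alive: {alive}")
            continue
        if r is None:  # sentinel: workers are done
            break
        results.append(r)
    return results
```

In the snippet above, the while loop would become res = drain_results(res_queue, workers). If the message keeps printing with every worker listed as alive, the workers have started but are stuck before their first put(); if the RuntimeError is raised, they died without reporting a result.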

sunshinesmilelk commented 2 years ago

I have the same problem on Windows

rons613 commented 2 years ago

> I have the same problem on Windows

For me, there was no major problem on either Windows 8.1 or Windows 10 Pro. The only issue I had was that when I plugged in my custom environment, an instance of it would be spawned twice and the workers would come online gradually, one at a time. Are you running the exact CartPole code on Windows and getting this problem, or is the code modified in some way?