Open Akz47 opened 6 years ago
OK, I must admit it has nothing to do with Anaconda or the Python version. For some reason I no longer get the crash window, but I can clearly see the same fault offsets in the Windows event history (maybe I disabled the window once with some setting, I'm not sure). I will try to investigate it. All I know for now is that the cpp_household library seems to execute the _ZN14QOpenGLContext14currentContextEv function (which itself calls _ZNK14QOpenGLContext14extraFunctionsEv), which seems to access an invalid offset. My guess is a null pointer because of freed resources or something like that.
Edit: Oh well, I'm quite sure it crashes at locations like this one: https://github.com/TheCrazyT/roboschool/blob/master/roboschool/cpp-household/render-simple.cpp#L340
There is probably more than one line that uses QOpenGLContext::currentContext()->extraFunctions(); I guess I will just add a null-pointer check at those locations.
Thank you for promptly locating the code segments that caused the null pointer errors. What are the ExtraFunctions for, so they are not needed for rendering the simulations?
Meanwhile, I've entirely disabled Windows Error Reporting and the crash dialog.
Below are the steps, in case anyone wishes to do so too:
So I will be oblivious to any crashes. :)
Please let me know if and when an updated version is available.
What are the ExtraFunctions for, so they are not needed for rendering the simulations?
They are needed, but by the time some destructors are called, the QOpenGLContext is already gone. That's a problem because QOpenGLContext::currentContext() no longer returns an object at that point.
I think I fixed it, you can download it here: https://dl.bintray.com/thecrazyt/roboschool/0.5/ (basically you only need to download cpp_household.pyd and replace it)
Thank you for your fantastically speedy fix, it works perfectly now without any crash!
I've re-enabled Windows Error Reporting and double checked in the Event Log to confirm that no application errors are reported.
On a separate note, I'm having a little trouble executing the multiplayer samples demo_pong.py and demo_race1.py, which make use of os.mkfifo, a function that is not available on Windows:
Traceback (most recent call last):
File "demo_pong.py", line 23, in <module>
gameserver = roboschool.multiplayer.SharedMemoryServer(game, "pongdemo", want_test_window=True)
File "roboschool\multiplayer.py", line 257, in __init__
player_n=n)
File "roboschool\multiplayer.py", line 147, in __init__
os.mkfifo(self.sh_pipe_actready_filename)
AttributeError: module 'os' has no attribute 'mkfifo'
I tried searching around for a Windows version of this multiplayer.py but couldn't seem to find any. I saw some general (non-Roboschool) recommendations to replace os.mkfifo with os.pipe / pywin32 / ctypes, but I'm not too sure how exactly to go about doing that.
Line 147 in multiplayer.py is:
os.mkfifo(self.sh_pipe_actready_filename)
Could you please shed some light on how I can fix this for Windows execution?
Thank you once again for your kind assistance.
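For readers following along, here is a minimal sketch of a platform guard around the failing call. It is not the fix that was later committed to the fork; the helper name create_fifo_if_supported is hypothetical, and it only shows how the AttributeError can be avoided so that a caller can choose a fallback (such as Windows named pipes) instead of crashing.

```python
import os


def create_fifo_if_supported(path):
    """Create a FIFO at `path` when the platform provides os.mkfifo.

    Returns True if the FIFO was created and False if os.mkfifo is
    missing (as on Windows), so the caller can pick another mechanism.
    """
    if not hasattr(os, "mkfifo"):
        # os.mkfifo only exists on Unix-like systems.
        return False
    os.mkfifo(path)
    return True
```

On Linux the helper creates the FIFO exactly as multiplayer.py does at line 147; on Windows it returns False instead of raising.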
I tried os.pipe and failed ... After that I tried win32pipe to use named pipes ... and failed again ... I guess I can't fix it that easily; it would take more time.
No worries, thank you very much for trying. I'll experiment with the other examples that do not require the pipes first.
Alright, it seems to work now with the commit: https://github.com/TheCrazyT/roboschool/commit/ecf52791f022443a110938931b02f04a5a17a824 Although I am trying to make the project work on both operating systems, I have probably broken the ability to use the same project on Linux. I do not have enough time to test on both machines, and since a new Roboschool version is planned, I guess it is not worth the effort to make it work on both systems.
Well, my solution is working, but it can probably be improved. I'm not sure which demos are still failing; I tested my solution with demo_pong.py.
Edit: I'm not sure whether it is an error, but the animation seems to stop at frame 999. I currently can't figure out whether this happens intentionally or because of a bug. The strange thing is that I can't find any number in the source that limits it to that frame. All I know is that you get an error about a closed pipe ... which is weird because the source currently never closes the pipe (except when the "server"/Python process stops).
Thank you for updating the Roboschool to address the pipe issue.
I've updated my Roboschool copy with your 3 new files: winfifo.py, multiplayer.py and demo_pong.py, and also configured the temp directory paths.
When I run demo_pong.py, it exits with the following error without showing any animations:
Waiting tmp/multiplayer_pongdemo_player00
Waiting tmp/multiplayer_pongdemo_player01
Player 0 connected, wants to operate RoboschoolPong-v1 in this scene
Player 1 connected, wants to operate RoboschoolPong-v1 in this scene
Traceback (most recent call last):
File "demo_pong.py", line 26, in <module>
gameserver.serve_forever()
File "roboschool\multiplayer.py", line 285, in serve_forever
p.read_and_apply_action()
File "roboschool\multiplayer.py", line 192, in read_and_apply_action
check = self.sh_pipe_actready.readline()[:-1]
File "roboschool\winfifo.py", line 62, in readline
res = str(super().readline(),"UTF-8")
File "roboschool\winfifo.py", line 54, in read
result,data = win32file.ReadFile(self.handle,size,self.overlapped)
pywintypes.error: (109, 'ReadFile', 'The pipe has been ended.')
Is this related to your same error?
Yes, it is the same error that I can't figure out. But for some reason that error does not happen during the first 999 frames, so I do indeed see a pong animation. At first I thought it could be Python's internal garbage collector, but since the pipe variables are in the global namespace ("PIPE_HANDLES") this should not be the case.
Minimizing the window also stops the animation for some reason (can't remember what error happens if you do it).
I found this post about os.path.exists() interfering with the pipes and generating the same error.
Could this issue be related?
No, I figured out why it failed for me:
register(
    id='RoboschoolPong-v1',
    entry_point='roboschool:RoboschoolPong',
    max_episode_steps=1000,
    tags={ "pg_complexity": 20*1000000 },
)
That code is inside the __init__.py of the roboschool folder. max_episode_steps=1000 explains why it stops at 999 for me with that error. The client Python script exits after it finishes its episode. As a result the pipe is lost (which is OK because the subprocess stopped) and the server throws that broken-pipe error. This could normally be silently ignored, although I still don't get why the serve_forever function is written to handle more than one episode while the play function in demo_pong.py finishes after one episode.
Edit: Now that I think about it, I guess they just forgot a while True: above the call of the play function.
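To illustrate the frame-999 behaviour, here is a small self-contained sketch. ToyEnv, play and play_forever are hypothetical stand-ins (not Roboschool or Gym code); the toy environment simply assumes that an episode ends once max_episode_steps steps have been taken, which is enough to show why the last frame index is 999 and what an outer loop around play changes.

```python
class ToyEnv:
    """Hypothetical environment that ends an episode after a step limit."""

    def __init__(self, max_episode_steps):
        self.max_episode_steps = max_episode_steps
        self.steps_taken = 0

    def reset(self):
        self.steps_taken = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.steps_taken += 1
        done = self.steps_taken >= self.max_episode_steps
        return 0.0, 0.0, done, {}  # observation, reward, done, info


def play(env):
    """Run one episode and return the zero-based index of the last frame."""
    env.reset()
    frame = -1
    done = False
    while not done:
        _, _, done, _ = env.step(0)
        frame += 1
    return frame


def play_forever(env, episodes):
    """Outer loop around play(); bounded by `episodes` so the demo ends."""
    return [play(env) for _ in range(episodes)]
```

With a limit of 1000 steps, play(ToyEnv(1000)) returns 999. Without the outer loop the client process exits at that point, which matches the closed pipe seen by the server.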
Oh, and I almost forgot:
also configured the temp directory paths.
This is not necessary because the paths are only virtual; I'm using named pipes and no real files on Windows.
Thanks for the update. I updated demo_pong.py with the "while True" line, and changed the steps in __init__.py to 5000.
Below is what I got:
Waiting tmp/multiplayer_pongdemo_player00
Waiting tmp/multiplayer_pongdemo_player01
Player 0 connected, wants to operate RoboschoolPong-v1 in this scene
Player 1 connected, wants to operate RoboschoolPong-v1 in this scene
Traceback (most recent call last):
File "demo_pong.py", line 26, in <module>
gameserver.serve_forever()
File "roboschool\multiplayer.py", line 285, in serve_forever
p.read_and_apply_action()
File "roboschool\multiplayer.py", line 192, in read_and_apply_action
check = self.sh_pipe_actready.readline()[:-1]
File "roboschool\winfifo.py", line 62, in readline
res = str(super().readline(),"UTF-8")
File "roboschool\winfifo.py", line 57, in read
raise Exception("ret_code: %d" % ret_code);
Exception: ret_code: 258
The script showed the error, then returned to command line, but continued to output results like this:
40:-38 50:-46 52:-50 67:-62 53:-51 48:-44 58:-56 46:-43 58:-54 51:-47 ...
It seemed to continue indefinitely at about 1 result every 2-3 seconds for hours. Are these the expected results? However, no visual output / window is displayed.
I also tried enabling "video=True" in demo_pong, but the script will then crash with the earlier "pywintypes.error: (109, 'ReadFile', 'The pipe has been ended.')" error.
p/s: The temp directory I was referring to earlier was actually configured in multiplayer.py, which generates actual files in the system.
Alright, I know what you mean. The paths were no trouble for me (maybe because I have MSYS and Cygwin installed?). I see that you mean the multiplayer_pongdemo_player00 and multiplayer_pongdemo_player01 files. Sadly, MULTIPLAYER_FILES_DIR was used for the pipe paths as well, and the ":" creates trouble (because "\\.\pipe\roboschoolC:\tmp" is not a valid pipe path, for example). I modified winfifo to replace that character ...
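A minimal sketch of that replacement, for illustration only: the function name to_pipe_name and the PIPE_PREFIX constant are assumptions based on the example path quoted above, not the actual contents of winfifo.py.

```python
# Prefix taken from the example path in this thread (an assumption about
# how winfifo.py builds its names, not a copy of its code).
PIPE_PREFIX = r"\\.\pipe\roboschool"


def to_pipe_name(file_name):
    """Map a file path to a pipe name with the drive colon replaced."""
    return PIPE_PREFIX + file_name.replace(":", "_")
```

For example, to_pipe_name(r"C:\tmp") gives \\.\pipe\roboschoolC_\tmp, so the drive-letter colon no longer appears in the pipe path.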
Thank you for your reply. Actually, I directly edited the MULTIPLAYER_FILES_DIR variable too, setting it to just "tmp", a relative path within my execution directory (which is agent_zoo). I see the "multiplayer_pongdemo_player00*" files inside, so it seems to write correctly.
However, there is no video screen generated when I run this demo_pong. For others like RoboschoolWalker etc, an animation window is displayed.
I only keep seeing the output results like "0:-38 50:-46 52:-50 67:-62 53:-51 48:-44 58:-56 46:-43 58:-54 51:-47 ..." that seems to run indefinitely.
Is there something preventing the animation window from launching or rendering?
Do you use the current version? (winfifo.py should have the line fileName = fileName.replace(":","_") )
Do you get any stacktrace?
The numbers that are output are normally the scores of the left and the right "pong".
What is strange is that it shows large or negative values for you for some reason.
Currently I have no clue why the window is not shown; it's hard to debug without having the same problem.
Maybe you could change the FIFO_DEBUG-constant to "true" (inside roboschool/winfifo.py) , post the result of the application on http://pastebin.com/ and link it here.
This could help me find the problem.
Thanks for your pointers. Yes, I'm already using the latest winfifo.py.
Once I run it, I get the following error, but the output numbers continue to be generated in the background:
Traceback (most recent call last):
File "demo_pong.py", line 27, in <module>
gameserver.serve_forever()
File "roboschool\roboschool\multiplayer.py", line 286, in serve_forever
p.read_and_apply_action()
File "roboschool\roboschool\multiplayer.py", line 193, in read_and_apply_action
check = self.sh_pipe_actready.readline()[:-1]
File "roboschool\roboschool\winfifo.py", line 62, in readline
res = str(super().readline(),"UTF-8")
File "roboschool\roboschool\winfifo.py", line 57, in read
raise Exception("ret_code: %d" % ret_code);
Exception: ret_code: 258
What does this return code 258 mean?
Below is the debug information after enabling FIFO_DEBUG: https://pastebin.com/mQZgXwyB
Could the animation problem be a separate issue unrelated to the winfifo, or is the rendering disabled somewhere? If I run RoboschoolPong_v0_2017may1.py, the animation shows properly.
Code 258 means that a timeout occurred. I set the timeout to 10 seconds, which should be more than enough for the subprocesses to respond. At least at the beginning, communication between the subprocesses and the main process (the one that calls gameserver.serve_forever()) seems to work.
But for some reason the subprocesses do not seem to write a second time after sending their model information ("RoboschoolPong-v1"). To explain the log a little, the first number is the process ID: 12316 is the main process, 11296 is one of the "pongs", and 10248 is the other "pong". It crashes when the main process waits for a response from one of the subprocesses (for some reason one of the subprocesses writes to "multiplayer_pongdemo_player00_actready" only once).
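As a quick reference for the two codes that appear in this thread, here is a small sketch that maps them to their standard Win32 names. The numeric values are the documented Win32 constants; the helper describe_pipe_error is hypothetical and is not part of winfifo.py.

```python
# Documented Win32 values; the names mirror the Windows SDK constants.
WAIT_TIMEOUT = 258       # 0x102: a wait operation timed out
ERROR_BROKEN_PIPE = 109  # "The pipe has been ended."


def describe_pipe_error(code):
    """Return a short description for the error codes seen in this thread."""
    if code == WAIT_TIMEOUT:
        return "timeout: the other process did not respond in time"
    if code == ERROR_BROKEN_PIPE:
        return "broken pipe: the other end of the pipe was closed"
    return "unknown code %d" % code
```

In this thread, 258 showed up when a subprocess stopped writing, and 109 showed up when the client script exited after its episode.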
Thank you for your detailed analysis.
Since there is a timeout and crash, it seems weird that the games are still being played? The results like "114:-110 112:-107 131:-123 ......" still continue to be generated even after the return code 258 is displayed.
Does that mean that only a specific subprocess timed out or crashed, without affecting the main loop?
Do any of these error or log messages help diagnose the missing animation rendering?
When I exit the simulations or press Ctrl-C at the command prompt, Windows always displays a crash alert and tries to initiate a Windows Error Reporting.
I checked the Windows error log, and it appears to be associated with the Qt5 module:
I am using the following versions:
I don't think it affects the operation of the simulation, but it's weird that it crashes each time.
Thanks!