Open pengsun opened 7 years ago
How often does it happen? I cannot replicate it. Also, there seems to be a memory leak when using getState() inside torch.Threads. Can you confirm that, or am I doing something wrong here?
It always happens once I train long enough. How can we tell there is a memory leak from the information I posted?
I see that you use your own wad file. Have you checked other scenarios/wads? I've created a test based on multiple_instances.lua and observed the memory leak.
Okay, I will check more wad files... I checked cig.wad, which also crashed. My own wad file "ttt2.wad" is just "multi_duel.wad" renamed, with both players changed to "deathmatch player" using SLADE...
What about some singleplayer wads? Based on the location of the crash, I suspect that bots may be the cause.
As the ZDoom wiki says (https://zdoom.org/wiki/Bots), bots are officially unsupported in current versions of the engine and unfortunately don't work perfectly with all ViZDoom features (the recording functionality). On the other hand, I believe bots were used frequently during last year's competition and no one reported a similar problem.
I'll examine this for sure, but I've been a little busy lately, so it may take a week. Fortunately, you can simply catch that exception and restart ViZDoom (sorry if it's obvious).
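The catch-and-restart workaround mentioned above could be sketched roughly as follows. This is only an illustration, not code from the thread: `safe_make_action` and the `make_game` factory are hypothetical names, and the sketch assumes `make_game()` returns a fully configured, initialized game object. With ViZDoom you would pass `vizdoom.ViZDoomErrorException` (the exception that shows up in the tracebacks in this thread) as `crash_exc`.

```python
def safe_make_action(game, action, tics, make_game, crash_exc):
    """Perform one action; if the engine instance crashed, replace it.

    Returns (reward, game, restarted). When restarted is True, the returned
    game is a fresh instance and the caller should treat the current
    episode as aborted instead of continuing it.

    make_game (hypothetical) must build, configure and init a new game.
    crash_exc is the exception type that signals an instance crash.
    """
    try:
        return game.make_action(action, tics), game, False
    except crash_exc:
        try:
            # Best-effort cleanup; the crashed instance may already be gone.
            game.close()
        except Exception:
            pass
        # No reward for the aborted step; hand back a fresh instance.
        return 0.0, make_game(), True
```

Inside a worker's training loop, the existing `make_action` call would then be replaced by something like `r, self.env, restarted = safe_make_action(self.env, action, 4, make_game, vizdoom.ViZDoomErrorException)`, discarding the partial episode whenever `restarted` is true. This only hides the crash so training can continue; it does not address the underlying segfault.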
A similar issue occurred for me:
*** Fatal Error ***
Address not mapped to object (signal 11)
Address: 0x4ecee8230
Generating vizdoom-crash.log and killing process 25541, please wait... 40 ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
Error: Can't open display:
I used basic.wad provided from the scenarios directory.
Hello @GoingMyWay, can you provide some simple code to replicate this problem, along with the vizdoom-crash.log file?
@mwydmuch The following are the details of the outputs
*** Fatal Error ***
Address not mapped to object (signal 11)
Address: 0x4ecee8230
Generating vizdoom-crash.log and killing process 25541, please wait... 40 ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
Error: Can't open display:
*** Fatal Error ***
Segmentation fault (signal 11)
Address: (nil)
Generating vizdoom-crash.log and killing process 25553, please wait... 40 ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
Error: Can't open display:
*** Fatal Error ***
Segmentation fault (signal 11)
Address: (nil)
Generating vizdoom-crash.log and killing process 25535, please wait... 40 ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
Error: Can't open display:
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "main.py", line 56, in <lambda>
    agent_train = lambda: ag.train(max_episode_length, gamma, sess, coord, saver)
  File "/home/doom/Code/A3C/agent.py", line 138, in train
    r = self.env.make_action(self.actions[a_index], 4) / 400.0
vizdoom.vizdoom.ViZDoomErrorException: Unexpected ViZDoom instance crash.
Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "main.py", line 56, in <lambda>
    agent_train = lambda: ag.train(max_episode_length, gamma, sess, coord, saver)
  File "/home/doom/Code/A3C/agent.py", line 138, in train
    r = self.env.make_action(self.actions[a_index], 4) / 400.0
vizdoom.vizdoom.ViZDoomErrorException: Unexpected ViZDoom instance crash.
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "main.py", line 56, in <lambda>
    agent_train = lambda: ag.train(max_episode_length, gamma, sess, coord, saver)
  File "/home/doom/Code/A3C/agent.py", line 138, in train
    r = self.env.make_action(self.actions[a_index], 4) / 400.0
vizdoom.vizdoom.ViZDoomErrorException: Unexpected ViZDoom instance crash.
Starting worker 0
Starting worker 1
Starting worker 2
Starting worker 3
Episode count 250, saved Model
Episode count 500, saved Model
Episode count 750, saved Model
Episode count 1000, saved Model
Episode count 1250, saved Model
Episode count 1500, saved Model
Episode count 1750, saved Model
Episode count 2000, saved Model
Episode count 2250, saved Model
Episode count 2500, saved Model
Episode count 2750, saved Model
Episode count 3000, saved Model
Episode count 3250, saved Model
Episode count 3500, saved Model
Episode count 3750, saved Model
Episode count 4000, saved Model
Episode count 4250, saved Model
Stop training name:worker_2
Since there are 4 agents training simultaneously, the maximum number of episodes is 120000; however, it stopped at 4250.
vizdoom-crash.log
cat vizdoom-crash.log
*** Fatal Error ***
Segmentation fault (signal 11)
Address: (nil)
System: Linux ubuntu 3.13.0-88-generic #135-Ubuntu SMP Wed Jun 8 21:10:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
ViZDoom version 1.1.1 (ZDOOM 2.8.1) ()
Compiler version: 4.8.4
Command line: /usr/local/lib/python3.5/site-packages/vizdoom/vizdoom -iwad /usr/local/lib/python3.5/site-packages/vizdoom/freedoom2.wad -config _vizdoom.ini -skill 3 -width 160 -height 120 +vid_aspect 3 +fullscreen 0 +viz_controlled 1 +viz_instance_id buZO9fDhLb +use_mouse 0 -rngseed 3260863032 +viz_labels 1 +viz_render_mode 1604 +viz_noconsole 1 +viz_screen_format 8 +viz_window_hidden 1 +viz_noxserver 1 -noidle -nojoy -nosound +viz_nosound 1 -file ./scenarios/basic.wad
Wad 0: vizdoom.pk3
Wad 1: freedoom2.wad
Wad 2: basic.wad
Not in a level.
Executing: gdb --quiet --batch --command=gdb-respfile-TiqBru
[New LWP 25536]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f9d7679bed9 in __libc_waitpid (pid=26139, stat_loc=0xbb9e20, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
* Loaded Libraries
From To Syms Read Shared Object Library
0x00007f9d77271e40 0x00007f9d774bbaae Yes (*) /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
0x00007f9d76fc22a0 0x00007f9d76ff00c6 Yes (*) /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
0x00007f9d76cc98c0 0x00007f9d76d3cb2a Yes (*) /lib/x86_64-linux-gnu/libglib-2.0.so.0
0x00007f9d769c2280 0x00007f9d76a707da Yes (*) /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
0x00007f9d767919f0 0x00007f9d7679e471 Yes /lib/x86_64-linux-gnu/libpthread.so.0
0x00007f9d76586350 0x00007f9d7658933c Yes /lib/x86_64-linux-gnu/librt.so.1
0x00007f9d7636ce00 0x00007f9d7637cbf8 Yes (*) /lib/x86_64-linux-gnu/libz.so.1
0x00007f9d76119d90 0x00007f9d76150520 Yes (*) /usr/lib/x86_64-linux-gnu/libjpeg.so.8
0x00007f9d75f073c0 0x00007f9d75f1308f Yes (*) /lib/x86_64-linux-gnu/libbz2.so.1.0
0x00007f9d75cfaf40 0x00007f9d75d00883 Yes (*) /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
0x00007f9d75aed250 0x00007f9d75aeddc3 Yes (*) /usr/lib/x86_64-linux-gnu/libboost_system.so.1.54.0
0x00007f9d7588a0c0 0x00007f9d758cae7c Yes (*) /usr/lib/x86_64-linux-gnu/libsndfile.so.1
0x00007f9d75680ed0 0x00007f9d756819ce Yes /lib/x86_64-linux-gnu/libdl.so.2
0x00007f9d753d7620 0x00007f9d7543a803 Yes (*) /usr/lib/x86_64-linux-gnu/libstdc++.so.6
0x00007f9d7507b610 0x00007f9d750ea056 Yes /lib/x86_64-linux-gnu/libm.so.6
0x00007f9d74e62ab0 0x00007f9d74e72985 Yes (*) /lib/x86_64-linux-gnu/libgcc_s.so.1
0x00007f9d74aba520 0x00007f9d74bff183 Yes /lib/x86_64-linux-gnu/libc.so.6
0x00007f9d74805910 0x00007f9d7485a0eb Yes (*) /usr/lib/x86_64-linux-gnu/libgdk-x11-2.0.so.0
0x00007f9d745e5150 0x00007f9d745e6015 Yes (*) /usr/lib/x86_64-linux-gnu/libgmodule-2.0.so.0
0x00007f9d743db8d0 0x00007f9d743e0216 Yes (*) /usr/lib/x86_64-linux-gnu/libpangocairo-1.0.so.0
0x00007f9d740ba7f0 0x00007f9d7413dbfb Yes (*) /usr/lib/x86_64-linux-gnu/libX11.so.6
0x00007f9d73e9d530 0x00007f9d73e9f766 Yes (*) /usr/lib/x86_64-linux-gnu/libXfixes.so.3
0x00007f9d73c83990 0x00007f9d73c8f183 Yes (*) /usr/lib/x86_64-linux-gnu/libatk-1.0.so.0
0x00007f9d73980940 0x00007f9d73a3c016 Yes (*) /usr/lib/x86_64-linux-gnu/libcairo.so.2
0x00007f9d73753e30 0x00007f9d73765d10 Yes (*) /usr/lib/x86_64-linux-gnu/libgdk_pixbuf-2.0.so.0
0x00007f9d7340e790 0x00007f9d734d4a2b Yes (*) /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0
0x00007f9d731cd360 0x00007f9d731d554b Yes (*) /usr/lib/x86_64-linux-gnu/libpangoft2-1.0.so.0
0x00007f9d72f86e50 0x00007f9d72fa5f26 Yes (*) /usr/lib/x86_64-linux-gnu/libpango-1.0.so.0
0x00007f9d72d43cc0 0x00007f9d72d60819 Yes (*) /usr/lib/x86_64-linux-gnu/libfontconfig.so.1
0x00007f9d72b36650 0x00007f9d72b3ad38 Yes (*) /usr/lib/x86_64-linux-gnu/libffi.so.6
0x00007f9d728f87b0 0x00007f9d7291ecdf Yes (*) /lib/x86_64-linux-gnu/libpcre.so.3
0x00007f9d72630f80 0x00007f9d726b59a2 Yes (*) /usr/lib/x86_64-linux-gnu/libasound.so.2
0x00007f9d72404550 0x00007f9d72405733 Yes (*) /usr/lib/x86_64-linux-gnu/libpulse-simple.so.0
0x00007f9d721c5170 0x00007f9d721efcd8 Yes (*) /usr/lib/x86_64-linux-gnu/libpulse.so.0
0x00007f9d71fab580 0x00007f9d71fb4c7f Yes (*) /usr/lib/x86_64-linux-gnu/libXext.so.6
0x00007f9d71da0420 0x00007f9d71da4e20 Yes (*) /usr/lib/x86_64-linux-gnu/libXcursor.so.1
0x00007f9d71b9baf0 0x00007f9d71b9c3ec Yes (*) /usr/lib/x86_64-linux-gnu/libXinerama.so.1
0x00007f9d7198d1e0 0x00007f9d71996f12 Yes (*) /usr/lib/x86_64-linux-gnu/libXi.so.6
0x00007f9d71782c00 0x00007f9d71788632 Yes (*) /usr/lib/x86_64-linux-gnu/libXrandr.so.2
0x00007f9d7157dcd0 0x00007f9d7157ea8c Yes (*) /usr/lib/x86_64-linux-gnu/libXss.so.1
0x00007f9d71377f40 0x00007f9d7137a6f6 Yes (*) /usr/lib/x86_64-linux-gnu/libXxf86vm.so.1
0x00007f9d711756e0 0x00007f9d71175896 Yes (*) /usr/lib/x86_64-linux-gnu/libwayland-egl.so.1
0x00007f9d70f6cb90 0x00007f9d70f70bcf Yes (*) /usr/lib/x86_64-linux-gnu/libwayland-client.so.0
0x00007f9d70d61200 0x00007f9d70d62c4c Yes (*) /usr/lib/x86_64-linux-gnu/libwayland-cursor.so.0
0x00007f9d70b29f10 0x00007f9d70b411d5 Yes (*) /usr/lib/x86_64-linux-gnu/libxkbcommon.so.0
0x00007f9d77845ae0 0x00007f9d77860490 Yes /lib64/ld-linux-x86-64.so.2
0x00007f9d708fd300 0x00007f9d7091b98d Yes (*) /usr/lib/x86_64-linux-gnu/libFLAC.so.8
0x00007f9d70439a40 0x00007f9d7043bff4 Yes (*) /usr/lib/x86_64-linux-gnu/libvorbisenc.so.2
0x00007f9d701fbdd0 0x00007f9d702132ed Yes (*) /usr/lib/x86_64-linux-gnu/libvorbis.so.0
0x00007f9d6fff1a70 0x00007f9d6fff5c25 Yes (*) /usr/lib/x86_64-linux-gnu/libogg.so.0
0x00007f9d6fde7a60 0x00007f9d6fded728 Yes (*) /usr/lib/x86_64-linux-gnu/libXrender.so.1
0x00007f9d6fbe3c40 0x00007f9d6fbe4618 Yes (*) /usr/lib/x86_64-linux-gnu/libXcomposite.so.1
0x00007f9d6f9e0b90 0x00007f9d6f9e149b Yes (*) /usr/lib/x86_64-linux-gnu/libXdamage.so.1
0x00007f9d6f748b90 0x00007f9d6f7b427d Yes (*) /usr/lib/x86_64-linux-gnu/libfreetype.so.6
0x00007f9d6f527620 0x00007f9d6f5335e5 Yes (*) /usr/lib/x86_64-linux-gnu/libxcb.so.1
0x00007f9d6f2804e0 0x00007f9d6f305c2c Yes (*) /usr/lib/x86_64-linux-gnu/libpixman-1.so.0
0x00007f9d6f053ab0 0x00007f9d6f06d003 Yes (*) /lib/x86_64-linux-gnu/libpng12.so.0
0x00007f9d6ee4dd80 0x00007f9d6ee4e5f3 Yes (*) /usr/lib/x86_64-linux-gnu/libxcb-shm.so.0
0x00007f9d6ec47430 0x00007f9d6ec49edf Yes (*) /usr/lib/x86_64-linux-gnu/libxcb-render.so.0
0x00007f9d6ea26b00 0x00007f9d6ea38a26 Yes (*) /lib/x86_64-linux-gnu/libselinux.so.1
0x00007f9d6e809ad0 0x00007f9d6e818eb9 Yes /lib/x86_64-linux-gnu/libresolv.so.2
0x00007f9d6e5b6f20 0x00007f9d6e5ea52a Yes (*) /usr/lib/x86_64-linux-gnu/libharfbuzz.so.0
0x00007f9d6e3a9c20 0x00007f9d6e3ad011 Yes (*) /usr/lib/x86_64-linux-gnu/libthai.so.0
0x00007f9d6e181b60 0x00007f9d6e19a6c9 Yes (*) /lib/x86_64-linux-gnu/libexpat.so.1
0x00007f9d6df27180 0x00007f9d6df6144a Yes (*) /usr/lib/x86_64-linux-gnu/pulseaudio/libpulsecommon-4.0.so
0x00007f9d6dd0e800 0x00007f9d6dd133cb Yes (*) /lib/x86_64-linux-gnu/libjson-c.so.2
0x00007f9d6dacd840 0x00007f9d6daf5d54 Yes (*) /lib/x86_64-linux-gnu/libdbus-1.so.3
0x00007f9d6d8c3e50 0x00007f9d6d8c4acc Yes (*) /usr/lib/x86_64-linux-gnu/libXau.so.6
0x00007f9d6d6be350 0x00007f9d6d6bfd6c Yes (*) /usr/lib/x86_64-linux-gnu/libXdmcp.so.6
0x00007f9d6d499550 0x00007f9d6d4b548c Yes (*) /usr/lib/x86_64-linux-gnu/libgraphite2.so.3
0x00007f9d6d291170 0x00007f9d6d294274 Yes (*) /usr/lib/x86_64-linux-gnu/libdatrie.so.1
0x00007f9d6d088d70 0x00007f9d6d08c798 Yes (*) /lib/x86_64-linux-gnu/libwrap.so.0
0x00007f9d6ce81370 0x00007f9d6ce838e8 Yes (*) /usr/lib/x86_64-linux-gnu/libasyncns.so.0
0x00007f9d6cc6a160 0x00007f9d6cc76ea3 Yes /lib/x86_64-linux-gnu/libnsl.so.1
0x00007f9d6c78f010 0x00007f9d6c797750 Yes (*) /lib/x86_64-linux-gnu/libudev.so.1
0x00007f9d6c574610 0x00007f9d6c5866ec Yes (*) /lib/x86_64-linux-gnu/libcgmanager.so.0
0x00007f9d6c35d760 0x00007f9d6c36a12a Yes (*) /lib/x86_64-linux-gnu/libnih.so.1
0x00007f9d6c151ce0 0x00007f9d6c1554c6 Yes (*) /lib/x86_64-linux-gnu/libnih-dbus.so.1
0x00007f9d6b6fc2e0 0x00007f9d6b72b3a4 Yes (*) /usr/lib/x86_64-linux-gnu/libopenal.so
(*): Shared library is missing debugging information.
* Threads
Id Target Id Frame
2 Thread 0x7f9d6c14e700 (LWP 25536) "SDLTimer" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
* 1 Thread 0x7f9d77a209c0 (LWP 25535) "vizdoom" 0x00007f9d7679bed9 in __libc_waitpid (pid=26139, stat_loc=0xbb9e20, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
* FPU Status
R7: Empty 0x00000000000000000000
R6: Empty 0x00000000000000000000
R5: Empty 0x00000000000000000000
R4: Empty 0x00000000000000000000
R3: Empty 0x00000000000000000000
R2: Empty 0x00000000000000000000
R1: Empty 0x00000000000000000000
=>R0: Empty 0x00000000000000000000
Status Word: 0x0000
TOP: 0
Control Word: 0x027f IM DM ZM OM UM PM
PC: Double Precision (53-bits)
RC: Round to nearest
Tag Word: 0xffff
Instruction Pointer: 0x7f9d:0x7542ceae
Operand Pointer: 0x7fff:0xd2cc4448
Opcode: 0x0000
* Registers
rax 0xfffffffffffffe00 -512
rbx 0x661b 26139
rcx 0xffffffffffffffff -1
rdx 0x0 0
rsi 0xbb9e20 12295712
rdi 0x661b 26139
rbp 0xbb9e20 0xbb9e20
rsp 0xbb9e00 0xbb9e00
r8 0x0 0
r9 0x1 1
r10 0x0 0
r11 0x246 582
r12 0xb 11
r13 0x1090 4240
r14 0xbb7300 12284672
r15 0x9 9
rip 0x7f9d7679bed9 0x7f9d7679bed9 <__libc_waitpid+105>
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
* Backtrace
Thread 2 (Thread 0x7f9d6c14e700 (LWP 25536)):
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
No locals.
#1 0x00007f9d76a6b52e in ?? () from /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
No symbol table info available.
#2 0x00007f9d76a6b675 in ?? () from /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
No symbol table info available.
#3 0x00007f9d76a20ba1 in ?? () from /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
No symbol table info available.
#4 0x00007f9d76a2073d in ?? () from /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
No symbol table info available.
#5 0x00007f9d76a6b279 in ?? () from /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
No symbol table info available.
#6 0x00007f9d76794184 in start_thread (arg=0x7f9d6c14e700) at pthread_create.c:312
__res = <optimized out>
pd = 0x7f9d6c14e700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140314099902208, 6620885331285598523, 0, 0, 140314099902912, 140314099902208, -6568284515656048325, -6568235264282107589}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
pagesize_m1 = <optimized out>
sp = <optimized out>
freesize = <optimized out>
__PRETTY_FUNCTION__ = "start_thread"
#7 0x00007f9d74b9537d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
No locals.
Thread 1 (Thread 0x7f9d77a209c0 (LWP 25535)):
#0 0x00007f9d7679bed9 in __libc_waitpid (pid=26139, stat_loc=0xbb9e20, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
resultvar = 26139
oldtype = 0
#1 0x000000000051fb2c in ?? ()
No symbol table info available.
#2 <signal handler called>
No locals.
#3 __strncpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S:296
No locals.
#4 0x0000000000808551 in VIZ_GameStateUpdateLabels() ()
No symbol table info available.
#5 0x000000000080aa4a in VIZ_Update() ()
No symbol table info available.
#6 0x000000000080b495 in VIZ_Tic() ()
No symbol table info available.
#7 0x000000000055709e in D_DoomLoop() ()
No symbol table info available.
#8 0x0000000000559b59 in D_DoomMain() ()
No symbol table info available.
#9 0x000000000050774f in main ()
No symbol table info available.
for ag in agents:
    agent_train = lambda: ag.train(max_episode_length, gamma, sess, coord, saver)
    thd = threading.Thread(target=(agent_train))
    thd.start()
    time.sleep(0.5)
    agent_threads.append(thd)
coord.join(worker_threads, stop_grace_period_secs=60)
In ag.train():
    ...
    a_index = np.argmax(a_policy_value == a)
    r = self.env.make_action(self.actions[a_index], 4) / 400.0
    ...
Thank you!
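A side note on the thread-launch loop posted above, probably unrelated to the engine-side segfault: `lambda: ag.train(...)` looks up `ag` when the lambda is called, not when it is created, so it only works reliably because each thread starts before the loop moves on. The snippet as posted also appends to `agent_threads` but joins `worker_threads`. A minimal sketch that binds each agent explicitly is shown below; `start_workers` is a hypothetical helper name, and the agents are only assumed to expose a `train(...)` method as in the posted code.

```python
import threading


def start_workers(agents, *train_args):
    """Start one thread per agent and return the list of started threads."""
    threads = []
    for ag in agents:
        # target/args bind this iteration's agent when the Thread is created,
        # unlike a bare lambda, which resolves `ag` only when it is called.
        thd = threading.Thread(target=ag.train, args=train_args)
        thd.start()
        threads.append(thd)
    return threads
```

The returned list is the one that would then be handed to the coordinator's `join`, so the threads that were started and the threads that are waited on are guaranteed to be the same.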
Has it happened again? I can't replicate this (using exactly the same settings as yours), so either I'm missing some significant factor in my environment or this is really, really rare.
@mwydmuch, yes, almost every time. Would you mind if I sent my code to you?
This will be very helpful. Please send it to marek@wydmuch.poznan.pl. Thank you.
@mwydmuch, I sent the code to you. Thank you!
I faced the same problem when using multi-threading. Was the issue solved? I would like to know about it.
Hi,
I ran an A3C-style training and ViZDoom crashed occasionally (the same Lua/Torch code ran well with the ALE environment across a wide range of games). The more threads I started, the sooner it crashed... The error information looks like:
and the vizdoom-crash.log looks like: