CEA-LIST / N2D2

N2D2 is an open-source CAD framework for Deep Neural Network simulation and for building full DNN-based applications.

Segfault with CPP exported model [bad memory wrapping] #84

Closed: olivierbichler-cea closed this issue 3 years ago

olivierbichler-cea commented 3 years ago

I have the same problem: the -O3 and -O2 builds crash when run from the terminal. strace gives:

clone(child_stack=0x7f9f3345df30, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f9f3345e9d0, tls=0x7f9f3345e700, child_tidptr=0x7f9f3345e9d0) = 2492
mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f9f3245d000
mprotect(0x7f9f3245d000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f9f32c5cf30, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f9f32c5d9d0, tls=0x7f9f32c5d700, child_tidptr=0x7f9f32c5d9d0) = 2493
futex(0x26cb5a4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x26cbfc4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x26cb5a4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x26cb5a4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x26cb5a4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x26cb5a4, FUTEX_WAKE_PRIVATE, 2147483647) = ? <unavailable>

There are 7 clone syscalls (on an 8-core machine), so I suspect the crash happens immediately after the OpenMP threads are started.

It is very tricky: when running the release build under gdb there is no crash, only 0% correct classification. When compiling without optimization and with debug info (-O0 -g), it works fine.

Originally posted by @andreistoian in https://github.com/CEA-LIST/N2D2/issues/83#issuecomment-755168441
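To make the thread-count reasoning concrete: an OpenMP parallel region runs on a team that includes the thread that encounters it, so a team of N threads needs only N - 1 new threads, each showing up as one clone call in strace on Linux. With GCC's libgomp, the default team size is the number of available CPUs, which is consistent with 7 clone calls on an 8-core machine. The following is a minimal C++ sketch; the helper names parallelTeamSize and workerThreadsCreated are hypothetical and not part of N2D2. Built without -fopenmp, the OpenMP code is skipped and the team has a single thread.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: number of threads in the team that a parallel
// region actually runs with.
int parallelTeamSize()
{
    int teamSize = 1;  // without OpenMP support, only the calling thread runs
#ifdef _OPENMP
#pragma omp parallel
    {
        // One thread records the team size; teamSize is shared by default.
#pragma omp single
        teamSize = omp_get_num_threads();
    }
#endif
    assert(teamSize >= 1);
    return teamSize;
}

// Hypothetical helper: the encountering (main) thread is itself a team
// member, so only teamSize - 1 additional threads have to be created.
int workerThreadsCreated()
{
    return parallelTeamSize() - 1;
}
```

On a machine with 8 available CPUs and OMP_NUM_THREADS unset, a libgomp build would be expected to report a team of 8 and therefore 7 created worker threads.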

olivierbichler-cea commented 3 years ago

I created a separate issue for this problem, which is distinct from the original one.

This should be fixed in the latest commit. Can you please check?

andreistoian commented 3 years ago

Sorry, it still crashes:

==10374== Invalid read of size 4
==10374==    at 0x402F3D: macsOnRange<192, 1, 1, float> (Network.hpp:451)
==10374==    by 0x402F3D: void N2D2::Network::convcellPropagate<64, 1, 495, 128, 1, 493, 0, 0, 1, 1, 1, 3, (ActivationFunction_T)6, 102880, 24960, 0, 6720, 64, 6720, 63104, 0, 0, 128, float, float, N2D2::NoScaling>(float const*, float*, float const*, float const*, N2D2::NoScaling const&) const [clone ._omp_fn.8] (Network.hpp:743)
==10374==    by 0x54D443D: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10374==    by 0x59046B9: start_thread (pthread_create.c:333)
==10374==    by 0x5C2141C: clone (clone.S:109)
==10374==  Address 0x258d000 is not stack'd, malloc'd or (recently) free'd

I'll resend the export directory.
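Valgrind points at an out-of-bounds read in macsOnRange while convcellPropagate runs in an OpenMP worker thread. Reading outside a buffer is undefined behaviour, which would also be consistent with the symptoms changing between -O0 and -O2/-O3 builds and under gdb. As a hedged illustration of what "bad memory wrapping" can mean, the sketch below assumes that a layer's input occupies a region of a shared memory pool that wraps around the end of the pool. This layout and the function macsOnWrappedRange are hypothetical, not N2D2's exported code. Under that assumption, a multiply-accumulate over a contiguous range has to split the range at the wrap point; a naive loop over the whole range would read past the end of the pool.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout (assumption for illustration): logical element i of a
// layer's input is stored at pool[(start + i) % poolSize].
//
// Multiply-accumulate over NB_ITERATIONS logical elements, splitting the
// range where it wraps from the end of the pool back to its beginning.
template <std::size_t NB_ITERATIONS, typename T>
T macsOnWrappedRange(const T* pool, std::size_t poolSize, std::size_t start,
                     const T* weights, T acc)
{
    assert(poolSize > 0 && start < poolSize && NB_ITERATIONS <= poolSize);

    // Elements that fit before the end of the pool.
    const std::size_t firstPart = (start + NB_ITERATIONS <= poolSize)
                                      ? NB_ITERATIONS
                                      : poolSize - start;
    for (std::size_t i = 0; i < firstPart; ++i)
        acc += pool[start + i] * weights[i];

    // Remaining elements continue from the beginning of the pool.
    for (std::size_t i = firstPart; i < NB_ITERATIONS; ++i)
        acc += pool[i - firstPart] * weights[i];

    return acc;
}
```

For example, with a 5-element pool {1, 2, 3, 4, 5}, start = 3 and 4 iterations, the elements read are 4, 5, 1, 2; a loop that ignored the wrap would instead read pool[5] and pool[6], outside the allocation.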

olivierbichler-cea commented 3 years ago

OK, thanks, I will have a look.

In the meantime, I suggest you disable memory wrapping in the export by setting OptimizeBufferMemory=0 in the export parameter file. To do so, create a .ini file containing OptimizeBufferMemory=0 and pass it to N2D2 with the extra command-line parameter during export:

./n2d2 ... -export-parameters filename.ini
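The thread only says that OptimizeBufferMemory controls memory wrapping. Assuming that enabling it lets the exported code pack layer outputs into a shared pool instead of giving each output its own buffer, the simplified sketch below estimates what setting it to 0 costs in memory. It models a purely sequential network and is not N2D2's memory planner; both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Simplified model (assumption): buffer i is the output of layer i, with
// buffer 0 being the network input, and it is only read by layer i + 1.

// One dedicated buffer per output: nothing is shared, so nothing wraps.
std::size_t dedicatedBufferSize(const std::vector<std::size_t>& sizes)
{
    return std::accumulate(sizes.begin(), sizes.end(), std::size_t{0});
}

// Lower bound when memory can be reused: while layer i + 1 runs, only its
// input (buffer i) and its output (buffer i + 1) need to be live together.
std::size_t sharedBufferLowerBound(const std::vector<std::size_t>& sizes)
{
    if (sizes.size() < 2)
        return sizes.empty() ? 0 : sizes[0];

    std::size_t peak = 0;
    for (std::size_t i = 0; i + 1 < sizes.size(); ++i)
        peak = std::max(peak, sizes[i] + sizes[i + 1]);
    return peak;
}
```

For buffers of 100, 50, 200 and 10 elements, dedicated buffers need 360 elements, while reuse needs at least 250 (the 50- and 200-element buffers that are live at the same time). Disabling the optimization trades that extra memory for a simpler layout with no wrapped regions.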

olivierbichler-cea commented 3 years ago

Hi, this issue has been fixed in the latest commit. Please feel free to reopen it if necessary. Thanks.