Closed mandar5335 closed 4 years ago
forrtl: severe (174): SIGSEGV, segmentation fault occurred longjmp causes uninitialized stack frame : /home/m/mandar/pfs/softwares/new_tinker_openmm_gpu/gcc/tinker/bin/dynamic_omm terminated
Any other message below these lines? For example, traceback info?
Hi, thanks for the reply. Please see the detailed error below; this is everything that was printed as output:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
longjmp causes uninitialized stack frame : /home/m/mandar/pfs/softwares/new_tinker_openmm_gpu/gcc/tinker/bin/dynamic_omm terminated
forrtl: severe (174): SIGSEGV, segmentation fault occurred
======= Backtrace: =========
forrtl: severe (174): SIGSEGV, segmentation fault occurred
srun: error: b-cn1105: task 0: Exited with exit code 174
Thanks. I have never compiled OpenMM with Tinker 8.7. I will try it and see whether the same error occurs. BTW, you said
have installed Tinker-openMM using CUDA 9.2, intel compilers,and recent Tinker 8.7+ Tinker-OpenMM
But your path suggests that you may be using gcc:
/home/m/mandar/pfs/softwares/new_tinker_openmm_gpu/gcc/tinker/bin/dynamic_omm
Is there any mismatch in your compilation?
Hi, that path name is a misnomer; sorry for the confusion. I initially planned to install with gcc but then installed using icc. This installation is on a cluster. I purged all modules first and then loaded the Intel compiler module, so I am sure this installation uses icc/ifort.
We use the version of Tinker 8.7 (and thus the Tinker-OpenMM interface code in openmm/ommstuf.cpp) currently on GitHub, the version of OpenMM currently on GitHub as Tinker-OpenMM, CUDA 9.2 and the gcc/gfortran compilers. This combination works for us on both Linux and on MacOS.
@jayponder thanks for the suggestion. I will install Tinker/OpenMM with gcc/gfortran and check whether the error persists. I am using the Tinker-OpenMM available on GitHub. However, the GitHub version of Tinker 8.7 does not contain the "fftw" folder, so I downloaded Tinker-8.7.1 from https://dasher.wustl.edu/tinker/. Do these versions differ? If so, is it okay to copy the "fftw" folder from Tinker-8.7.1 into the GitHub version of Tinker 8.7?
Thanks, Mandar Kulkarni
Hi everyone, I have tried to install Tinker with the GNU compilers (gcc and gfortran version 6.40, CUDA 9.1). First, I compiled fftw, which was successful. Then I ran into an error during the "make" command.
TINKERDIR is correctly set. TINKERDIR =/home/m/mandar/pfs/softwares/gcccuda_tinker_openmm_gpu/source/tinker
My Makefile options are:
F77 = gfortran
F77FLAGS = -c
OPTFLAGS = -Ofast -msse3 -fopenmp
LIBDIR = -L. -L$(TINKER_LIBDIR)/linux
LIBS =
LIBFLAGS = -crusv
RANLIB = ranlib
LINKFLAGS = $(OPTFLAGS) -static-libgcc
RENAME = rename_bin
error:
/root/gcc-4.9.2/src/gcc-4.9.2/libgfortran/runtime/main.c:175: error: undefined reference to 'secure_getenv'
/root/gcc-4.9.2/src/gcc-4.9.2/libgfortran/io/unix.c:1208: error: undefined reference to '__secure_getenv'
collect2: error: ld returned 1 exit status
strip: 'crystal.x': No such file
Makefile:792: recipe for target 'crystal.x' failed
make: [crystal.x] Error 1
make: Waiting for unfinished jobs....
/root/gcc-4.9.2/src/gcc-4.9.2/libgfortran/runtime/main.c:175: error: undefined reference to 'secure_getenv'
/root/gcc-4.9.2/src/gcc-4.9.2/libgfortran/io/unix.c:1208: error: undefined reference to '__secure_getenv'
collect2: error: ld returned 1 exit status
strip: 'document.x': No such file
Makefile:792: recipe for target 'document.x' failed
make: *** [document.x] Error 1
Any suggestions will be really helpful. Thanks in advance.
Please ignore the above error. I have compiled Tinker-OpenMM combination successfully.
First, I cloned Tinker from TinkerTools, then copied the "fftw" folder from Tinker-8.7.1.tar.gz into this version and followed Lee-Ping Wang's instructions for the GCC compiler.
The "fftw" folder is missing from the GitHub repository. Would it be possible to add it? It would avoid confusion for future users.
However, I am facing a new error after job submission as follows:
Default OpenMM Plugin Directory : /home/m/mandar/pfs/softwares/gcccuda_tinker_openmm_gpu/tinkeropenmm_exec/plugins
terminate called after throwing an instance of 'OpenMM::OpenMMException' what(): There is no registered Platform called "CUDA"
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:47
at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:57
at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:87
/var/spool/slurmd/job7742669/slurm_script: line 37: 148973 Aborted (core dumped) /home/m/mandar/pfs/softwares/gcccuda_tinker_openmm_gpu/source/Tinker/bin/dynamic_omm test_rUU 100 2.0 0.5 2 300.0 5 > dimer.log
Thanks again, Mandar Kulkarni
Please try the compile/library/link.make files in https://github.com/TinkerTools/Tinker/tree/release/linux/gfortran
In reply to the message from Mandar Kulkarni sent Thursday, October 3, 2019, 8:32 AM (Subject: Re: [TinkerTools/Tinker] intel compilers and SIGSEGV, segmentation fault (#52)).
I faced the same problem when I compiled the code with the Intel compilers! Is there any clue or solution for this?
FFTW 3.3.8 has now been added to the Tinker distribution on GitHub. This is not our code, it is a very commonly used Fourier transform package from MIT. But we believe/hope that it is OK to directly package it with Tinker. See the 0README file in the top-level /fftw directory for Tinker specific instructions for building the FFTW libraries needed by Tinker.
I am still unsure what is causing the problem mandar5335 reports above. He says he is using GNU gcc/gfortran 6.40, but the error messages suggest that 4.9.2 is really being used. Since the error appears to come from the GNU gcc installation itself, I suspect there is some issue with the gcc/gfortran setup.
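One way to check for this kind of mismatch is to compare the GCC version embedded in the linker error paths with the version reported by the compiler on the PATH. The following is a minimal Python sketch, not part of Tinker. The helper names are hypothetical, and it assumes that gfortran accepts the GCC driver option -dumpversion, which may print only the major version on newer releases.

```python
import re
import shutil
import subprocess

# Two lines copied from the linker error reported in this thread.
SAMPLE_LOG = (
    "/root/gcc-4.9.2/src/gcc-4.9.2/libgfortran/runtime/main.c:175: "
    "error: undefined reference to 'secure_getenv'\n"
    "/root/gcc-4.9.2/src/gcc-4.9.2/libgfortran/io/unix.c:1208: "
    "error: undefined reference to '__secure_getenv'\n"
)


def gcc_versions_in_log(log_text):
    """Return the set of GCC versions embedded in paths like /root/gcc-4.9.2/."""
    return set(re.findall(r"gcc-(\d+\.\d+\.\d+)", log_text))


def same_release(version_a, version_b):
    """Compare two version strings on the components they both provide."""
    parts_a, parts_b = version_a.split("."), version_b.split(".")
    shared = min(len(parts_a), len(parts_b))
    return parts_a[:shared] == parts_b[:shared]


def active_gfortran_version():
    """Return the version printed by the gfortran on PATH, or None if absent."""
    exe = shutil.which("gfortran")
    if exe is None:
        return None
    result = subprocess.run([exe, "-dumpversion"], capture_output=True, text=True)
    return result.stdout.strip() or None


def mismatched_versions(log_text, active_version):
    """Return the versions named in the log that differ from the active compiler."""
    found = gcc_versions_in_log(log_text)
    if active_version is None:
        return found
    return {v for v in found if not same_release(v, active_version)}


if __name__ == "__main__":
    active = active_gfortran_version()
    print("gfortran on PATH:", active)
    print("GCC versions in log that do not match:", mismatched_versions(SAMPLE_LOG, active))
```

A non-empty result suggests that the link step is picking up a libgfortran from a different GCC installation than the compiler that was loaded, for example through a stale library path on the cluster.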
In a later comment, mandar5335 reports an error of "There is no registered Platform called CUDA" at runtime. This is almost certainly due to the fact that CUDA is not installed correctly on the machine, or (more likely) that the machine is not recognizing the GPU card itself. Please first check that your machine sees the GPU.
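A quick way to run both checks from a compute node is sketched below in Python. This is only an illustration under stated assumptions: the function names are hypothetical, `nvidia-smi -L` is used to list the GPUs the driver can see, and the plugin directory is taken from the OPENMM_PLUGIN_DIR environment variable. The assumption that the CUDA plugin library has "CUDA" in its file name (for example libOpenMMCUDA.so on Linux) should be verified against your own OpenMM build.

```python
import os
import shutil
import subprocess


def visible_gpus():
    """Return the GPU lines printed by `nvidia-smi -L`, or [] if none are visible."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    result = subprocess.run([exe, "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.startswith("GPU ")]


def cuda_plugins(plugin_dir):
    """Return the files in plugin_dir whose names contain 'CUDA' (assumed naming)."""
    if not os.path.isdir(plugin_dir):
        return []
    return sorted(name for name in os.listdir(plugin_dir) if "CUDA" in name)


if __name__ == "__main__":
    # Set OPENMM_PLUGIN_DIR to the "Default OpenMM Plugin Directory" printed at startup.
    plugin_dir = os.environ.get("OPENMM_PLUGIN_DIR", "")
    print("GPUs seen by the driver:", visible_gpus())
    print("CUDA plugin files in", repr(plugin_dir), ":", cuda_plugins(plugin_dir))
```

If the first list is empty, the node does not see a GPU at all. If GPUs are listed but no CUDA plugin files are found, OpenMM was probably built or installed without its CUDA platform.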
More recently, hongxiahao91 reported the "same problem" when using the Intel compilers. Which problem? (There are several different problems in this thread.) Please provide more details of exactly the problem you are having.
Also, note that we do not recommend using the Intel compilers for building Tinker-OpenMM. While the Intel compilers do produce faster Tinker executables for CPUs (due mostly to a better implementation of OpenMP parallelization), there is no advantage for Tinker-OpenMM. Since all the intensive calculation is done on the GPU, and is really done by OpenMM and hence under CUDA, using the Intel compiler will not produce faster Tinker-OpenMM executables. And it may (?) be the case that if you build Tinker-OpenMM with Intel compilers, you will also have to build OpenMM itself with the same compilers. I would recommend that you just use a recent version of GNU gcc/gfortran for everything.
@jayponder Professor Ponder, thanks a lot for your comments and suggestions on the problem.
I have tried installing the Tinker 8.7.2 and OpenMM combination again. The same error still persists, and I have raised an issue with our HPC management to make sure it is not a gcc/gfortran setup issue. I am waiting for a response from their side.
Below is what I observed when I tried to reinstall Tinker-OpenMM:
"/root/gcc-4.9.2/src/gcc-4.9.2/libgfortran/runtime/main.c:175: error: undefined reference to '__secure_getenv'"
I will update once I receive any response from cluster management team.
Thanks again, Mandar Kulkarni
Hello, I have successfully installed the Tinker-OpenMM combination with help from HPC management. It was not clear what caused problems earlier, but now I have a working executable dynamic_omm.
However, I am facing another issue. I am currently benchmarking the DHFR system. When I run on 2 nodes (28 processors per node, K80 GPUs, 2 GPUs on each node), I get a speed of 1.0417 ns/day.
Performance: ns/day 1.0417 Wall Time 8.2940 Steps 100 Updates 1 Time Step 1.0000 Atoms 23558 Threads 56
and when I try on 224 processors, the speed is still the same.
Performance: ns/day 1.0733 Wall Time 8.0500 Steps 100 Updates 1 Time Step 1.0000 Atoms 23558 Threads 224
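The two ns/day figures follow directly from the step count, time step, and wall time in the Performance lines above. A small Python check, assuming the Time Step field is in femtoseconds and Wall Time is in seconds (the thread does not state the units, but these reproduce the reported values):

```python
SECONDS_PER_DAY = 86400.0
FS_PER_NS = 1.0e6


def ns_per_day(steps, time_step_fs, wall_time_s):
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = steps * time_step_fs / FS_PER_NS
    return simulated_ns * SECONDS_PER_DAY / wall_time_s


# (threads, wall time in seconds) taken from the two Performance lines above.
for threads, wall_time in [(56, 8.2940), (224, 8.0500)]:
    print(f"{threads} threads: {ns_per_day(100, 1.0, wall_time):.4f} ns/day")
```

Going from 56 to 224 threads shortens the wall time only from 8.2940 s to 8.0500 s, about a 3% change, so the extra processors contribute essentially nothing to the run.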
I have added the line export CUDA_VISIBLE_DEVICES=0,1,2,3 to the job script. Could you please suggest how I can improve the simulation speed when using multiple GPU nodes?
Thanks again, Mandar Kulkarni
Last I knew, the Amoeba kernels in OpenMM didn't support parallel execution, which means this behavior is to be expected.
Basically one GPU does all the work while the others wait for it to finish. If you want to maximize throughput on multiple GPUs, run a separate simulation for each GPU. The aggregate sampling will be maximally increased with that approach.
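The one-simulation-per-GPU approach can be sketched in Python as follows. This is a hypothetical launcher, not something shipped with Tinker: the function names, the replica basenames, and the log file names are made up for illustration. It relies on the standard CUDA environment variable CUDA_VISIBLE_DEVICES to restrict each process to a single device, and it assumes each replica has its own input files so the outputs do not collide.

```python
import os
import subprocess


def per_gpu_jobs(executable, basenames, run_args, gpu_ids):
    """Build one (command, environment) pair per GPU, each seeing a single device."""
    if len(basenames) != len(gpu_ids):
        raise ValueError("need exactly one input basename per GPU")
    jobs = []
    for basename, gpu in zip(basenames, gpu_ids):
        env = dict(os.environ)
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)  # this process sees only this GPU
        jobs.append(([executable, basename, *run_args], env))
    return jobs


def run_all(jobs, log_dir="."):
    """Start every job at once, wait for all of them, and return their exit codes."""
    running = []
    for index, (command, env) in enumerate(jobs):
        log = open(os.path.join(log_dir, f"replica_{index}.log"), "w")
        proc = subprocess.Popen(command, env=env, stdout=log, stderr=subprocess.STDOUT)
        running.append((proc, log))
    exit_codes = []
    for proc, log in running:
        exit_codes.append(proc.wait())
        log.close()
    return exit_codes


if __name__ == "__main__":
    # Run arguments copied from the dynamic_omm command line earlier in this thread;
    # "replica0" and "replica1" are hypothetical input basenames.
    jobs = per_gpu_jobs("dynamic_omm", ["replica0", "replica1"],
                        ["100", "2.0", "0.5", "2", "300.0", "5"], [0, 1])
    for command, env in jobs:
        print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(command))
    # Call run_all(jobs) on the compute node to start the replicas.
```

Each replica then runs at roughly the single-GPU speed, so the aggregate sampling scales with the number of GPUs even though no individual trajectory gets faster.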
@swails Thanks for your reply. So that means I can just run on a single node with a GPU.
Hi, I have installed Tinker-OpenMM using CUDA 9.2, the Intel compilers, and recent Tinker 8.7+ Tinker-OpenMM (from GitHub). The installation completed without errors and is based on Lee-Ping's instructions.
But whenever I run a job, I receive the following error:
forrtl: severe (174): SIGSEGV, segmentation fault occurred longjmp causes uninitialized stack frame : /home/m/mandar/pfs/softwares/new_tinker_openmm_gpu/gcc/tinker/bin/dynamic_omm terminated
Is there any variable I should set to avoid this error? Thanks in advance.
Regards, Mandar Kulkarni