TinkerTools / tinker

Tinker: Software Tools for Molecular Design
https://dasher.wustl.edu/tinker/

Tinker-openMM GPU: Segmentation Fault (Core Dumped) #55

Closed aneesmkp closed 4 years ago

aneesmkp commented 4 years ago

I am getting a segmentation fault (core dumped) when trying to run dynamic_omm. The input files work fine with the CPU version of dynamic. I followed Lee-Ping Wang's notes while compiling. I am using CUDA 10.2 on an RTX 2070 (Ubuntu 18.04). I also tried running just a water box (https://biomol.bme.utexas.edu/~pren/downloads/waterbox), and I still get the same error.

ERROR OBTAINED FOR THE SIMULATION OF WATER BOX:

anees@basin:~/work/tinker_test/amoebanuc17_solv_noions$ /home/anees/src/Tinker/build_openmm/dynamic_omm.x water36.xyz

 ######################################################################

##########################################################################

Tinker --- Software Tools for Molecular Design

Version 8.7 June 2019

Copyright (c) Jay William Ponder 1990-2019

All Rights Reserved

##########################################################################

######################################################################

Enter the Number of Dynamics Steps to be Taken : 1000

Enter the Time Step Length in Femtoseconds [1.0] : 2.0

Enter Time between saves in Picoseconds [0.1] : 0.1

Available Statistical Mechanical Ensembles :

(1) Microcanonical (NVE)
(2) Canonical (NVT)
(3) Isoenthalpic-Isobaric (NPH)
(4) Isothermal-Isobaric (NPT)

Enter the Number of the Desired Choice [1] : 2

Enter the Desired Temperature in Degrees K [298] : 300

Return Data from the GPU at Every Time Step [N] : 1000

Number of CUDA Devices Detected : 1

Device Number : 0

Device Name GeForce RTX 2070

Clockspeed (GHz) 1.620

Total Memory (GB) 8.00

Free Memory (GB) 6.77

GPU load 1.00%

Platform CUDA : Setting Precision to MIXED via CUDA-PRECISION

Molecular Dynamics Trajectory via r-RESPA MTS Algorithm

terminate called after throwing an instance of 'OpenMM::OpenMMException'

what(): Error loading CUDA module: CUDA error (218)

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:

0 0x7f6e51c5231a

1 0x7f6e51c51503

2 0x7f6e51098f1f

3 0x7f6e51098e97

4 0x7f6e5109a800

5 0x7f6e53b2e956

6 0x7f6e53b34ab5

7 0x7f6e53b34af0

8 0x7f6e53b34d78

9 0x7f6e184992e2

10 0x7f6e1855765b

11 0x7f6e18557f77

12 0x7f6e184e9cfd

13 0x7f6e54165483

14 0x7f6e18502094

15 0x7f6e541a519f

16 0x55d2aefc248b

17 0x55d2aefc135e

18 0x7f6e5107bb96

19 0x55d2aefc13e9

20 0xffffffffffffffff

Aborted (core dumped)

ERROR OBTAINED FOR THE SIMULATION I WAS TRYING:

anees@basin:~/work/tinker_test/amoebanuc17_solv_noions$ /home/anees/src/Tinker/build_openmm/dynamic_omm.x scl_Na_solv_output.xyz

 ######################################################################

##########################################################################

Tinker --- Software Tools for Molecular Design

Version 8.7 June 2019

Copyright (c) Jay William Ponder 1990-2019

All Rights Reserved

##########################################################################

######################################################################

Enter Potential Parameter File Name : amoebanuc17.prm

Enter the Number of Dynamics Steps to be Taken : 100

Enter the Time Step Length in Femtoseconds [1.0] : 2.0

Enter Time between saves in Picoseconds [0.1] : 0.01

Available Simulation Control Modes :

(1) Constant Total Energy Value (E)
(2) Constant Temperature via Thermostat (T)

Enter the Number of the Desired Choice [1] : 2

Enter the Desired Temperature in Degrees K [298] : 298

Return Data from the GPU at Every Time Step [N] : 10

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

0 0x7f322a5b531a

1 0x7f322a5b4503

2 0x7f32299fbf1f

3 0x7f322cae9082

4 0x7f322cc9190b

5 0x56072cc3e46b

6 0x56072cc45a5d

7 0x56072cc3bf8d

8 0x56072cc3b35e

9 0x7f32299deb96

10 0x56072cc3b3e9

11 0xffffffffffffffff

Segmentation fault (core dumped)

I have tried the suggestion from https://github.com/TinkerTools/Tinker/issues/52, but it is still not working.

-Anees

pren commented 4 years ago

Yes, this is a known issue. CUDA 10.0 works fine on RTX cards if you can compile Tinker-OpenMM with it.
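Before rebuilding, it can help to confirm which CUDA toolkit release is actually being picked up. The following is a minimal sketch, not part of Tinker or OpenMM: the helper names `parse_nvcc_release` and `installed_cuda_release` are hypothetical, and it assumes `nvcc --version` prints its usual "release X.Y" line.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release() -> Optional[Tuple[int, int]]:
    """Return the release of the first nvcc found on PATH, or None if there is none."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)
```

Calling `installed_cuda_release()` and comparing the result with `(10, 0)` shows whether the shell still resolves to a different toolkit than the one intended for the build.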

jayponder commented 4 years ago

Actually, I have this working correctly on a combination very similar to the original poster's: Ubuntu 18.04, CUDA 10.2, an RTX 2070 Max-Q (the mobile version of the 2070), and the current Tinker and Tinker-OpenMM from the TinkerTools GitHub site. That said, as Pengyu notes, I think various people have had trouble with CUDA 10.2 and RTX cards. As he suggests, you should try CUDA 10.0.

Also, make sure to use the keyword "integrator RESPA" or "integrator VERLET", as not all integrators that work in the CPU code are supported in Tinker-OpenMM. In particular, the default BEEMAN integrator from the CPU code is not supported in Tinker-OpenMM. For AMOEBA, we generally recommend the RESPA integrator with a 2 fs time step.
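As an illustration of the keyword advice, a minimal key file might look like the sketch below. The file name `scl_Na_solv_output.key` is an assumption chosen to match the coordinate file in the original report, and the `parameters` line simply reuses the parameter file entered at the prompt there; a real setup will normally need further keywords (periodic box, cutoffs, and so on).

```
parameters amoebanuc17.prm
integrator respa
```

With the integrator set in the key file, the 2 fs time step is still entered at the interactive prompt, as in the runs shown in the original post.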

jayponder commented 4 years ago

I'm going to close this issue. If there are still problems with this topic, please post an issue to the Tinker-OpenMM package.