NESTCollaboration / nest

Noble Element Simulation Technique is used to simulate noble-element energy deposition microphysics.
http://nest.physics.ucdavis.edu

pulseShape script segmentation fault #100

Closed 0rhisia0 closed 3 years ago

0rhisia0 commented 3 years ago

The pulseShape script seems to segfault when run on the photon_times.txt output of execNest. I'm running execNest with verbosity=true and useTiming=1. photon_times.txt was generated using ./execNest 30000 beta 0 15 300 -1 123, and the valgrind output is as follows:

==2345== Command: ./pulseShape 10 80
==2345==
==2345== Invalid read of size 4
==2345==    at 0x5751601: getc (in /usr/lib64/libc-2.17.so)
==2345==    by 0x4010F6: main (in /local/users/ishira/NEST/build/examples/pulseShape)
==2345==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

I've been trying to debug this for a bit and I'm not entirely sure where the error is stemming from. It's potentially reading from a reallocated pointer but I can't seem to figure out exactly where.
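For context on the trace above: valgrind reports getc reading from address 0x0, which is the signature of passing a null FILE* to getc, as happens when fopen fails (for example, because photon_times.txt is not in the current working directory). The sketch below is a minimal illustration of the defensive check, assuming the file is opened by a relative name; countLines is a hypothetical helper, not code from pulseShape.

```cpp
#include <cstdio>

// Hypothetical helper (not taken from pulseShape): counts newline characters
// in a text file. Returns -1 instead of crashing when the file cannot be opened.
long countLines(const char* path) {
  std::FILE* fp = std::fopen(path, "r");
  if (fp == nullptr) {
    // Without this check, the getc() call below would receive a null FILE*,
    // which valgrind reports as "Invalid read ... Address 0x0".
    std::perror(path);  // e.g. "photon_times.txt: No such file or directory"
    return -1;
  }
  long lines = 0;
  int c;
  while ((c = std::getc(fp)) != EOF) {
    if (c == '\n') ++lines;
  }
  std::fclose(fp);
  return lines;
}
```

Calling countLines("photon_times.txt") from the wrong directory then prints an error and returns -1 instead of segfaulting.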

mszydagis commented 3 years ago

I believe the issue is that you are specifying an unphysical position. The syntax for execNEST is ./execNEST numEvts type_interaction E_min[keV] E_max[keV] field_drift[V/cm] x,y,z-position[mm]. You seem to be putting a field of -1 and a position of 123, which the code does not know how to interpret, because the position should be three numbers separated by commas. Also, what detector are you using? (Lastly, congrats on having the 100th NEST issue!)

0rhisia0 commented 3 years ago

Sorry, I'm not sure I follow. Mapped to the syntax you quote, ./execNest 30000 beta 0 15 300 -1 123 gives: numEvts=30000, type_interaction=beta, E_min=0, E_max=15, field_drift=300, x,y,z-position=-1 (random), seed=123. This should be correct, yes?

Edit: Woohoo issue 100!
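To make the argument mapping concrete, the sketch below shows one way the position argument could be validated, assuming (as in the mapping above) that -1 requests a random position and anything else must be three comma-separated coordinates in mm. PositionArg and parsePosition are hypothetical names for illustration; this is not execNEST's actual argument parser. It requires C++17 for std::optional.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical type: the parsed form of the position argument.
struct PositionArg {
  bool random = false;          // true when the argument was "-1"
  std::array<double, 3> xyz{};  // x, y, z in mm when not random
};

// Returns std::nullopt when the string is neither "-1" nor "x,y,z".
std::optional<PositionArg> parsePosition(const std::string& arg) {
  PositionArg pos;
  if (arg == "-1") {
    pos.random = true;
    return pos;
  }
  std::istringstream in(arg);
  std::string field;
  int n = 0;
  while (std::getline(in, field, ',')) {
    if (n == 3) return std::nullopt;  // more than three components
    try {
      std::size_t used = 0;
      pos.xyz[n] = std::stod(field, &used);
      if (used != field.size()) return std::nullopt;  // trailing junk
    } catch (const std::exception&) {
      return std::nullopt;  // empty or non-numeric component
    }
    ++n;
  }
  if (n != 3) return std::nullopt;  // fewer than three components
  return pos;
}
```

With this check, "-1" and "0,0,100" are accepted, while a bare "123" in the position slot is rejected rather than silently misread.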

mszydagis commented 3 years ago

Yes, sorry, I somehow misread your input arguments. Indeed they are all OK. However, I have run exactly what you did, on both my desktop machine (Linux) and my personal laptop (Mac). No segfault, no errors of any kind. I checked out a fresh copy of NEST (v2.2.1patch1, is that what you have?) and made only 1 change: useTiming altered from the default of 0 -> 1. It runs perfectly and outputs a scintillation prompt fraction of median 0.60263, sigma 0.0953254, and skew -0.113486 (which all sound very reasonable for betas in this energy range). I ran execNEST with your exact options, followed by pulseShape. The only thing I can think of: the pulseShape executable should be in your examples subfolder, so running it as "./pulseShape" rather than "examples/pulseShape" is what is making me suspicious at the moment. (For execNEST, running it from its own directory with ./ is fine.)
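The three numbers quoted above summarize the distribution of per-event prompt fractions. Below is a minimal sketch of such a summary, assuming the prompt fractions are already collected in a vector. It uses the sample median, the population standard deviation, and the moment-based skewness g1. pulseShape's own definitions may differ (for instance, a quantile-based width instead of a standard deviation), so treat Summary and summarize as hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of a distribution of prompt fractions.
struct Summary {
  double median;
  double sigma;  // population standard deviation
  double skew;   // moment-based skewness g1 = m3 / m2^(3/2)
};

// Takes the vector by value because it is sorted in place.
Summary summarize(std::vector<double> v) {
  if (v.empty()) {
    const double nan = std::nan("");
    return {nan, nan, nan};
  }
  std::sort(v.begin(), v.end());
  const std::size_t n = v.size();
  const double median =
      (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);

  double mean = 0.0;
  for (double x : v) mean += x;
  mean /= static_cast<double>(n);

  // Second and third central moments.
  double m2 = 0.0, m3 = 0.0;
  for (double x : v) {
    const double d = x - mean;
    m2 += d * d;
    m3 += d * d * d;
  }
  m2 /= static_cast<double>(n);
  m3 /= static_cast<double>(n);

  const double sigma = std::sqrt(m2);
  const double skew = (m2 > 0.0) ? m3 / std::pow(m2, 1.5) : 0.0;
  return {median, sigma, skew};
}
```

For example, summarize({1, 2, 3, 4, 10}) gives a median of 3, a sigma of sqrt(10) ≈ 3.162, and a positive skew, because the single large value stretches the right tail.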

0rhisia0 commented 3 years ago

I've made several fresh clones of v2.2.1patch1, made the same single change each time, and still get the same error (I forgot to mention I'm using the LUX_Run03 template, which I believe is the default). I cd into the examples directory before running pulseShape, so that shouldn't be a problem either. Since it runs on your end, I can only imagine it's an issue with the compiler settings on my machine. I've also tried compiling the script separately with g++ -O/-O2/-Ofast, but it still throws the error. I'll play around with it more and let you know if anything surfaces.

mszydagis commented 3 years ago

I am lost; I have never before been unable to reproduce someone's bug and fix it right away. It should not be a machine-dependent issue, because we use Travis CI to test numerous configurations. The only thing I can think of now is a RAM problem: does this error still happen if you keep reducing the number of events from 30,000 downward? (If that solves it, it tells me you will unfortunately need to run in smaller chunks and merge the results later.)
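If running in smaller chunks does avoid the crash, the chunk outputs have to be combined. Below is a minimal sketch, assuming each chunk writes a plain-text file whose first headerLines lines are an identical header. mergeChunks is a hypothetical helper, and the real photon_times.txt layout should be checked before relying on it. Each chunk should also be run with a different seed (the last execNEST argument); otherwise the chunks reproduce the same events.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: concatenates chunk files into one output file.
// The first file is copied whole; the first `headerLines` lines of every
// later file are skipped, assuming each chunk repeats the same header.
// Returns false if any file cannot be opened or the output write fails.
bool mergeChunks(const std::vector<std::string>& inputs,
                 const std::string& output, int headerLines) {
  std::ofstream out(output);
  if (!out) return false;
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    std::ifstream in(inputs[i]);
    if (!in) return false;
    std::string line;
    int lineNo = 0;
    while (std::getline(in, line)) {
      const bool isRepeatedHeader = (i > 0 && lineNo < headerLines);
      ++lineNo;
      if (isRepeatedHeader) continue;
      out << line << '\n';
    }
  }
  return static_cast<bool>(out);
}
```

For example, merging a chunk containing "h", "1", "2" with a chunk containing "h", "3" using headerLines = 1 yields "h", "1", "2", "3".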

mszydagis commented 3 years ago

I wanted to check in with you on this since I could not reproduce the problem, but we are about to issue a new patch (so, I am trying to close all currently-open issues). Perhaps my student @grischbieter can help you?

grischbieter commented 3 years ago

I'm unable to reproduce this issue. I've tried running on three different machines, and each one runs without any problems. To be very specific, I started with clean versions of the code, changed only one line in analysis.hh (useTiming = 1;), ran execNEST with ./execNEST 30000 beta 0 15 300 -1 123, and then ran ./pulseShape 10 80. It works fine on each machine. Since we can't reproduce this issue, I'll close it. But @0rhisia0, please let us know if you continue having problems.