JeffersonLab / halld_sim

Simulation for the GlueX Experiment in Hall D

genr8 requires X11 forwarding?! #203

Closed T-Britton closed 2 years ago

T-Britton commented 3 years ago

The following commands, run on scosg16 (and probably other places too):

1) gxenv /group/halld/www/halldweb/html/halld_versions/version_4.42.0.xml

2) cd /osgpool/halld/tbritton/TestProj_2006/DEBUG/

3) genr8 -d -r030496 -M500 -Agenr8_030496_000.ascii < genr8_030496_000.conf

stall when run as tbritton but not as ppauli. Laura Hild discovered that genr8 doesn't hang when X11 forwarding is on. Peter was using VNC; when he stops using VNC, it hangs for him too (he will provide more details).

To my mind, there is zero reason for genr8 to require X11 forwarding. Unless I can trick genr8 into believing it has X11 when it doesn't, at least some genr8 configurations cannot be processed by MCwrapper-bot.

s6pepaul commented 3 years ago

As Thomas said, it works fine when I vnc onto a JLab machine, open a terminal, and then run the above. But when I open a terminal on my MacBook and ssh to scosg16 I get:

scosg16.jlab.org> cd /osgpool/halld/tbritton/TestProj_2006/DEBUG/
scosg16.jlab.org> gxenv /group/halld/www/halldweb/html/halld_versions/version_4.42.0.xml
scosg16.jlab.org> genr8 -d -r030496 -M500 -Agenr8_030496_000.ascii < genr8_030496_000.conf
Using runNo: 30496
Maximum number of events: 500
Opening file genr8_030496_000.ascii for output.
Setting random number seed to: 1629847946

BeamProperties: Parsing config file genr8_030496_000_beam.conf
BeamProperties WARNING: generateCombrems parameter ElectronBeamEnergy missing: using default = 11.6
BeamProperties WARNING: generateCombrems parameter CoherentPeakEnergy missing: using default = 8.8
BeamProperties WARNING: generateCombrems parameter Emittance missing: using default = 2.5e-09
BeamProperties WARNING: generateCombrems parameter RadiatorThickness missing: using default = 5e-05
BeamProperties WARNING: generateCombrems parameter CollimatorDiameter missing: using default = 0.005
BeamProperties WARNING: generateCombrems parameter CollimatorDistance missing: using default = 76

Initialization for coherent bremsstralung calculation
electron beam energy: 11.6 GeV
primary coherent edge: 8.8 GeV

BeamProperties: Using fixed polarization = 0.4
Reading: targetp.x targetp.y targetp.z targetMass
Found: 0.000000 0.000000 0.000000 0.938272
Reading: t-channelSlope
Found: 1.613000
Reading: number of particles need to describe the decay
Found: 8
Reading: part# chld1# chld2# prnt# Id nchld mass width chrg flag
Found: 0 6 7 -1 82 2 1.232000 0.117000 2 0
Found: 1 2 3 -1 0 2 1.318100 0.109800 -1 1
Found: 2 2 3 1 12 0 0.494000 0.000000 -1 11
Found: 3 4 5 1 16 2 0.497000 0.000000 0 1
Found: 4 4 5 3 8 0 0.139500 0.000000 1 11
Found: 5 4 5 3 9 0 0.139500 0.000000 -1 11
Found: 6 4 5 0 8 0 0.139500 0.000000 1 11
Found: 7 4 5 0 14 0 0.938000 0.000000 1 11
Found EOI---- Input File appears Fine.
In main do loop ... 0
^C

I ctrl+c'd after a while because nothing happened.

rjones30 commented 3 years ago

I cannot reproduce any connection of this hang to X11. Here is what I get from a plain ssh session, no vnc, running on scosg16 inside the container. No problems seen. Test script genr8.sh and input file genr8.in are in /home/jonesrt on cue.

scosg16.jlab.org> ./osg-container.sh bash ./genr8.sh xxx.hddm
Running process on scosg16
Using runNo: 71500
Maximum number of events: 1000000
Opening file genr8.ascii for output.
Setting random number seed to: 675281712
Reading: beamp.x beamp.y beamp.z beamMass
Found: 0.000000 0.000000 8.500000 0.000000
Reading: targetp.x targetp.y targetp.z targetMass
Found: 0.000000 0.000000 0.000000 0.938272
Reading: t-channelSlope
Found: 3.210000
Reading: number of particles need to describe the decay
Found: 6
Reading: part# chld1# chld2# prnt# Id nchld mass width chrg flag
Found: 0 -1 -1 -1 11 0 0.493677 0.000000 1 11
Found: 1 2 5 -1 20 2 1.192600 0.008000 0 0
Found: 2 3 4 1 18 2 1.115600 0.003000 0 1
Found: 3 3 4 2 9 0 0.139600 0.000000 -1 11
Found: 4 3 4 2 14 0 0.938200 0.000000 1 11
Found: 5 3 4 1 1 0 0.000000 0.000000 0 10
Found EOI---- Input File appears Fine.
Max Lorentz Factor:0.019258
Events generated:1399195
Events accepted:1000000

Wrote 1000000 events to genr8.hddm
scosg16.jlab.org> xterm
xterm: Xt error: Can't open display:
xterm: DISPLAY is not set

sdobbs commented 3 years ago

It sounds like this might be a problem with the input file (although with some weird behavior) - I can get other input files to run just fine with genr8 without X11 forwarding on, and if you run this in a debugger, you can see it's just stuck in an internal loop.

Also, note that you should never decay a long-lived particle like a Kshort in genr8 - these decays should be handled by Geant.

sdobbs commented 3 years ago

I had luck with the following configuration, but the main trick seems to be switching the order in which you specify the children of the baryon and meson decays:

%%%%%%%%%%%%%%%%% Start Input Values %%%%%%%%%%%%%%%%%%%%
% beamp.x beamp.y beamp.z beamMass
%0 0 8.5 0
% beam configuration file
genr8_030496_000_beam.conf
% targetp.x targetp.y targetp.z targetMass
0 0 0 0.938272
% t-channelSlope
      1.613
% number of particles needed to describe the isobar decay of X
6
%
% particle# 0&1 are always the X&Y
%part#  chld1#  chld2#  parent# Id     nchild   mass       width   charge  flag
   % baryon (Y) decay
 0       2      3       *        0       2      1.232       0.117      +2      00
   % meson (X) decay
 1       4      5       *        0       2      1.3181      0.1098     -1      00
 2       *      *       0        8       0      0.1395      0          +1      11
 3       *      *       0       14       0      0.938       0          +1      11
 4       *      *       1       12       0      0.494       0          -1      11
 5       *      *       1       16       0      0.497       0          +0      01
!EOI

s6pepaul commented 3 years ago

But surely this cannot be the intended behaviour. How can the user be sure that the config file works if it runs successfully in one case (e.g. in a VNC session) and goes into an endless loop in another? This came up because one user's project didn't pass the test required for submission to the OSG, but I wasn't able to reproduce the error because the config file ran absolutely fine for me.

sdobbs commented 3 years ago

Right, certainly there's a bug somewhere. My guess is that it's memory related, given the particular circumstances in which it's shown up. My point was more that since the bug seems to show up due to the ordering of the decay particles in the input file, there seems to be a fairly straightforward path for debugging, for whoever wants to follow up on this.

(and a resolution for the user in the meantime)

T-Britton commented 3 years ago

Maybe I am being slow on the uptake, but if the ordering is what drives the stall, why would Peter connecting via VNC versus a plain terminal change the behavior? What's the hypothesis for this? It doesn't strictly seem like a memory problem, because how would the access method change scosg16's memory between the two cases?

sdobbs commented 3 years ago

So, when the program hangs, if you run it in a debugger, you can clearly see that it's stuck in an infinite loop in setMass(), e.g.

(gdb) bt
#0  setMass (Isobar=0x7fffffffa338) at programs/Simulation/genr8/genr8.cc:903
#1  0x000000000041099a in main (argc=, argv=) at programs/Simulation/genr8/genr8.cc:532

I don't know why this is the case, but the code clearly has nothing to do with X11, and it's not directly linked to any X11 libraries - the only graphics related things are from the ROOT libraries. So I don't know why exactly this is happening. I was guessing it's something memory related since (1) that's a common problem in C; (2) especially when reading in from text files; (3) if you're writing over and accessing non-allocated memory lots of weird stuff can happen. Maybe in the two cases you're looking at, libraries got loaded into different places in virtual memory, since maybe some things got loaded or not depending on if X11 forwarding is active. I don't know.

But this is pure speculation. It at least seems pretty clear that the program is being stuck in an infinite loop in this particular place, so if you want to resolve the problem you can compare the program execution for the two different input files.
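
As an illustration of how an unset value can turn a retry loop into an infinite one, here is a minimal C++ sketch. It is not the genr8 code: `sampleMassAbove`, its parameters, and the use of `std::cauchy_distribution` as a Breit-Wigner-like shape are all assumptions made for illustration. If the threshold happens to be NaN or an absurdly large garbage value, a test of the form `m > threshold` never passes, so an uncapped loop would spin forever; capping the number of tries turns the hang into a reportable failure.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical stand-in for a rejection-sampling loop; this is NOT genr8's setMass().
// Draws a mass from a Cauchy (Breit-Wigner-like) shape until it exceeds
// `threshold`, giving up after `maxTries` draws instead of looping forever.
// Returns true and writes the mass to `out` on success, false otherwise.
bool sampleMassAbove(double center, double width, double threshold,
                     int maxTries, std::mt19937 &rng, double &out) {
  assert(maxTries > 0);
  // A non-finite threshold (NaN, inf) or a non-positive width signals an
  // unset or corrupt input: fail immediately rather than start sampling.
  if (!std::isfinite(threshold) || !(width > 0.0)) return false;
  // Scale parameter is the half-width at half-maximum, i.e. width / 2.
  std::cauchy_distribution<double> shape(center, width / 2.0);
  for (int i = 0; i < maxTries; ++i) {
    const double m = shape(rng);
    if (m > threshold) {
      out = m;
      return true;
    }
  }
  return false;  // threshold not reached within maxTries (e.g. garbage value)
}
```

A caller would check the return value and print the offending particle index instead of silently retrying, which would at least make the failure independent of how memory happened to be laid out.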

maltealbrecht commented 2 years ago

I just came across the exact same problem, tried to chase it down, and ended up with the same conclusions as discussed in this thread (which I found when I decided to post this problem as an issue...). My observation was also that, for some weird reason, the order of the baryon / meson daughter particles changes the behaviour: genr8 either works or gets stuck at the first event. I can also confirm that the order no longer matters when using X11 forwarding.

Maybe this is an issue of memory layout or partitioning, depending on whether X11 gets loaded or not. It may be helpful to run genr8 in valgrind to find possible memory issues.

mashephe commented 2 years ago

To follow up a little... a student in our group got really hung up with this problem over the weekend and it burned a lot of time. I only had a very short time today to look at it, but I was able to reproduce the problem easily and I went one step further to run in valgrind.

When you run in valgrind, there are some errors just prior to the hang that are consistent with what @sdobbs notes above:

------- this is without X11 forwarding.. the execution hangs and I have to ctrl-c to recover

Opening file test.txt for output. 
Using runNo: 50786
Using 100 events to determine the lorentz factor
Setting random number seed to: 1661389070
Reading:    beamp.x     beamp.y     beamp.z     beamMass
Found:      0.000000    0.000000    8.600000    0.000000 
Reading:    targetp.x   targetp.y   targetp.z   targetMass
Found:      0.000000    0.000000    0.000000    0.938720 
Reading: t-channelSlope
Found:  5.000000 
Reading: number of particles need to describe the decay
Found:  6 
Reading:    part#   chld1#  chld2#  prnt#   Id  nchld   mass        width       chrg    flag  
Found:      0   2   3   -1  0   2   1.232000    0.117000    2   0
Found:      1   4   5   -1  0   2   1.318000    0.107000    -1  0
Found:      2   4   5   0   14  0   0.938000    0.000000    1   11
Found:      3   4   5   0   8   0   0.139500    0.000000    1   11
Found:      4   4   5   1   12  0   0.494000    0.000000    -1  11
Found:      5   4   5   1   16  0   0.498000    0.000000    0   11
Found EOI----  Input File appears Fine.
==186907== Conditional jump or move depends on uninitialised value(s)
==186907==    at 0x41293C: rawthresh(particleMC_t*) (genr8.cc:954)
==186907==    by 0x413B48: setMass(particleMC_t*) (genr8.cc:886)
==186907==    by 0x410A29: main (genr8.cc:532)
==186907== 
==186907== Conditional jump or move depends on uninitialised value(s)
==186907==    at 0x413B2E: setMass(particleMC_t*) (genr8.cc:903)
==186907==    by 0x410A29: main (genr8.cc:532)
==186907== 

If I turn on X11 I get the same errors and the execution continues, ... and generates more errors:

Opening file test.txt for output. 
Using runNo: 50786
Using 100 events to determine the lorentz factor
Setting random number seed to: 1661325919
Reading:    beamp.x     beamp.y     beamp.z     beamMass
Found:      0.000000    0.000000    8.600000    0.000000 
Reading:    targetp.x   targetp.y   targetp.z   targetMass
Found:      0.000000    0.000000    0.000000    0.938720 
Reading: t-channelSlope
Found:  5.000000 
Reading: number of particles need to describe the decay
Found:  6 
Reading:    part#   chld1#  chld2#  prnt#   Id  nchld   mass        width       chrg    flag  
Found:      0   2   3   -1  0   2   1.232000    0.117000    2   0
Found:      1   4   5   -1  0   2   1.318000    0.107000    -1  0
Found:      2   4   5   0   14  0   0.938000    0.000000    1   11
Found:      3   4   5   0   8   0   0.139500    0.000000    1   11
Found:      4   4   5   1   12  0   0.494000    0.000000    -1  11
Found:      5   4   5   1   16  0   0.498000    0.000000    0   11
Found EOI----  Input File appears Fine.
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0x41293C: rawthresh(particleMC_t*) (genr8.cc:954)
==124051==    by 0x413B48: setMass(particleMC_t*) (genr8.cc:886)
==124051==    by 0x410A29: main (genr8.cc:532)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0x413B2E: setMass(particleMC_t*) (genr8.cc:903)
==124051==    by 0x410A29: main (genr8.cc:532)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0x410AC8: main (genr8.cc:530)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0x410AF0: main (genr8.cc:558)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0x410BF4: main (genr8.cc:527)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0x412158: CMmomentum(double, double, double) (genkin.cc:125)
==124051==    by 0x410C11: main (genr8.cc:607)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0x410C46: main (genr8.cc:608)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0xA8FA10D: __ieee754_exp_avx (e_exp.c:70)
==124051==    by 0xA8C4BE2: exp (w_exp.c:26)
==124051==    by 0x410DF8: main (genr8.cc:633)
==124051== 
==124051== Use of uninitialised value of size 8
==124051==    at 0xA8FA1D2: __ieee754_exp_avx (e_exp.c:91)
==124051==    by 0xA8C4BE2: exp (w_exp.c:26)
==124051==    by 0x410DF8: main (genr8.cc:633)
==124051== 
==124051== Use of uninitialised value of size 8
==124051==    at 0xA8FA1E6: __ieee754_exp_avx (e_exp.c:91)
==124051==    by 0xA8C4BE2: exp (w_exp.c:26)
==124051==    by 0x410DF8: main (genr8.cc:633)
==124051== 
==124051== Use of uninitialised value of size 8
==124051==    at 0xA8FA1EF: __ieee754_exp_avx (e_exp.c:92)
==124051==    by 0xA8C4BE2: exp (w_exp.c:26)
==124051==    by 0x410DF8: main (genr8.cc:633)
==124051== 
==124051== Use of uninitialised value of size 8
==124051==    at 0xA8FA200: __ieee754_exp_avx (e_exp.c:92)
==124051==    by 0xA8C4BE2: exp (w_exp.c:26)
==124051==    by 0x410DF8: main (genr8.cc:633)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0xA8FA259: __ieee754_exp_avx (e_exp.c:97)
==124051==    by 0xA8C4BE2: exp (w_exp.c:26)
==124051==    by 0x410DF8: main (genr8.cc:633)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0xA8FA25F: __ieee754_exp_avx (e_exp.c:97)
==124051==    by 0xA8C4BE2: exp (w_exp.c:26)
==124051==    by 0x410DF8: main (genr8.cc:633)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0xA8C4C08: exp (w_exp.c:27)
==124051==    by 0x410DF8: main (genr8.cc:633)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0xA8C4C12: exp (w_exp.c:27)
==124051==    by 0x410DF8: main (genr8.cc:633)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0xA8C4C19: exp (w_exp.c:27)
==124051==    by 0x410DF8: main (genr8.cc:633)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0xA8FA10D: __ieee754_exp_avx (e_exp.c:70)
==124051==    by 0xA8C4BE2: exp (w_exp.c:26)
==124051==    by 0x410E16: main (genr8.cc:634)
==124051== 
==124051== Use of uninitialised value of size 8
==124051==    at 0xA8FA1D2: __ieee754_exp_avx (e_exp.c:91)
==124051==    by 0xA8C4BE2: exp (w_exp.c:26)
==124051==    by 0x410E16: main (genr8.cc:634)
==124051== 
==124051== Use of uninitialised value of size 8
==124051==    at 0xA8FA1E6: __ieee754_exp_avx (e_exp.c:91)
==124051==    by 0xA8C4BE2: exp (w_exp.c:26)
==124051==    by 0x410E16: main (genr8.cc:634)
==124051== 
==124051== Use of uninitialised value of size 8
==124051==    at 0xA8FA1EF: __ieee754_exp_avx (e_exp.c:92)
==124051==    by 0xA8C4BE2: exp (w_exp.c:26)
==124051==    by 0x410E16: main (genr8.cc:634)
==124051== 
==124051== Use of uninitialised value of size 8
==124051==    at 0xA8FA200: __ieee754_exp_avx (e_exp.c:92)
==124051==    by 0xA8C4BE2: exp (w_exp.c:26)
==124051==    by 0x410E16: main (genr8.cc:634)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0xA8FA259: __ieee754_exp_avx (e_exp.c:97)
==124051==    by 0xA8C4BE2: exp (w_exp.c:26)
==124051==    by 0x410E16: main (genr8.cc:634)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0xA8FA25F: __ieee754_exp_avx (e_exp.c:97)
==124051==    by 0xA8C4BE2: exp (w_exp.c:26)
==124051==    by 0x410E16: main (genr8.cc:634)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0xA8C4C08: exp (w_exp.c:27)
==124051==    by 0x410E16: main (genr8.cc:634)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0xA8C4C12: exp (w_exp.c:27)
==124051==    by 0x410E16: main (genr8.cc:634)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0xA8C4C19: exp (w_exp.c:27)
==124051==    by 0x410E16: main (genr8.cc:634)
==124051== 
==124051== Conditional jump or move depends on uninitialised value(s)
==124051==    at 0xA8C5318: log (w_log.c:28)
==124051==    by 0x410E3B: main (genr8.cc:640)

I looked briefly at the code, and it seems these lines are dereferencing pointers to structures that hold the particle decay properties. I didn't have a chance to trace further.

It seems like this is a bug in genr8 that involves a read of uninitialized memory. In some cases that memory is suitably populated for the program to execute without the user noticing. (Hopefully the results are correct.) Evidently when X11 forwarding is off, then the uninitialized memory creates some problematic condition that results in an infinite loop.
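
One way to make this class of bug fail loudly, rather than depend on what happens to be in memory, is to give every field a known default and check the particle table before entering the event loop. The sketch below assumes a hypothetical `Particle` record and a hypothetical `firstUnsetMass` helper; neither is taken from genr8.cc, and the actual structures there may look quite different.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical particle record; field names are illustrative, not genr8's.
struct Particle {
  int id = 0;
  // NaN marks "never assigned", so a forgotten mass cannot pass for a real one.
  double mass = std::numeric_limits<double>::quiet_NaN();
  double width = 0.0;
};

// Returns the index of the first particle whose mass was never assigned
// (still NaN) or is negative, or -1 if every entry is usable.
int firstUnsetMass(const std::vector<Particle> &table) {
  // Guard the narrowing cast used for the return value.
  assert(table.size() <=
         static_cast<std::size_t>(std::numeric_limits<int>::max()));
  for (std::size_t i = 0; i < table.size(); ++i) {
    const double m = table[i].mass;
    if (std::isnan(m) || m < 0.0) return static_cast<int>(i);
  }
  return -1;
}
```

Calling such a check once after the input file is parsed would stop the program with a clear message naming the offending particle, with or without X11 forwarding.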

I've kicked the can a bit further down the road, but it will take some additional work from an expert to dig just a bit more and figure out where the bug is.

rjones30 commented 2 years ago

Matt, can you share a link to the input ascii file for this run? -Richard J.

mashephe commented 2 years ago

To reproduce the above hang, I do the following on the ifarm:

cd /work/halld/home/shepherd/scratch/
source /group/halld/Software/build_scripts/gluex_env_boot_jlab.csh
gxenv 
genr8 -l100 -Atest.txt -r50786 < a2minus.input

This works in an ssh session initiated with the -Y option but hangs if -Y is omitted.

remitche66 commented 2 years ago

I ran into the same issue and tracked it down to uninitialized masses for particles coming from the lower vertex. I fixed it and submitted a pull request.

s6pepaul commented 2 years ago

Thank you very much for tracking this long-standing issue down and for submitting the fix! I merged the pull request (#265). I will close this issue.