desihub / fiberassign

Fiber assignment code for DESI
BSD 3-Clause "New" or "Revised" License

fiberassign crashes valgrind. #35

Closed rainwoodman closed 6 years ago

rainwoodman commented 8 years ago

The main error message is

--15909:0: aspacem Valgrind: FATAL: VG_N_SEGMENTS is too low.
--15909:0: aspacem   Increase it and rebuild.  Exiting now.

The full log is here:

valgrind ../src/fiberassign params_fiberassign.txt 
==15909== Memcheck, a memory error detector
==15909== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==15909== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==15909== Command: ../src/fiberassign params_fiberassign.txt
==15909== 
==15909== Conditional jump or move depends on uninitialised value(s)
==15909==    at 0x433873: printFile(char const*) (misc.cpp:999)
==15909==    by 0x40380C: main (fiberassign.cpp:32)
==15909== 
Targfile mtl-lite.fits 
SStarsfile stdstars-lite.fits 
SkyFfile  sky-lite.fits 
Secretfile truth-lite.fits 
surveyFile default_survey_list.txt 
tileFile 0.3.1/data/footprint/desi-tiles.par 
fibFile 0.3.1/data/focalplane/fiberpos.txt 
outDir /home/yfeng1/source/fiberassign/test/output/ 

PrintAscii true 
PrintFits false 
diagnose true 

kind QSOLy-a QSOTracer LRG ELG FakeQSO FakeLRG SS SF 
type QSO QSO LRG ELG QSO LRG SS SF 
prio 3400 3400 3200 3000 3400 3200 0 0 
priopost 3500 0 3200 0 0 0 0 0 
goal 5 5 2 1 5 2 5 5 
goalpost 5 1 2 1 1 1 5 5 
lastpass 0 0 0 1 0 0 1 1  
SS       0 0 0 0 0 0 1 0 
SF       0 0 0 0 0 0 0 1  
pass_intervals 0 50 100 150 200 

Randomize false 
Pacman false 
Npass 5 
MaxSS 10 
MaxSF 40 
PlateRadius 1.65 
InterPlate 0 
Analysis 0 
InfDens false 

TotalArea 15789.0 
invFibArea 700 
moduloGal 1 
moduloFiber 1 

Collision false 
Exact true 
AvCollide 3.2 
Collide 1.98 
NoCollide 7.0 
PatrolRad 5.8 
NeighborRad 14.05 

PlotObsTime false 
PlotHistLya false 
PlotDistLya false 
PlotFreeFibHist false 
PlotFreeFibTime false 
PlotSeenDens false 
PrintGalObs false 

MinDec -10. 
MaxDec 10. 
MinRa 0. 
MaxRa 10. 
Verif false 
------------------------------------------------- 
# read target, SS, SF files at 0.141 s
reading MTL file mtl-lite.fits
HDU #2  Binary Table:
Keeping 593965 targets within ra/dec ranges
reading MTL file stdstars-lite.fits
HDU #2  Binary Table:
NUMOBS_MORE not found ... setting to 0
PRIORITY not found ... setting to 0
GRAYLAYER not found ... setting to 0
Keeping 27715 targets within ra/dec ranges
reading MTL file sky-lite.fits
HDU #2  Binary Table:
NUMOBS_MORE not found ... setting to 0
PRIORITY not found ... setting to 0
GRAYLAYER not found ... setting to 0
Keeping 277634 targets within ra/dec ranges
# ... took : 14.9 s
 Target size 593965 
 Standard Star size 621680 
 Sky Fiber size 899314 
==15909== Conditional jump or move depends on uninitialised value(s)
==15909==    at 0x403B89: main (fiberassign.cpp:50)
==15909== 
==15909== Use of uninitialised value of size 8
==15909==    at 0x403BA7: main (fiberassign.cpp:51)
==15909== 
==15909== Use of uninitialised value of size 8
==15909==    at 0x403BC6: main (fiberassign.cpp:51)
==15909== 
==15909== Use of uninitialised value of size 8
==15909==    at 0x403BF5: main (fiberassign.cpp:52)
==15909== 
==15909== Conditional jump or move depends on uninitialised value(s)
==15909==    at 0x403C3E: main (fiberassign.cpp:55)
==15909== 
==15909== Use of uninitialised value of size 8
==15909==    at 0x403C58: main (fiberassign.cpp:56)
==15909== 
==15909== Conditional jump or move depends on uninitialised value(s)
==15909==    at 0x5AF20CB: vfprintf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x5AF8D28: printf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x403C6D: main (fiberassign.cpp:56)
==15909== 
==15909== Use of uninitialised value of size 8
==15909==    at 0x5AEE0CB: _itoa_word (in /usr/lib64/libc-2.22.so)
==15909==    by 0x5AF2610: vfprintf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x5AF8D28: printf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x403C6D: main (fiberassign.cpp:56)
==15909== 
==15909== Conditional jump or move depends on uninitialised value(s)
==15909==    at 0x5AEE0D5: _itoa_word (in /usr/lib64/libc-2.22.so)
==15909==    by 0x5AF2610: vfprintf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x5AF8D28: printf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x403C6D: main (fiberassign.cpp:56)
==15909== 
==15909== Conditional jump or move depends on uninitialised value(s)
==15909==    at 0x5AF268E: vfprintf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x5AF8D28: printf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x403C6D: main (fiberassign.cpp:56)
==15909== 
==15909== Conditional jump or move depends on uninitialised value(s)
==15909==    at 0x5AF21A1: vfprintf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x5AF8D28: printf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x403C6D: main (fiberassign.cpp:56)
==15909== 
==15909== Conditional jump or move depends on uninitialised value(s)
==15909==    at 0x5AF2741: vfprintf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x5AF8D28: printf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x403C6D: main (fiberassign.cpp:56)
==15909== 
==15909== Conditional jump or move depends on uninitialised value(s)
==15909==    at 0x5AF21F3: vfprintf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x5AF8D28: printf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x403C6D: main (fiberassign.cpp:56)
==15909== 
==15909== Conditional jump or move depends on uninitialised value(s)
==15909==    at 0x5AF222A: vfprintf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x5AF8D28: printf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x403C6D: main (fiberassign.cpp:56)
==15909== 
  class  0  number  474347
  class  1  number  68576
  class  2  number  51042
# ... took : 21.3 s
getting file list
 number of tiles 10666 
==15909== Warning: set address range perms: large range [0x395db040, 0x747e6840) (undefined)
 size of P  10666
--15909:0: aspacem Valgrind: FATAL: VG_N_SEGMENTS is too low.
--15909:0: aspacem   Increase it and rebuild.  Exiting now.
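
The uninitialised-value reports in the log above are easier to triage when grouped by the first stack frame that points into the project's own source. The following is a minimal sketch of such a tally, not part of fiberassign; the function name `tally_by_source_line` and the regular expressions are assumptions written to fit the log format shown here, and the sample lines are copied from that log.

```python
import re
from collections import Counter

# Headline of a Memcheck uninitialised-value report, e.g.
# "==15909== Conditional jump or move depends on uninitialised value(s)"
HEADLINE = re.compile(r"^==\d+== (Conditional jump.*|Use of uninitialised value.*)$")
# Stack frame that carries a .cpp source location, e.g.
# "==15909==    by 0x403C6D: main (fiberassign.cpp:56)"
FRAME = re.compile(r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: .*\((\S+\.cpp:\d+)\)\s*$")


def tally_by_source_line(log_text):
    """Count reports per first frame that has a .cpp file:line location."""
    counts = Counter()
    pending = False  # True while a report's source location is still unknown
    for line in log_text.splitlines():
        if HEADLINE.match(line):
            pending = True
            continue
        frame = FRAME.match(line)
        if pending and frame:
            counts[frame.group(1)] += 1
            pending = False
    return counts


# Lines copied from the log in this issue.
SAMPLE_LOG = """\
==15909== Conditional jump or move depends on uninitialised value(s)
==15909==    at 0x403C3E: main (fiberassign.cpp:55)
==15909==
==15909== Use of uninitialised value of size 8
==15909==    at 0x403C58: main (fiberassign.cpp:56)
==15909==
==15909== Conditional jump or move depends on uninitialised value(s)
==15909==    at 0x5AF20CB: vfprintf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x5AF8D28: printf (in /usr/lib64/libc-2.22.so)
==15909==    by 0x403C6D: main (fiberassign.cpp:56)
"""

counts = tally_by_source_line(SAMPLE_LOG)
for location, n in counts.most_common():
    print(location, n)
```

Frames inside libc carry no `.cpp` location, so a report raised inside `vfprintf` is attributed to the `printf` call site in `main`. Several of the reports in the full log funnel through that same call at fiberassign.cpp:56.
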
rncahn commented 8 years ago

Yes, that's what I find, too. But do recall that I did establish that the code gives consistent answers on cori and edison, which is what I set out to investigate.

rainwoodman commented 8 years ago

Yes. I think this (valgrind crashing) is a separate issue from #31.

It may take a huge rewrite to get through valgrind cleanly, and the effort is likely not worth it: after all, most (if not all) memory access is protected by std::vector and looks pretty safe.

But I do think we should leave a record of this incompatibility with valgrind on the bug tracker.

tskisner commented 6 years ago

Is the script in test/test_fiberassign.py still the main "functional test"? Or is there some better test case to use?

rncahn commented 6 years ago

My recollection is that a long time ago, though fiberassign ran fine, valgrind found problems of the sort "missing constructor." I thought this had been fixed, but what is needed is to run valgrind on it again.

sbailey commented 6 years ago

test/test_fiberassign.py is probably broken after the interface change to use command line arguments instead of a config file. Example command from the minitest notebook that can be used for testing with valgrind:

basedir=/project/projectdirs/desi/datachallenge/reference_runs/18.3
fiberassign \
    --mtl $basedir/targets/mtl.fits \
    --stdstar $basedir/targets/standards-dark.fits \
    --sky $basedir/targets/sky.fits \
    --surveytiles $basedir/fiberassign/dark-tiles.txt \
    --footprint $basedir/targets/test-tiles.fits \
    --positioners $DESIMODEL/data/focalplane/fiberpos.txt \
    --fibstatusfile $basedir/fiberassign/fiberstatus.ecsv \
    --outdir $SCRATCH/temp
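
For repeated valgrind runs it can help to assemble the command above programmatically. This is a minimal sketch under assumptions: the helper name `valgrind_command` and the placeholder paths for `$DESIMODEL` and `$SCRATCH` are hypothetical, while `--leak-check=full` and `--track-origins=yes` are standard Memcheck options. The fiberassign arguments are the ones from the command above.

```python
import shlex


def valgrind_command(basedir, desimodel, outdir, executable="fiberassign"):
    """Return the argument list for running fiberassign under valgrind."""
    return [
        "valgrind", "--leak-check=full", "--track-origins=yes",
        executable,
        "--mtl", f"{basedir}/targets/mtl.fits",
        "--stdstar", f"{basedir}/targets/standards-dark.fits",
        "--sky", f"{basedir}/targets/sky.fits",
        "--surveytiles", f"{basedir}/fiberassign/dark-tiles.txt",
        "--footprint", f"{basedir}/targets/test-tiles.fits",
        "--positioners", f"{desimodel}/data/focalplane/fiberpos.txt",
        "--fibstatusfile", f"{basedir}/fiberassign/fiberstatus.ecsv",
        "--outdir", outdir,
    ]


cmd = valgrind_command(
    basedir="/project/projectdirs/desi/datachallenge/reference_runs/18.3",
    desimodel="/path/to/desimodel",  # placeholder for $DESIMODEL
    outdir="/path/to/scratch/temp",  # placeholder for $SCRATCH/temp
)
# Print a copy-pasteable shell line; the list itself can go to subprocess.run.
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting problems, and the `executable` parameter makes it explicit which binary is being checked.
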
tskisner commented 6 years ago

I cannot reproduce this on edison. Steps to verify:

  1. Load your favorite desiconda environment

  2. Go into your fiberassign checkout, master branch, and install to (for example) someplace in scratch:

    $> PLATFORM=harpconfig INSTALL_DIR=$SCRATCH/software/fiberassign make clean
    $> PLATFORM=harpconfig INSTALL_DIR=$SCRATCH/software/fiberassign make install

    Note that I always build using the harpconfig platform file, which allows for using the same compile options as HARP (installed in desiconda) and SPECEX (which also uses harpconfig). This builds with the Intel compilers at NERSC, the same ones used to build the compiled packages in desiconda.

  3. Make sure that this fiberassign is first in your path:

    export PATH=$SCRATCH/software/fiberassign/bin:$PATH

  4. Load the Intel-compatible version of valgrind, and run it.

    $> module load valgrind
    $> basedir=/project/projectdirs/desi/datachallenge/reference_runs/18.3
    $> valgrind --leak-check=full --track-origins=yes fiberassign \
    --mtl $basedir/targets/mtl.fits \
    --stdstar $basedir/targets/standards-dark.fits \
    --sky $basedir/targets/sky.fits \
    --surveytiles $basedir/fiberassign/dark-tiles.txt \
    --footprint $basedir/targets/test-tiles.fits \
    --positioners $DESIMODEL/data/focalplane/fiberpos.txt \
    --fibstatusfile $basedir/fiberassign/fiberstatus.ecsv \
    --outdir ./out

    Output is

    ==6505== Memcheck, a memory error detector                                                      
    ==6505== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.                        
    ==6505== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info                     
    ==6505== Command: /scratch2/scratchdirs/kisner/software/fiberassign/bin/fiberassign --mtl /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/mtl.fits --stdstar /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/standards-dark.fits --sky /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/sky.fits --surveytiles /project/projectdirs/desi/datachallenge/reference_runs/18.3/fiberassign/dark-tiles.txt --footprint /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/test-tiles.fits --positioners /global/common/software/desi/users/kisner/edison/20180130-1.2.4-spec/desimodel/0.9.1/data/focalplane/fiberpos.txt --fibstatusfile /project/projectdirs/desi/datachallenge/reference_runs/18.3/fiberassign/fiberstatus.ecsv --outdir ./out                                                                                
    ==6505==                                                                                                     
    fiberassign_exec --mtl /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/mtl.fits  --sky /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/sky.fits --stdstar /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/standards-dark.fits  --fibstatusfile /project/projectdirs/desi/datachallenge/reference_runs/18.3/fiberassign/fiberstatus.ecsv              --outdir ./out             --surveytiles /project/projectdirs/desi/datachallenge/reference_runs/18.3/fiberassign/dark-tiles.txt              --footprint /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/test-tiles.fits              --positioners /global/common/software/desi/users/kisner/edison/20180130-1.2.4-spec/desimodel/0.9.1/data/focalplane/fiberpos.txt             --starmask 60129542144             --rundate 2018-06-12                          
    # Read target, SS, SF files at 4.7e-05 s                                                                     
    star mask 60129542144                                                                                        
    Finding file: /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/standards-dark.fits        
    Found MTL input file: /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/standards-dark.fits
    Reading MTL input file /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/standards-dark.fits                                                                                                            
    NUMOBS_MORE not found ... setting to 0                                                                       
    PRIORITY not found ... setting to 0                                                                          
    Keeping 1217 targets within ra/dec ranges                                                                    
    star mask 0                                                                                                  
    Finding file: /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/sky.fits                   
    Found MTL input file: /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/sky.fits           
    Reading MTL input file /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/sky.fits          
    NUMOBS_MORE not found ... setting to 0                                                                       
    PRIORITY not found ... setting to 0                                                                          
    Keeping 48128 targets within ra/dec ranges                                                                   
    star mask 0                                                                                                  
    Finding file: /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/mtl.fits                   
    Found MTL input file: /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/mtl.fits           
    Reading MTL input file /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/mtl.fits          
    Keeping 240902 targets within ra/dec ranges                                                                  
    # ...read targets  took : 0.473 s                                                                            
    Target size 240902                                                                                          
    Standard Star size 242119                                                                                   
    Sky Fiber size 290247                                                                                       
    # map position in target list to immutable targetid at 0.633 s                                               
    # assign priority classes at 0.633 s                                                                         
    class  0  number  1221                                                                                     
    class  1  number  10452                                                                                    
    class  2  number  43629                                                                                    
    class  3  number  625                                                                                      
    class  4  number  42659                                                                                    
    class  5  number  37339                                                                                    
    class  6  number  83651                                                                                    
    class  7  number  15443                                                                                    
    class  8  number  7100                                                                                     
    # ...priority list took : 0.0108 s                                                                           
    # Start positioners at 0.644 s                                                                               
    before reading positioners                                                                                   
    read the positioner file                                                                                     
    sorted by fiber number                                                                                      
    i 0 FibPos[i].fib_num 0                                                                                     
    i 1 FibPos[i].fib_num 1                                                                                     
    i 2 FibPos[i].fib_num 2                                                                                     
    i 3 FibPos[i].fib_num 3                                                                                     
    i 4 FibPos[i].fib_num 4                                                                                     
    i 5 FibPos[i].fib_num 5                                                                                     
    i 6 FibPos[i].fib_num 6                                                                                     
    i 7 FibPos[i].fib_num 7                                                                                     
    i 8 FibPos[i].fib_num 8                                                                                     
    i 9 FibPos[i].fib_num 9                                                                                     
    made neighbors                                                                                              
    Input TimeSun Jun 12 00:00:00 2018                                                                           
    Current TimeSun Jun 12 00:00:00 2018                                                                         
    before reading status                                                                                        
    Read from fiber status: Fiber_pos 0 Location 95 Broken 1 Stuck 0 dates 2018-02-21T09:23:51 2100-02-21T09:24:24                                                                                                            
    Init Time for FiberSun Feb 21 09:23:51 2018                                                                  
    End Time for FiberSun Feb 21 09:24:24 2100                                                                   
    Changing fiberastatus entry: Fiber 0 Location 95                                                             
    BROKEN                                                                                                       
    Read from fiber status: Fiber_pos 1 Location 62 Broken 1 Stuck 0 dates 2018-02-21T09:23:51 2100-02-21T09:24:24                                                                                                            
    Init Time for FiberSun Feb 21 09:23:51 2018                                                                  
    End Time for FiberSun Feb 21 09:24:24 2100                                                                   
    Changing fiberastatus entry: Fiber 1 Location 62                                                             
    BROKEN                                                                                                       
    Read from fiber status: Fiber_pos 2 Location 102 Broken 0 Stuck 1 dates 2018-02-21T09:23:51 2100-02-21T09:24:24                                                                                                           
    Init Time for FiberSun Feb 21 09:23:51 2018                                                                  
    End Time for FiberSun Feb 21 09:24:24 2100                                                                   
    Changing fiberastatus entry: Fiber 2 Location 102                                                            
    STUCK                                                                                                        
    Read from fiber status: Fiber_pos 3 Location 82 Broken 0 Stuck 1 dates 2018-02-21T09:23:51 2100-02-21T09:24:24                                                                                                            
    Init Time for FiberSun Feb 21 09:23:51 2018                                                                  
    End Time for FiberSun Feb 21 09:24:24 2100                                                                   
    Changing fiberastatus entry: Fiber 3 Location 82                                                             
    STUCK                                                                                                        
    Read from fiber status: Fiber_pos 4 Location 131 Broken 0 Stuck 1 dates 2018-02-21T09:23:51 2100-02-21T09:24:24                                                                                                           
    Init Time for FiberSun Feb 21 09:23:51 2018                                                                  
    End Time for FiberSun Feb 21 09:24:24 2100                                                                   
    Changing fiberastatus entry: Fiber 4 Location 131                                                            
    STUCK                                                                                                        
    read status file                                                                                             
    # ..posiioners  took : 0.208 s                                                                               
    # Start plates at 0.853 s                                                                                    
    number of tiles 7                                                                                           
    Finding file: /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/test-tiles.fits            
    Found input tile centers file: /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/test-tiles.fits                                                                                                        
    Reading input tile centers file /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/test-tiles.fits                                                                                                       
    size of P  10                                                                                               
    # Start invert tiles at 0.0147 s                                                                             
    # ..inversion  took : 2.31e-05 s                                                                             
    # do inversion of used plates at 0.0147 s                                                                    
    # .. sued plates inversion  took : 0.00207 s                                                                 
    # Read 7 plate centers from /project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/test-tiles.fits and 5000 fibers from /global/common/software/desi/users/kisner/edison/20180130-1.2.4-spec/desimodel/0.9.1/data/focalplane/fiberpos.txt                                                                                 
    # ..plates   took : 0.0172 s                                                                                 
    # Start building HTM tree at 0.87 s                                                                          
    # Doing kd-tree... took : 0.0747 s                                                                           
    # collect galaxies at  at 0.945 s                                                                            
    # Begin collecting available galaxies                                                                        
    # ... took : 0.195 s                                                                                        
    # ... took : 0.196 s                                                                                         
    # collect available tile-fibers at at 1.14 s                                                                 
    # Begin computing available tilefibers                                                                       
    # ... took : 0.0629 s                                                                                        
    galaxies outside footprint 20299                                                                             
    Nplate 7  Ngal 290247   Nfiber 5000                                                                         
    # Start assignment at :  1.22 s                                                                              
    # Begin simple assignment :                                                                                  
    # ... took : 0.45 s                                                                                          
    countme 35000                                                                                               
    Plates actually used 7                                                                                      
    start redistribute                                                                                           
    # Begin redistribute TF :                                                                                    
    46 redistributions of tile-fibers                                                                          
    # ... took : 0.00648 s
    # Begin improve :
    improvements  48
    # ... took : 0.00188 s
    start redistribute
    # Begin redistribute TF :
    4 redistributions of tile-fibers
    # ... took : 0.00375 s
    # assign SS and SF  at 1.68 s
    # count SS and SF  at 1.82 s
    Totals SS      0    SF   2800 class  0       0 class  1       0 class  2      12 class  3       0 class  4     586 class  5       0 class  6   17806 class  7    7935 class  8    5826
    # print fits files  at 1.82 s
    # Finished !... in 1.91 s
tskisner commented 6 years ago

Looking at the platform files that are labeled "nersc_*", they seem to be using GNU compilers, but linking to cfitsio from desiconda (built with Intel). In principle those should be ABI compatible, but it is probably safer to use the harpconfig platform and get the same compilers and options used for compiled code everywhere else in the desi stack.

tskisner commented 6 years ago

Ah, interesting: it looks like the final executable is now "fiberassign_exec" rather than "fiberassign". My tests above were with an older version of the executable. Ignore my previous results.

tskisner commented 6 years ago

Ok, some more information. Using Intel-compiled fiberassign with the Intel-compiled libcfitsio from desiconda causes valgrind to die with an unhandled instruction error. This is due to SSE4 instructions in the Intel math library, which is linked in with "-lm" when building cfitsio. I built my own cfitsio (and valgrind) on edison with gcc-7.1, and then built fiberassign with the same gcc and ran it. This produces fairly clean output: there is one place to dig deeper to double-check that memory is initialized, and there are several places we need to check to ensure memory is being freed.

The conclusion here is: don't use valgrind with Intel-compiled code. Fortunately we can test this with valgrind using gcc, and could also run in vtune if we needed to check the Intel built version.
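
Two different ways for valgrind itself to give up have come up in this ticket, so a quick check of a saved log can tell them apart before any Memcheck reports are read. The following is a minimal sketch; the function name `abort_reasons` and the labels are hypothetical. The first pattern is taken from the fatal message quoted at the top of this issue. The second is an assumption based only on the phrase "unhandled instruction" used above and should be checked against a real log.

```python
import re

# Label -> pattern. The first pattern matches the fatal message quoted in this
# issue; the second is an assumed wording and may need adjusting.
FATAL_PATTERNS = {
    "segment table exhausted": re.compile(r"VG_N_SEGMENTS is too low"),
    "unhandled instruction": re.compile(r"unhandled instruction", re.IGNORECASE),
}


def abort_reasons(log_text):
    """Return the labels of all known fatal conditions found in the log."""
    return [name for name, pattern in FATAL_PATTERNS.items()
            if pattern.search(log_text)]


# Message copied from the original report in this issue.
SOURCE_MSG = (
    "--15909:0: aspacem Valgrind: FATAL: VG_N_SEGMENTS is too low.\n"
    "--15909:0: aspacem   Increase it and rebuild.  Exiting now.\n"
)

print(abort_reasons(SOURCE_MSG))
```

An empty list means neither condition was seen, in which case the log is worth reading for ordinary Memcheck reports.
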

Here is the valgrind output: fiberassign_valgrind.log

I'll leave this ticket open until I investigate those areas flagged in the output.