Closed rebeccachen0 closed 11 months ago
@RickKessler Pippin just runs submit_batch on the file produced in the 1_SIM directory, so I have no idea why submit_batch would crash when Pippin runs it but not when it is run by itself. Do you have any ideas?
The error in question:
23 [ ERROR | manager.py:492] Excerpt: Traceback (most recent call last):
22 [ ERROR | manager.py:492] Found error in file /scratch/midway2/rkessler/PIPPIN_OUTPUT/RC_DEBUG/1_SIM/DES_P21_HOSTEFF/PIP_RC_DEBUG_DES_P21_HOSTEFF.LOG, excerpt below
21 [ ERROR | manager.py:492] Excerpt: FATAL ERROR ABORT :
20 [ ERROR | manager.py:492] Excerpt:
19 [ ERROR | manager.py:492] Excerpt:
18 [ ERROR | manager.py:492] Excerpt:
17 [ ERROR | manager.py:492] Excerpt: `|```````|`
16 [ ERROR | manager.py:492] Excerpt: <| o\ /o |>
15 [ ERROR | manager.py:492] Excerpt: | ' ; ' |
14 [ ERROR | manager.py:492] Excerpt: | ___ | ABORT submit on Fatal Error.
13 [ ERROR | manager.py:492] Excerpt: | |' '| |
12 [ ERROR | manager.py:492] Excerpt: | `---' |
11 [ ERROR | manager.py:492] Excerpt: \_______/
10 [ ERROR | manager.py:492] Found error in file /scratch/midway2/rkessler/PIPPIN_OUTPUT/RC_DEBUG/1_SIM/DES_P21_HOSTEFF/PIP_RC_DEBUG_DES_P21_HOSTEFF.LOG, excerpt below
9 [ ERROR | manager.py:492] Excerpt: FATAL ERROR ABORT :
8 [ ERROR | manager.py:492] Excerpt: Unable to find NGENTOT_RATECALC: key in SIMnorm_PIP_RC_DEBUG_DES_P21_HOSTEFF_SNIaMODEL0.LOG ;
7 [ ERROR | manager.py:492] Excerpt: LOG created from sim normalization commands :
6 [ ERROR | manager.py:492] Excerpt: cd /scratch/midway2/rkessler/PIPPIN_OUTPUT/RC_DEBUG/1_SIM/DES_P21_HOSTEFF/LOGS ; \
5 [ ERROR | manager.py:492] Excerpt: snlc_sim.exe sn_ia_salt2_bs20_des5yr.input \
4 [ ERROR | manager.py:492] Excerpt: INIT_ONLY 1 DNDZ POWERLAW 2.27E-5 1.7 GENMAG_OFF_GLOBAL -0.12 GENMAG_SMEAR 1e-06 GENMAG_SMEAR_MODELNAME C11 GENMAG_SMEAR_SCALE 0.0001 GENMAG_SMEAR_SCALE\(c\) 0,0 GENMODEL $DES_ROOT/SALT3training/OUT_TRAIN_SALT3_systCovar/SALT3.MODEL000+LAMEXTEND GENPDF_FILE $DES5YR/populations/FINAL_forDES5yr/DES5YR_S3P21_GENPDF.DAT GENPDF_OPTMASK 1 GENPEAK_SALT2ALPHA 0.145 HOSTLIB_MSKOPT 2 HOSTLIB_SCALE_PROPERTY_ERR 0.0\(LOGMASS\),0.0\(LOGSFR\),0.0\(LOGsSFR\),0.0\(COLOR\) HOSTLIB_WGTMAP_FILE $DES_USERS/mvincenzi/MYPIPPIN/sims_instrument/WGT_maps_DESX3/DES_WGTMAP_MassSFR_Wiseman2021.HOSTLIB OPT_MWCOLORLAW 89 OPT_MWEBV 3 PATH_USER_INPUT /scratch/midway2/rkessler/PIPPIN_OUTPUT/RC_DEBUG/1_SIM/DES_P21_HOSTEFF \
3 [ ERROR | manager.py:492] Excerpt: > SIMnorm_PIP_RC_DEBUG_DES_P21_HOSTEFF_SNIaMODEL0.LOG \
2 [ ERROR | manager.py:492] Excerpt: Crashed while preparing batch jobs.
1 [ ERROR | manager.py:492] Excerpt: Check Traceback
256 [ ERROR | manager.py:492] FAILED: SNANASimulation DES_P21_HOSTEFF task (wall time 0:00:02, 50 jobs, deps [])
1 [ DEBUG | config.py:200] Did not chown /scratch/midway2/rkessler/PIPPIN_OUTPUT/RC_DEBUG/RC_DEBUG.log
The SIMSED models have a KCOR_FILE mismatch. FATAL ERROR ABORT called by read_SIMSED_TABBINARY: binary file KCOR_FILE is '/project2/rkessler/SURVEYS/PS1MD/USERS/dscolnic/PANTHEON+/kcor/v6_1/kcor_DES_5yr_v6_1.fits' but current KCOR_FILE is '/project2/rkessler/PRODUCTS/SNDATA_ROOT/kcor/DES/DES-SN3YR/kcor_DECam.fits'
For SIMSED models, change your KCOR files to the private ones under /PANTHEON+. When the cosmology paper goes into CWR, we will release all these files in public locations, which should avoid these conflicts.
Adding KCOR_FILE: /project2/rkessler/SURVEYS/PS1MD/USERS/dscolnic/PANTHEON+/kcor/v6_1/kcor_DES_5yr_v6_1.fits
to the config doesn't seem to fix it -- still getting the same error
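To see which KCOR_FILE each input actually declares, a quick scan along these lines might help. This is only a sketch: the helper names `find_kcor_files` and `report_kcor_files` are made up for illustration, and it assumes the key appears on its own line in the form `KCOR_FILE: <path>`, as in the line quoted above.

```python
import re
from pathlib import Path

# Assumed layout: "KCOR_FILE: /path/to/kcor.fits", optionally indented.
# Only spaces/tabs are allowed around the key so a match never spans lines.
KCOR_PATTERN = re.compile(r"^[ \t]*KCOR_FILE:[ \t]*(\S+)", re.MULTILINE)


def find_kcor_files(text):
    """Return every KCOR_FILE value found in the given text, in order."""
    return KCOR_PATTERN.findall(text)


def report_kcor_files(paths):
    """Map each existing input file to the KCOR_FILE values it declares."""
    found = {}
    for path in map(Path, paths):
        if path.is_file():
            found[str(path)] = find_kcor_files(path.read_text(errors="replace"))
    return found
```

Calling `report_kcor_files` on the sim input files and the Pippin config would show whether any file still points at a different KCOR file than the one the binary SIMSED table was built with.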
Looking in PIP_RC_DEBUG_DES_P21_HOSTEFF.LOG,
snlc_sim.exe: error while loading shared libraries: libCore.so: cannot open shared object file: No such file or directory
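A message like this means the dynamic loader could not resolve libCore.so in the environment the job ran in, so it may be worth checking what `ldd` reports there. The sketch below is a hedged illustration, not something Pippin or SNANA provides: `parse_missing_libs` and `missing_libs_for` are hypothetical helpers, and it assumes a Linux host where `ldd` is available and `snlc_sim.exe` is on PATH.

```python
import shutil
import subprocess


def parse_missing_libs(ldd_output):
    """Return library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved entries as "libName.so => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def missing_libs_for(executable):
    """Run ldd on an executable found on PATH; return unresolved libraries."""
    path = shutil.which(executable)
    if path is None:
        raise FileNotFoundError(f"{executable} is not on PATH")
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return parse_missing_libs(result.stdout)
```

Running `missing_libs_for("snlc_sim.exe")` once from the login shell and once from inside the environment Pippin launches would show whether the two see the same libraries.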
What machine did you log into?
Hm. This is all on Midway2
@RickKessler @rebeccachen0 Has this progressed at all?
It seems this was resolved by a recent SNANA update.
Error reproduced at $PIPPIN_OUTPUT/RC_DEBUG -- I've run the same configuration previously with no error
Running submit_batch_jobs.sh on the .input file in the SIM directory submits and runs successfully, which is why I believe it's a Pippin-related issue.
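When the same command works by hand but fails under a wrapper, one thing worth ruling out is a difference in environment variables between the two runs. Below is a minimal sketch for comparing two `env` dumps, one saved from the interactive shell and one from the Pippin-launched process. The function names are hypothetical, and the default list of keys is only a guess at which variables matter here.

```python
def parse_env(dump):
    """Parse the output of `env` into a dict (first '=' splits name and value)."""
    env = {}
    for line in dump.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines without '=', e.g. continuation of multi-line values
            env[name] = value
    return env


def diff_env(interactive, batch,
             keys=("PATH", "LD_LIBRARY_PATH", "SNANA_DIR", "SNDATA_ROOT")):
    """Return {key: (interactive_value, batch_value)} for keys that differ."""
    return {
        key: (interactive.get(key), batch.get(key))
        for key in keys
        if interactive.get(key) != batch.get(key)
    }
```

An empty result for the chosen keys would suggest the environments match on those variables and the cause lies elsewhere. Multi-line values in the dump are not handled by this simple parser.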