JeffersonLab / gluex_MCwrapper


MCWrapper fails in batch mode #117

Closed amschertz closed 7 months ago

amschertz commented 7 months ago

I'm getting an issue where I can generate 100 events on the command line and it'll work fine, but if I try to generate MC using the same configuration in batch mode, I get a SLURM_FAILED error. From the logfiles it looks like the variable ana_pre is undefined, but I don't know if that's enough to cause the job to fail, or what I should change to fix it. I'd appreciate any help. For reference, my MCWrapper config file is located at /work/halld2/home/aschertz/myanalysis/batch/simulation/MC.config, and examples of the output files are at /volatile/halld/home/aschertz/simulation/ver06/omegapi_deltaPlusPlus_phasespace_2017_01_ver06/log/30496_std*

lihaoahil commented 7 months ago

It looks like a small bug: at line 1742 of MakeMC.csh the variable $ana_pre is always examined, but it is defined only when "$CUSTOM_ANA_PLUGINS" != "None". Oddly, line 1742 is usually satisfied before it even has to examine ana_pre, so the script doesn't complain; but when it runs without a defined $CUSTOM_ANA_PLUGINS, as in Amy's case, the variable ana_pre is referenced and found to be undefined.
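To illustrate the pattern described above, here is a minimal bash sketch of a variable that is defined only on one branch and then read unconditionally. It is not taken from MakeMC.csh or MakeMC.sh: the function name `describe_ana_pre` and the placeholder value are assumptions made for illustration. In csh the analogous guard would be a `$?ana_pre` test before the variable is used.

```bash
# Hypothetical sketch; not copied from MakeMC.sh or MakeMC.csh.
describe_ana_pre() (
    set -u                      # unset variables become errors, like csh's "Undefined variable"
    unset ana_pre               # start from a clean state inside this subshell
    CUSTOM_ANA_PLUGINS="$1"

    # ana_pre is defined only when custom analysis plugins are requested.
    if [[ "$CUSTOM_ANA_PLUGINS" != "None" ]]; then
        ana_pre="file"          # placeholder value for illustration
    fi

    # A bare "$ana_pre" here would abort under `set -u` whenever the branch
    # above was skipped; ${ana_pre:-} falls back to an empty string instead.
    if [[ -n "${ana_pre:-}" ]]; then
        echo "ana_pre=$ana_pre"
    else
        echo "ana_pre unset"
    fi
)

describe_ana_pre "None"
describe_ana_pre "file:/path/to/ana.conf"
```

The guard makes the later check safe whether or not the defining branch ran, which is the behaviour the unguarded check lacks.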

amschertz commented 7 months ago

Actually, maybe ana_pre isn't critical. I see that I get the same warning on the command line but it'll generate the MC without failing

nsjarvis commented 7 months ago

I'm also getting failures in batch mode but mine are different. I used git checkout bf1475025e666fc1134960913c654798d4536efb because I thought this would give me Justin's endif fix, but maybe there's an internal path that was circumvented. My jobs are all failing immediately after running gen-amp, with else: endif not found.

jrstevenjlab commented 7 months ago

@amschertz, it looks like you've defined CUSTOM_PLUGINS=file:/work/halld2/home/aschertz/myanalysis/batch/simulation/jana_5pi.conf in your MC.config file, but you may need to define CUSTOM_ANA_PLUGINS instead. That key specifies which JANA configuration to run for the analysis part of the job (making trees) rather than the reconstruction piece (making REST files).
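A quick way to see which of the two keys a config file actually sets is sketched below. `report_plugin_keys` is a hypothetical helper, not part of MCWrapper, and it assumes the config uses uncommented `KEY=value` lines like the `CUSTOM_PLUGINS=file:...` line quoted above.

```bash
# report_plugin_keys is a hypothetical helper, not part of MCWrapper.
# It assumes the config file uses plain KEY=value lines.
report_plugin_keys() {
    local config="$1" key
    for key in CUSTOM_PLUGINS CUSTOM_ANA_PLUGINS; do
        # Match uncommented "KEY=" at the start of a line (leading spaces allowed).
        if grep -Eq "^[[:space:]]*${key}=" "$config"; then
            echo "$key: set"
        else
            echo "$key: missing"
        fi
    done
}
```

Running `report_plugin_keys MC.config` prints one line per key, so a config that sets only the reconstruction plugins shows up immediately.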

@nsjarvis, that was the symptom I saw before I fixed it with this PR https://github.com/JeffersonLab/gluex_MCwrapper/pull/112. Have you tried just using the master? I think all the updates you need are there.

nsjarvis commented 7 months ago

@jrstevenjlab I tried again, but now there is a new problem, cannot contact xrootd.

ls: cannot access root://sci-xrootd.jlab.org//osgpool/halld//random_triggers/rcdb/run030784_random.hddm: No such file or directory
ls: cannot access root://nod25.phys.uconn.edu/Gluex/rawdata//random_triggers/rcdb/run030784_random.hddm: Connection refused

followed by some nasty-looking messages from rcdb, e.g. in /work/halld/njarvis/makeMC/phi_v68_newmc/output/log/30784_stderr.30784_2.err

s6pepaul commented 7 months ago

@nsjarvis Hi Naomi, I think the issue here has nothing to do with xrootd. It might test a few connections and find it does not have them, but the actual issue seems to be that you are using an old MakeMC.(c)sh file from before the addition of gen_amp_V2 together with a gluex_MC.py from after the addition. Am I right to think that you use /work/halld/njarvis/makeMC/phi_v68_newmc/MC_2k.config for the submission? There you set up your environment with MCWrapper 2.8.0. But you probably submit with your own gluex_MC.py from the master branch? The easiest fix would probably be to set CUSTOM_MAKEMC to the MakeMC file from the master branch as well.

s6pepaul commented 7 months ago

@amschertz Hi Amy, is your problem resolved? I was able to run your MC.config file in tcsh (what you used for your job) without errors, but I also noticed that it already contained the line Justin suggested.

I did get an error when running in bash though. There is a whitespace missing, I will add it shortly.

amschertz commented 7 months ago

@s6pepaul Hi Peter, yeah, it seems to be. I added the line that Justin mentioned and my MC seems to have generated without issues. Thanks!

nsjarvis commented 7 months ago

@s6pepaul Hi Peter, I didn't reference 2.8.0 anywhere. I think the problem is that I am using the gluex_MC.py from my downloaded master, and that is internally calling $MCWRAPPER_CENTRAL which is set to 2.8.0 by my gx version set. Maybe that is what you meant. I did wonder when I first tried this, if all the paths would be switched just by changing the py location: nope.
What if I just change $MCWRAPPER_CENTRAL to where I put my github clone?

s6pepaul commented 7 months ago

Yes, that is what I meant. The version set that is setting up your environment on the batch node is using 2.8.0. So you could either set CUSTOM_MAKEMC to your local GitHub clone of the master branch, or change your version set to point to that version of MCWrapper.
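A rough way to spot this kind of mismatch before submitting is sketched below. `check_wrapper_paths` is a hypothetical helper, not part of MCWrapper, and it assumes that gluex_MC.py sits at the top level of the checkout that $MCWRAPPER_CENTRAL is expected to point to.

```bash
# check_wrapper_paths is a hypothetical helper, not part of MCWrapper.
# It compares the directory holding the submit script with $MCWRAPPER_CENTRAL.
check_wrapper_paths() {
    local submit_script="$1"
    local submit_dir
    # Resolve the directory that contains the submit script.
    submit_dir="$(cd "$(dirname "$submit_script")" && pwd)"
    if [[ "$submit_dir" == "${MCWRAPPER_CENTRAL:-}" ]]; then
        echo "match"
    else
        echo "mismatch: submit script in $submit_dir, MCWRAPPER_CENTRAL=${MCWRAPPER_CENTRAL:-unset}"
    fi
}
```

Calling `check_wrapper_paths /path/to/clone/gluex_MC.py` reports a mismatch when the environment still points at a released version such as 2.8.0. The comparison is a plain string match, so a trailing slash or a symlinked path in $MCWRAPPER_CENTRAL also shows up as a mismatch.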

nsjarvis commented 7 months ago

it's running nicely now, thanks.