Closed: chilge closed this issue 2 years ago
The issue is actually SBN StashCache, whose CVMFS instance is not available on some worker nodes. I'm following up on this with Distributed Computing Support.
Hmm, my memory is wobbling, sorry for the confusion. We have a ClassAd to filter slots based on SBN StashCache CVMFS, but not one to filter slots based on the SBN CVMFS area.
Full production of the SBND input samples is going much better this time around, with a failure rate of a few percent compared to a 50% failure rate a week ago.
The jobs that failed in the most recent run have error messages like this:
Setting up LArSoft from "CVMFS":
- executing '/cvmfs/larsoft.opensciencegrid.org/products/setup'
- appending '/cvmfs/fermilab.opensciencegrid.org/products/common/db'
Setting up artdaq from "CVMFS":
- appending '/cvmfs/fermilab.opensciencegrid.org/products/artdaq'
Error (code: 1) setting up SBN UPS area.
It could just be a coincidence, but I don't believe I have seen this error for the ICARUS jobs before. Could this be chalked up to the difference in the storage locations for the experiment-specific flux files (StashCache for SBND vs. dCache persistent for ICARUS)?
@chilge the specific error message you posted above is about the SBN CVMFS area, not the SBN StashCache area. Both are accessible through a CVMFS path, but they are different:
SBN CVMFS is available through /cvmfs/sbn.opensciencegrid.org
SBN StashCache is available through /cvmfs/sbn.osgstorage.org
Similar paths are used for SBND and ICARUS. Currently all of those paths except /cvmfs/sbn.opensciencegrid.org (i.e. SBN CVMFS) can be checked through a job requirement. The HTC team supposedly enabled that check yesterday, but it is taking a while for slots to pick it up; currently I see only FermiGrid slots with it enabled.
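Until slots advertise the check, a job wrapper could test both areas itself before running the setup scripts. The following is only a minimal sketch of such a pre-flight check: the two paths are the ones quoted above, while the function names (`is_accessible`, `missing_repos`, `main`) are made up for illustration and are not part of any existing SBN or jobsub tooling.

```python
import os

# Repository roots quoted in the comment above.
SBN_CVMFS = "/cvmfs/sbn.opensciencegrid.org"
SBN_STASHCACHE = "/cvmfs/sbn.osgstorage.org"


def is_accessible(path):
    """Return True if the directory at `path` can actually be listed.

    Listing forces a real access to the directory, so a missing or
    broken mount shows up as an OSError instead of a silent success.
    """
    try:
        os.listdir(path)
        return True
    except OSError:
        return False


def missing_repos(paths, probe=is_accessible):
    """Return the subset of `paths` for which `probe` reports failure.

    `probe` is injectable so the logic can be exercised without CVMFS.
    """
    return [p for p in paths if not probe(p)]


def main():
    """Hypothetical entry point: report missing areas, return an exit code."""
    missing = missing_repos([SBN_CVMFS, SBN_STASHCACHE])
    for path in missing:
        print(f"not accessible: {path}")
    return 1 if missing else 0
```

Calling `sys.exit(main())` at the top of a wrapper would make the job fail early with a message that names the missing area, rather than with the generic "Error (code: 1) setting up SBN UPS area."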
@vitodb thanks for the clarification. this is good news! we should be able to close this issue soon then.
ClassAds to check for SBN CVMFS availability and the associated revision have been deployed, so for CI Validation jobs, and eventually also for analysis and production jobs, we can use those ClassAds in the requirement expression. I just added those requirements to the CI workflow for ICARUS/SBND CI Validation jobs.
Some nodes cannot access the SBN CVMFS repository, causing jobs to fail.
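For reference, a requirement expression of this kind could be assembled as in the sketch below. The attribute naming pattern (`HAS_CVMFS_` followed by the repository name with dots replaced by underscores) is an assumption made for illustration, not something confirmed in this thread; check the attributes the slots actually advertise before relying on it. The revision check is left out because its attribute name is not given here.

```python
def cvmfs_attr(repo):
    """Build a slot attribute name for a CVMFS repository.

    ASSUMPTION: attributes follow the pattern HAS_CVMFS_<repo>, with
    '.' and '-' replaced by '_'. Verify against real slot ClassAds.
    """
    return "HAS_CVMFS_" + repo.replace(".", "_").replace("-", "_")


def cvmfs_requirement(repos):
    """Return a ClassAd expression requiring every repository in `repos`."""
    clauses = [f"(TARGET.{cvmfs_attr(repo)} == true)" for repo in repos]
    return " && ".join(clauses)
```

For example, `cvmfs_requirement(["sbn.opensciencegrid.org", "sbn.osgstorage.org"])` yields one clause per repository joined with `&&`, which could then be appended to the job's existing requirements.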