SBNSoftware / sbnci

Packages, modules and scripts for the SBN continuous integration system

check worker nodes for good SBN CVMFS repo #33

Closed chilge closed 2 years ago

chilge commented 2 years ago

Some nodes cannot access the SBN CVMFS repository, causing jobs to fail.

vitodb commented 2 years ago

The issue is actually SBN StashCache, whose CVMFS instance is not available on some worker nodes. I'm following up on this with Distributed Computing Support.

vitodb commented 2 years ago

Hmm, my memory is wobbly, sorry for the confusion. We have a ClassAd to filter slots based on SBN StashCache CVMFS, but not one to filter slots based on the SBN CVMFS area.

chilge commented 2 years ago

Full production of the SBND input samples is going much better this time around, with a failure rate of a few percent compared to 50% a week ago.

The jobs that failed in the most recent run have error messages like this:

Setting up LArSoft from "CVMFS":
 - executing '/cvmfs/larsoft.opensciencegrid.org/products/setup'
 - appending '/cvmfs/fermilab.opensciencegrid.org/products/common/db'
Setting up artdaq from "CVMFS":
 - appending '/cvmfs/fermilab.opensciencegrid.org/products/artdaq'
Error (code: 1) setting up SBN UPS area.

It could just be a coincidence, but I don't believe I have seen this error for the ICARUS jobs before. Could this be chalked up to the difference in the storage locations for the experiment-specific flux files (StashCache for SBND vs. dCache persistent for ICARUS)?

vitodb commented 2 years ago

@chilge the specific error message you posted above is about the SBN CVMFS area, not the SBN StashCache area. Both are accessible through a CVMFS path, but they are different:

SBN CVMFS is available through /cvmfs/sbn.opensciencegrid.org
SBN StashCache is available through /cvmfs/sbn.osgstorage.org

Similar paths are used for SBND and ICARUS. Currently all of those paths except /cvmfs/sbn.opensciencegrid.org (i.e. SBN CVMFS) can be checked through a job requirement. The HTC team supposedly enabled this possibility yesterday, but slots with this check enabled are taking a while to appear; currently I see only FermiGrid slots with it enabled.
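For anyone who wants to see the distinction between the two areas on a given node, here is a minimal sketch of a probe that could run at the start of a job. It is not part of the CI system; the function name and labels are made up for illustration, and it assumes that listing a directory is a sufficient availability check (on nodes that mount CVMFS through autofs, accessing the path is also what triggers the mount).

```python
import os

# Repository paths as given in the comment above; the labels are illustrative.
REPOS = {
    "SBN CVMFS": "/cvmfs/sbn.opensciencegrid.org",
    "SBN StashCache": "/cvmfs/sbn.osgstorage.org",
}


def check_repos(repos=REPOS):
    """Return a dict mapping each label to True if its path can be listed."""
    status = {}
    for label, path in repos.items():
        try:
            # Listing the directory fails with OSError if the repository
            # is missing or cannot be mounted on this node.
            os.listdir(path)
            status[label] = True
        except OSError:
            status[label] = False
    return status


if __name__ == "__main__":
    for label, ok in check_repos().items():
        print(f"{label}: {'available' if ok else 'NOT available'}")
```

A probe like this only reports the problem after the job has already landed on a bad node, which is why filtering slots through a job requirement is the better fix.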

chilge commented 2 years ago

@vitodb thanks for the clarification. This is good news! We should be able to close this issue soon, then.

vitodb commented 2 years ago

ClassAds to check for SBN CVMFS availability and the associated revision have been deployed, so for CI Validation jobs, and eventually also for analysis and production jobs, we can use those ClassAds in the requirement expression. I just added those requirements to the CI workflow for ICARUS/SBND CI Validation jobs.
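As a rough illustration of what such a requirement expression might look like, the sketch below builds one from a repository name. The thread does not give the actual ClassAd attribute names, so the `HAS_CVMFS_<repo>` and `CVMFS_<repo>_REVISION` pattern (dots replaced by underscores) is an assumption, as are the helper name and the example revision number; check the attributes advertised by the slots before relying on it.

```python
def cvmfs_requirement(repo, min_revision=None):
    """Build a requirement clause for one CVMFS repository.

    ASSUMPTION: slots advertise HAS_CVMFS_<repo> and CVMFS_<repo>_REVISION,
    with dots in the repository name replaced by underscores.
    """
    attr = repo.replace(".", "_")
    clauses = [f"(TARGET.HAS_CVMFS_{attr} == true)"]
    if min_revision is not None:
        # Optionally require that the slot sees at least this repository revision.
        clauses.append(f"(TARGET.CVMFS_{attr}_REVISION >= {int(min_revision)})")
    return " && ".join(clauses)


if __name__ == "__main__":
    # 100 is a placeholder revision, not a real value from the repository.
    print(cvmfs_requirement("sbn.opensciencegrid.org", min_revision=100))
```

The resulting string would be appended to whatever requirement expression the job submission already uses.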