DUNE / dist-comp

Action items for DUNE distributed computing, and common scripts that are used.

JustIN submitting no new jobs to Global pool since July 8 04:10 #175

Closed StevenCTimm closed 4 months ago

StevenCTimm commented 4 months ago

The queues of Justin-prod-sched01 and Justin-prod-sched02 have had nothing submitted to them since 04:10 on July 8.

This is despite several JustIN workflows in the queue that ought to be submitting jobs.

StevenCTimm commented 4 months ago

I have submitted 1000 jobs as timm from the Fermilab side to verify that the rest of the global pool is working OK; thus far it seems to be.

StevenCTimm commented 4 months ago

Yes, the frontend and factory are perfectly functional and we can get glideins. The problem is between JustIN and the condor_schedd.

StevenCTimm commented 4 months ago

Andrew reports that it was a problem with munge on the schedds, which he has temporarily patched. A number of ETF jobs have been submitted and run. AWT is gradually filling in again.

StevenCTimm commented 4 months ago

This is understood and fixed (the permissions on the munge secrets were wrong). Closing.
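
For reference, a rough sketch of the kind of permission check that could catch this class of problem early. The thread does not say which files or modes were involved, so everything here is an assumption: the helper name `secret_file_problems` is hypothetical, and `/etc/munge/munge.key` is only the usual default MUNGE key location, not necessarily what the schedds use. The check flags a secret file that is accessible by group or other, which MUNGE normally refuses to accept for its key.

```python
import os
import stat


def secret_file_problems(path, expected_uid=None):
    """Return a list of problems found with a secret file's ownership/mode.

    Hypothetical helper: an empty list means nothing suspicious was found.
    """
    problems = []
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)

    if not stat.S_ISREG(st.st_mode):
        problems.append(f"{path} is not a regular file")

    # Any group/other permission bit on a secret is treated as wrong.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{path} is accessible by group or other (mode {mode:04o})")

    # Optionally verify the owner, e.g. the uid of the munge user.
    if expected_uid is not None and st.st_uid != expected_uid:
        problems.append(f"{path} is owned by uid {st.st_uid}, expected {expected_uid}")

    return problems


if __name__ == "__main__":
    # Assumed default key path; adjust for the actual host.
    key_path = "/etc/munge/munge.key"
    if os.path.exists(key_path):
        for problem in secret_file_problems(key_path):
            print(problem)
```

Running this from a monitoring probe on each schedd would only detect bad modes or ownership; it does not verify that the key contents match across hosts.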