DUNE / dist-comp

Action items for DUNE distributed computing, and common scripts that are used.

Liverpool hepgrid5 running but not calling back #56

Closed StevenCTimm closed 1 year ago

StevenCTimm commented 1 year ago

We see glideins continually running at Liverpool, but no slots in the pool are calling back.

Filed https://support.opensciencegrid.org/support/tickets/72694. We need to see the glidein logs.

There was one spike of slots over the weekend (Apr 11) but nothing since then; not sure what is happening. Most of the Justin jobs didn't match to those slots, but some did.

StevenCTimm commented 1 year ago

Liverpool started working on Apr 17; we don't know why. We have not seen the job logs yet.

StevenCTimm commented 1 year ago

Edita from the factory informs us that we didn't have a vmem setting in the pilot config, and jobs were getting killed for that reason. This has now been fixed.