DUNE / dist-comp

Action items for DUNE distributed computing, and common scripts that are used.

Wayne State stuff running but not calling back #54

Closed StevenCTimm closed 4 months ago

StevenCTimm commented 1 year ago

Have filed https://support.opensciencegrid.org/support/tickets/public/c748ef70d532c31e1dd2cba3a5079e87496911d9c9ea862c968f78f9d93db063

StevenCTimm commented 1 year ago

According to Andrew: The WSU entries in the OSG config with the DUNE VO are marked as enabled="FALSE". It also has spaces in its WLCG name(!)

So we should follow up on this. The site is already out of JustIN, but from my investigation on the factory, jobs are still running against it.

StevenCTimm commented 1 year ago

Note that the link above is not actually the Wayne State public link. The correct one is:

https://support.opensciencegrid.org/support/tickets/public/f40e5c1086411f21464bda65e1aa388b2c7491c513b211189189b38e67111b55

StevenCTimm commented 1 year ago

So it turns out there are, apparently, 2 different and disjoint versions of the OSG config. The newer one comes from the yml files under OSG_autoconf and that one is still active. Have added this question to the already-opened ticket.

StevenCTimm commented 1 year ago

So it appears that at the moment DUNE has nothing either pending or held. JustIN has 24 jobs in the queue for site GRID_Ce2. That's not enough to get any glideins sent yet, though. Wait until it builds up a bit and then we should see some new submissions.

StevenCTimm commented 1 year ago

I have now dumped 100 jobs into the queue, plus there were 40 JustIN jobs queued up. We will see what happens now.

StevenCTimm commented 1 year ago

Our frontend is clearly now requesting idle glideins from gfactory-2.opensciencegrid.org, but no obvious ones are being sent yet.

StevenCTimm commented 1 year ago

Monitoring shows that some jobs have matched and run, and my jobs on jobsub02 have exited the queue and completed. All the JustIN jobs are still there; they haven't matched. Not quite sure why yet. I updated the ticket and asked OSG to leave it running. We will have to catch a running glidein at WSU and do condor_q -ana, which is not an easy task.

StevenCTimm commented 4 months ago

This is covered in #98; closing this as a duplicate.