StevenCTimm closed 1 year ago
According to discussion at the operations meeting on 4/17 this has been fixed, and test jobs were successful.
Will leave this open until we start seeing JustIN jobs go through again.
The jobs are going through again, but they are still having trouble writing to FNAL dCache from RAL Tier1.
Robert Illingworth asked us to open a SNOW ticket on this so that the Fermilab end is informed.
The error seems to have gone away on its own. JustIN can write back to Fermilab now.
Will keep watching to make sure it doesn't recur in short order.
Have now opened a ServiceNow ticket with Fermilab, RITM1716559.
That is at Robert Illingworth's request. Failures are still intermittent.
It was thought that uboone might be hogging all the network traffic, but they have been blacklisted at this site since Apr 21 and the network errors continue.
This behavior has stopped now. Don't know why.
jobsub admins have blocked all user jobs (including justIN) from running at RAL Tier1 due to ongoing asymmetric-routing network problems. These are set to be fixed next week once Fermilab public dCache moves completely onto LHCONE, scheduled for Apr 19. Won't file any tickets until we hear from Phil DeMar.