[8.0] feat: Make RemoteRunner more resilient to CE issues

DIRACGrid / DIRAC

DIRAC Grid

http://diracgrid.org

GNU General Public License v3.0


[8.0] feat: Make RemoteRunner more resilient to CE issues #7606

Closed by aldbr 4 months ago

aldbr commented 4 months ago

In LHCb, we are using a new HPC site in "pre-production", and the connection to it is still quite unstable. To avoid reporting a Done job as Failed just because we cannot retrieve its status or its outputs, we now retry contacting the Computing Element (CE) a few times before giving up.
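The retry idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the code from this PR: `call_with_retries`, `max_attempts`, and `delay` are assumed names, `S_OK`/`S_ERROR` are local stand-ins that only mimic the shape of DIRAC's result dictionaries, and the fixed delay between attempts is an assumption (the PR only says the CE is contacted a few times).

```python
import time
from typing import Any, Callable, Dict


def S_OK(value: Any = None) -> Dict[str, Any]:
    """Local stand-in mimicking DIRAC's S_OK result structure (assumption)."""
    return {"OK": True, "Value": value}


def S_ERROR(message: str) -> Dict[str, Any]:
    """Local stand-in mimicking DIRAC's S_ERROR result structure (assumption)."""
    return {"OK": False, "Message": message}


def call_with_retries(
    func: Callable[[], Dict[str, Any]],
    max_attempts: int = 3,
    delay: float = 2.0,
) -> Dict[str, Any]:
    """Hypothetical helper: call func until it returns an OK result.

    func is assumed to return a DIRAC-style result dict. If every attempt
    fails, the last error is returned so the caller can still decide to
    mark the job as Failed.
    """
    result = S_ERROR("No attempt made")
    for attempt in range(1, max_attempts + 1):
        result = func()
        if result["OK"]:
            return result
        if attempt < max_attempts:
            time.sleep(delay)  # give the unstable CE a moment before retrying
    return result


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_get_status() -> Dict[str, Any]:
        # Hypothetical CE status query that fails twice, then succeeds
        calls["n"] += 1
        if calls["n"] < 3:
            return S_ERROR("Connection to CE timed out")
        return S_OK("Done")

    print(call_with_retries(flaky_get_status, max_attempts=3, delay=0.0))
    # {'OK': True, 'Value': 'Done'}
```

Returning the last error instead of raising keeps the helper consistent with the result-dictionary style it imitates: a transient CE failure no longer turns a Done job into a Failed one, while a CE that stays unreachable for every attempt still surfaces an error to the caller.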

BEGINRELEASENOTES
*WorkloadManagement
CHANGE: Make RemoteRunner more resilient to CE issues
ENDRELEASENOTES

DIRACGridBot commented 4 months ago

Sweep summary

Sweep ran in https://github.com/DIRACGrid/DIRAC/actions/runs/9029762875

Successful:

integration