DUNE / dist-comp

Action items for DUNE distributed computing, and common scripts that are used.

Problems with job submission from CERN etc #26

Closed Andrew-McNab-UK closed 2 years ago

Andrew-McNab-UK commented 2 years ago

It does not seem to be possible to use jobsub_submit reliably from CERN and other sites outside FNAL. If jobsub01.fnal.gov or jobsub03.fnal.gov is chosen as the server it seems to be ok, but usually jobsub02.fnal.gov is used, which results in

pycurl.error: (35, 'OpenSSL SSL_connect: Connection reset by peer in connection to jobsub02.fnal.gov:8443 ')

HTTP response:0 PyCurl Error (35, 'OpenSSL SSL_connect: Connection reset by peer in connection to jobsub02.fnal.gov:8443 ')

which is usually a firewall problem. Using the --jobsub-server option to select jobsub01 doesn't seem to make a difference.
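
To narrow down which servers are affected from a given site, something like the following sketch could be run there. It is not part of jobsub: the names probe and probe_all are made up for illustration, and the hostnames and port are simply the ones from the error messages above. It only attempts a TLS handshake with certificate verification switched off, so an "ok" result means the connection was not reset at that stage, not that a job submission would succeed.

```python
import socket
import ssl

# Hostnames and port as they appear in the error messages in this issue.
SERVERS = ["jobsub01.fnal.gov", "jobsub02.fnal.gov", "jobsub03.fnal.gov"]
PORT = 8443


def probe(host, port=PORT, timeout=10.0):
    """Attempt a TLS handshake with host:port and return a short status string."""
    context = ssl.create_default_context()
    # Assumption: we only want to know whether the handshake gets through,
    # so certificate and hostname checks are deliberately disabled.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return "ok (%s)" % tls.version()
    except ConnectionResetError:
        # Matches the "Connection reset by peer" symptom reported by pycurl.
        return "connection reset (possible firewall)"
    except socket.timeout:
        return "timed out"
    except OSError as exc:
        # Covers DNS failures, refused connections and other TLS errors.
        return "failed: %s" % exc


def probe_all(servers=SERVERS, port=PORT, timeout=10.0):
    """Probe each server in turn and return a {hostname: status} dict."""
    return {host: probe(host, port, timeout) for host in servers}
```

Calling probe_all() on lxplus and again on a machine at FNAL, then comparing the two results, would show whether the resets are specific to one server or to the remote site.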

In passing, there also seems to be a firewall problem for a URL that cigetcert fetches, although it doesn't prevent the few successful job submissions you get to the other two jobsub servers:

cigetcert: fetch of options from https://fifebatch.fnal.gov/cigetcertopts.txt failed: URLError:

I've attached a log of setting up the environment on lxplus and then trying to submit a job: jobsub.log.txt

Andrew-McNab-UK commented 2 years ago

This seems to be working ok now, although the Service Now ticket has not actually been updated.

Andrew-McNab-UK commented 2 years ago

The Service Now ticket has now been updated to say it is working and that they can see requests in the logs.