Open tiborsimko opened 3 years ago
Example of "native" retry tools:
Haven't looked at what pycurl/xrootd may have built-in. We could always improve our own for _retry in range
loop not to check only fro error = ( DOWNLOAD_ERROR_PAGE[...
, but turn it into a more complete function catching other exceptional situations.
Example: I just launch two commands quickly after each other, and I saw:
$ wget http://localhost/eos/opendata/atlas/OutreachDatasets/2020-01-22/GamGam.zip
--2020-11-19 11:56:39-- http://localhost/eos/opendata/atlas/OutreachDatasets/2020-01-22/GamGam.zip
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2020-11-19 11:56:40 ERROR 403: Forbidden.
$ $ wget http://localhost/eos/opendata/atlas/OutreachDatasets/2020-01-22/GamGam.zip
--2020-11-19 11:56:44-- http://localhost/eos/opendata/atlas/OutreachDatasets/2020-01-22/GamGam.zip
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1594567234 (1.5G) [application/zip]
Saving to: ‘GamGam.zip’
The 403 is a rather frequent error by the underlying reverse proxy that we also worked around in #91 via size and checksum.
After we introduce exit status based retry, we could probably remove that check-error-page-size-and-checksum hack.
Example: 206 Partial Content situation occurred several times while downloading one big file via wget
:
$ wget http://opendata.cern.ch/eos/opendata/cms/MonteCarlo2011/Summer11LegDR/BuToKstarMuMuV2_EtaPtFilter_7TeV-pythia6-evtgen/AODSIM/PU_S13_START53_LV6-v1/00000/000043B3-0E0B-E411-B2FE-0026189438E9.root
--2020-06-24 14:53:19-- http://opendata.cern.ch/eos/opendata/cms/MonteCarlo2011/Summer11LegDR/BuToKstarMuMuV2_EtaPtFilter_7TeV-pythia6-evtgen/AODSIM/PU_S13_START53_LV6-v1/00000/000043B3-0E0B-E411-B2FE-0026189438E9.root
Resolving opendata.cern.ch (opendata.cern.ch)... 188.184.93.89, 188.185.82.144, 188.184.28.138
Connecting to opendata.cern.ch (opendata.cern.ch)|188.184.93.89|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4011567937 (3.7G) [application/octet-stream]
Saving to: ‘000043B3-0E0B-E411-B2FE-0026189438E9.root.1’
000043B3-0E0B-E411-B2FE-0026189438E9.root.1 26%[=========================================> ] 1.01G 7.82MB/s in 87s
2020-06-24 14:54:46 (11.9 MB/s) - Connection closed at byte 1081769706. Retrying.
--2020-06-24 14:54:47-- (try: 2) http://opendata.cern.ch/eos/opendata/cms/MonteCarlo2011/Summer11LegDR/BuToKstarMuMuV2_EtaPtFilter_7TeV-pythia6-evtgen/AODSIM/PU_S13_START53_LV6-v1/00000/000043B3-0E0B-E411-B2FE-0026189438E9.root
Connecting to opendata.cern.ch (opendata.cern.ch)|188.184.93.89|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4011567937 (3.7G), 2929798231 (2.7G) remaining [application/octet-stream]
Saving to: ‘000043B3-0E0B-E411-B2FE-0026189438E9.root.1’
000043B3-0E0B-E411-B2FE-0026189438E9.root.1 29%[++++++++++++++++++++++++++++++++++++++++++===> ] 1.09G 11.5MB/s in 6.7s
2020-06-24 14:54:54 (12.6 MB/s) - Connection closed at byte 1169634135. Retrying.
--2020-06-24 14:54:56-- (try: 3) http://opendata.cern.ch/eos/opendata/cms/MonteCarlo2011/Summer11LegDR/BuToKstarMuMuV2_EtaPtFilter_7TeV-pythia6-evtgen/AODSIM/PU_S13_START53_LV6-v1/00000/000043B3-0E0B-E411-B2FE-0026189438E9.root
Connecting to opendata.cern.ch (opendata.cern.ch)|188.184.93.89|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4011567937 (3.7G), 2841933802 (2.6G) remaining [application/octet-stream]
Saving to: ‘000043B3-0E0B-E411-B2FE-0026189438E9.root.1’
000043B3-0E0B-E411-B2FE-0026189438E9.root.1 56%[++++++++++++++++++++++++++++++++++++++++++++++=========================================> ] 2.09G 15.1MB/s in 88s
2020-06-24 14:56:24 (11.7 MB/s) - Connection closed at byte 2248454684. Retrying.
--2020-06-24 14:56:27-- (try: 4) http://opendata.cern.ch/eos/opendata/cms/MonteCarlo2011/Summer11LegDR/BuToKstarMuMuV2_EtaPtFilter_7TeV-pythia6-evtgen/AODSIM/PU_S13_START53_LV6-v1/00000/000043B3-0E0B-E411-B2FE-0026189438E9.root
Connecting to opendata.cern.ch (opendata.cern.ch)|188.184.93.89|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4011567937 (3.7G), 1763113253 (1.6G) remaining [application/octet-stream]
Saving to: ‘000043B3-0E0B-E411-B2FE-0026189438E9.root.1’
000043B3-0E0B-E411-B2FE-0026189438E9.root.1 82%[++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++==========================================> ] 3.10G 13.1MB/s in 82s
2020-06-24 14:57:49 (12.6 MB/s) - Connection closed at byte 3328930017. Retrying.
--2020-06-24 14:57:53-- (try: 5) http://opendata.cern.ch/eos/opendata/cms/MonteCarlo2011/Summer11LegDR/BuToKstarMuMuV2_EtaPtFilter_7TeV-pythia6-evtgen/AODSIM/PU_S13_START53_LV6-v1/00000/000043B3-0E0B-E411-B2FE-0026189438E9.root
Connecting to opendata.cern.ch (opendata.cern.ch)|188.184.93.89|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4011567937 (3.7G), 682637920 (651M) remaining [application/octet-stream]
Saving to: ‘000043B3-0E0B-E411-B2FE-0026189438E9.root.1’
000043B3-0E0B-E411-B2FE-0026189438E9.root.1 85%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++===> ] 3.20G 6.88MB/s in 10s
2020-06-24 14:58:04 (9.83 MB/s) - Connection closed at byte 3433695865. Retrying.
--2020-06-24 14:58:09-- (try: 6) http://opendata.cern.ch/eos/opendata/cms/MonteCarlo2011/Summer11LegDR/BuToKstarMuMuV2_EtaPtFilter_7TeV-pythia6-evtgen/AODSIM/PU_S13_START53_LV6-v1/00000/000043B3-0E0B-E411-B2FE-0026189438E9.root
Connecting to opendata.cern.ch (opendata.cern.ch)|188.184.93.89|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4011567937 (3.7G), 577872072 (551M) remaining [application/octet-stream]
Saving to: ‘000043B3-0E0B-E411-B2FE-0026189438E9.root.1’
000043B3-0E0B-E411-B2FE-0026189438E9.root.1 100%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++======================>] 3.74G 14.6MB/s in 36s
2020-06-24 14:58:44 (15.5 MB/s) - ‘000043B3-0E0B-E411-B2FE-0026189438E9.root.1’ saved [4011567937/4011567937]
Musing after researching on the Issue:
There are two things we need to address
Every library we use(pycurl
/requests
/xrootdpyfs
) has their own way of handling these two things.
pycurl
pycurl.error
and getinfo
for different HTTP CodesWRITEDATA
requests
xrootdpyfs
xrootdpyfs
xrootd
python client methods directly.Please let me know if I can break those steps as well!
Now that we are catching downloading errors via
--retry-limit
and--retry-sleep
parameters thanks to #91, we can build upon this basis and enrich the error handling logic to catch more errors (such as Error 500 received from server, etc).Example how to reproduce:
Current behaviour of the client is to output Python traceback:
Expected behaviour is to:
docker-compose up