EGA-archive / ega-download-client

A Python-based EGA download client
Apache License 2.0
94 stars, 52 forks

Download stuck at 0% #203

Closed antonkratz closed 1 year ago

antonkratz commented 1 year ago

I start the client like this: pyega3 fetch EGAD00001001991

I enter my credentials and am stuck for several hours with: [2023-08-12 19:54:19 +0900] Download starting [using 1 connection(s), file size 2813259451 and chunk length 104857600]...

How can I proceed from here?

Guan06 commented 1 year ago

Hi,

I have the same issue when trying to fetch either a dataset or a single file...

Thank you and best, Rui

EW7721 commented 1 year ago

I have also had this same issue for weeks. I have tried adjusting the number of connections (-c 1, 5, 10, 20, etc.) and the chunk length (-ms 1073741824; the default is -ms 104857600), as suggested by others, but I continue to have issues with the connection.

pyega3 -c 5 -ms 1073741824 -cf EGA_credentials_file.json fetch EGAF0000*******

Download starting [using 5 connection(s), file size 7643952250 and chunk length 1073741824]...
  0%|                                                                                                                                                                                                            | 0.00/7.64G [00:00<?, ?B/s]

[2023-08-16 12:30:39 -0400] Retrying (Retry(total=19, connect=False, read=9, redirect=None, status=10)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /v2/files/EGAF0000*******?destinationFormat=plain

thomasthtc commented 1 year ago

Previous threads suggested a larger max slice size, but I found that it doesn't work lately: when the connection is interrupted, the slice has to start from scratch. So instead I tried a smaller chunk length today, e.g. -ms 50737418, and it seems to be working, though it downloads very, very slowly, with occasional disruptions for hours.

I've contacted the help desk, and they said it might take over a month to fix it. I requested to be put on the waiting list for an Aspera download, but it will probably take some time until it is my turn. I must say this is not the best download experience, especially since I need the data to submit a manuscript pretty soon.
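
To make the trade-off behind a smaller -ms value concrete, here is a minimal Python sketch of the slice arithmetic. It assumes (this is not confirmed anywhere in the thread) that the client splits a file into fixed-size slices and that a broken connection costs at most the slice that was in progress. slice_count is a hypothetical helper for illustration, not part of pyEGA3. The file size and chunk lengths are the ones quoted in the logs above.

```python
def slice_count(file_size: int, chunk_length: int) -> int:
    """Number of fixed-size slices needed to cover file_size bytes."""
    if file_size < 0 or chunk_length <= 0:
        raise ValueError("file_size must be >= 0 and chunk_length must be > 0")
    # Integer ceiling division: a partial last slice still counts as one slice.
    return (file_size + chunk_length - 1) // chunk_length


FILE_SIZE = 7_643_952_250  # bytes, from the "Download starting" log line

for chunk_length in (1_073_741_824, 104_857_600, 50_737_418):
    slices = slice_count(FILE_SIZE, chunk_length)
    # Assumption: an interrupted slice is re-fetched from its beginning,
    # so the worst-case loss per disconnect is one slice.
    worst_case_loss = min(chunk_length, FILE_SIZE)
    print(f"-ms {chunk_length}: {slices} slices, "
          f"up to {worst_case_loss / 1e6:.0f} MB lost per disconnect")
```

Under these assumptions the 7.64 GB file needs 8 slices at -ms 1073741824, 73 at the default -ms 104857600, and 151 at -ms 50737418. More slices mean more requests, but each reset throws away far less data.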

sahwa commented 1 year ago

Also having the same issue - don't suppose anyone has found a resolution?

python pyega3 -c 1 -ms 50737418 -cf credential_file.json fetch EGADxxx 

[2023-08-17 10:44:07 +0100]
[2023-08-17 10:44:07 +0100] pyEGA3 - EGA python client version 5.0.2 (https://github.com/EGA-archive/ega-download-client)
[2023-08-17 10:44:07 +0100] Parts of this software are derived from pyEGA (https://github.com/blachlylab/pyega) by James Blachly
[2023-08-17 10:44:07 +0100] Python version : 3.11.4
[2023-08-17 10:44:07 +0100] OS version : Linux #1 SMP Tue Jun 20 11:48:01 UTC 2023
[2023-08-17 10:44:07 +0100] Server URL: https://ega.ebi.ac.uk:8443/v2
[2023-08-17 10:44:07 +0100] Session-Id: 2130951956
[2023-08-17 10:44:08 +0100]
[2023-08-17 10:44:08 +0100] Authentication success for user 'x.x@x.x.x.x'
[2023-08-17 10:44:08 +0100] File Id: 'EGAFxxx'(30735778 bytes).
[2023-08-17 10:44:08 +0100] Total space : 26000.00 GiB
[2023-08-17 10:44:08 +0100] Used space : 25450.33 GiB
[2023-08-17 10:44:08 +0100] Free space : 549.67 GiB
[2023-08-17 10:44:08 +0100] Download starting [using 1 connection(s), file size 30735762 and chunk length 50737418]...
  0%|                                                                                                                                                                                                                                                                                            | 0.00/30.7M [00:00<?, ?B/s]

EDIT

This actually worked eventually, so it's worth trying a few times and waiting (my dataset was quite small though)!

[2023-08-17 10:44:08 +0100] Download starting [using 1 connection(s), file size 30735762 and chunk length 50737418]...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30.7M/30.7M [03:47<00:00, 135kB/s]
[2023-08-17 10:47:56 +0100] Combining file chunks (this operation can take a long time depending on the file size)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30.7M/30.7M [00:00<00:00, 37.7GB/s]
[2023-08-17 10:47:56 +0100] Calculating md5 (this operation can take a long time depending on the file size)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30.7M/30.7M [00:00<00:00, 494MB/s]
[2023-08-17 10:47:56 +0100] Verifying file checksum
[2023-08-17 10:47:56 +0100] Saved to : '/well/ckb/users/aey472/EGAF00005858167/Kutanan_Liu_2021_Thai_Lao.tar.gz'(30735762 bytes, md5=6a78f1d316572c3c8f21ec73faa1036f)
[2023-08-17 10:47:56 +0100] Download complete

antonkratz commented 1 year ago

Hey sahwa, by "eventually" do you mean after several hours, or several days?

I am stuck in my research project. There is no way to move forward because I do not even have access to the data: I signed a non-disclosure agreement etc., but after all that I still have no data.

sahwa commented 1 year ago

It took me about an hour or two of just cancelling the command and resubmitting, and eventually it worked. All done within a day, but the files were pretty tiny (~30 MB or so), so I don't know how long it would take for larger files.
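
The cancel-and-resubmit routine described above could be automated with something like the following. This is only a sketch under two loud assumptions: that the client exits with a non-zero status when a download fails, and that re-running the same command is safe. Neither is confirmed in this thread. run_until_success and its parameters are hypothetical names, not part of pyEGA3.

```python
import subprocess
import time
from typing import Optional, Sequence


def run_until_success(
    cmd: Sequence[str],
    max_attempts: int = 10,
    wait_seconds: float = 60.0,
    timeout_seconds: Optional[float] = None,
) -> int:
    """Re-run cmd until it exits with status 0; return the successful attempt number."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = subprocess.run(list(cmd), timeout=timeout_seconds)
            if result.returncode == 0:
                return attempt
            print(f"attempt {attempt}: exit status {result.returncode}")
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child process before re-raising,
            # which stands in for cancelling a stuck run by hand.
            print(f"attempt {attempt}: cancelled after {timeout_seconds} s")
        if attempt < max_attempts:
            time.sleep(wait_seconds)
    raise RuntimeError(f"command still failing after {max_attempts} attempts")


# Hypothetical usage, reusing the options quoted in this thread:
# run_until_success(
#     ["pyega3", "-c", "1", "-ms", "50737418",
#      "-cf", "credential_file.json", "fetch", "EGAFxxx"],
#     timeout_seconds=3600,
# )
```

Note that timeout_seconds applies to the whole run, so it would also cancel a healthy but slow download. It only makes sense for small files like the ~30 MB one here, or when set generously.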

thomasthtc commented 1 year ago

Surprisingly I am able to start downloading the datasets in recent days. I found that a smaller -ms actually works. It still gets disconnected every now and then, but at least it restarts in a certain slice. Here is my code:

pyega3 -cf /your_config_file.json -c 20 -ms 100000000 -d fetch EGADxxxxxxxx --output-dir /your_output_directory -M 1000

Guan06 commented 1 year ago

I tried the same script today and it works now!

antonkratz commented 1 year ago

Thank you so much for the heads up Guan06, pyega3 fetch EGAD00001001991 started on my machine as well! I will report if it manages to download the entire thing.

CsabaHalmagyi commented 1 year ago

Dear commenters,

Thank you for submitting your error logs and reporting the issue. I can confirm that we had a connection problem that made downloads very slow/not possible. This issue has been fixed by the dev team recently. I am closing this issue now but should you encounter any errors in the future, please contact our helpdesk team at helpdesk@ega-archive.org.

Regards, Csaba

antonkratz commented 1 year ago

Hi @CsabaHalmagyi, thank you very much for looking into this. However, I still cannot download successfully. I tried:

pyega3 -c 30 -cf /home/kratz/ega_credentials.json fetch EGAD00001006959 --output-dir ~/Lifelines-TEST/EGAD00001006959/

but I got:

[2023-08-24 22:39:38 +0900] Download process expected md5 value 'ceb13b8005cbbb7ad50fcd6184d3f300' but got '7973a8bbaa7930409f0216c4233a220e'

followed by Python error messages, and then pyEGA3 crashes. I assume this is an unrelated error, but so far I have had no success downloading a very large archive (EGAD00001001991, over 3 terabytes, way over 1500 files), the Lifelines DEEP data. Would Aspera help? I have already written to helpdesk@ega-archive.org.
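
For anyone who wants to re-check a downloaded file independently of the client, the comparison behind the "expected md5 value ... but got ..." message can be reproduced roughly as follows. This is a minimal sketch using only Python's standard hashlib; file_md5 and md5_matches are hypothetical helper names, and it is an assumption that the client's check amounts to the same whole-file MD5 comparison.

```python
import hashlib


def file_md5(path: str, block_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, read in blocks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" at EOF.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def md5_matches(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 with an expected hex digest, ignoring case and whitespace."""
    return file_md5(path) == expected_md5.strip().lower()


# Hypothetical usage with the expected value from the log line above:
# md5_matches("/path/to/downloaded_file", "ceb13b8005cbbb7ad50fcd6184d3f300")
```

A mismatch only shows that the bytes on disk differ from what the server advertised. It does not say which slice is damaged, so on its own it cannot tell a corrupted transfer apart from a bad merge of slices.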

JSegueni commented 1 year ago

Hi all, last week, when trying to download a dataset, I was having the same "download stuck at 0%" issue. Now, whenever I try to download a file (around 30 GB), the download does start and reaches 100%, but the md5 checksum always fails, as it does for @antonkratz. I also sometimes get slice errors during the downloads. I had contacted the helpdesk about the 0% issue and would like an Aspera download link, if possible, to resolve the situation.