EGA-archive / ega-download-client

A Python-based EGA download client
Apache License 2.0

'Remote end closed connection without response' and other issues #206

Open antonkratz opened 1 year ago

antonkratz commented 1 year ago

I am trying to download EGAD00001001991, which is about 3.5 terabytes in size. The current error message is 'Remote end closed connection without response'. I have been trying to download EGAD00001001991 at different times over the past three weeks, without success. Other errors I have encountered: failed md5 checksums that cannot be resolved in an unsupervised fashion (pyega3 simply hangs), and memory issues. Is it even realistic to attempt a 3.5 TB download from EGA-Archive to Japan? If so, how can I download EGAD00001001991 in an unsupervised fashion and with correct md5 checksums? Would Aspera help? I have written to the helpdesk multiple times but only get the "untaken" message. With over 1500 files and 3.5 TB in total, I cannot manually download individual files, hunt down files with broken md5 checksums, and kill hanging processes, so I am looking for a method that works unsupervised. I am reaching out for advice on how to attempt such a large download.

(/home/kratz/miniconda3) [kratz@gc015 ~]$ time pyega3 -c 30 -cf /home/kratz/ega_credentials.json fetch EGAD00001001991 --output-dir ~/Lifelines-TEST/EGAD00001001991/
[2023-08-30 08:56:56 +0900]
[2023-08-30 08:56:56 +0900] pyEGA3 - EGA python client version 5.0.2 (https://github.com/EGA-archive/ega-download-client)
[2023-08-30 08:56:56 +0900] Parts of this software are derived from pyEGA (https://github.com/blachlylab/pyega) by James Blachly
[2023-08-30 08:56:56 +0900] Python version : 3.10.11
[2023-08-30 08:56:56 +0900] OS version : Linux #1 SMP Sun Jul 26 15:27:06 UTC 2020
[2023-08-30 08:56:56 +0900] Server URL: https://ega.ebi.ac.uk:8443/v2
[2023-08-30 08:56:56 +0900] Session-Id: 4264875591
[2023-08-30 08:56:58 +0900]
[2023-08-30 08:56:58 +0900] Authentication success for user 'kratz@sbi.jp'
[2023-08-30 08:59:15 +0900] File Id: 'EGAF00001150354'(2813259467 bytes).
[2023-08-30 08:59:15 +0900] Total space : 28517152.95 GiB
[2023-08-30 08:59:15 +0900] Used space : 15745654.38 GiB
[2023-08-30 08:59:15 +0900] Free space : 12482921.91 GiB
[2023-08-30 08:59:15 +0900] Download starting [using 30 connection(s), file size 2813259451 and chunk length 104857600]...
  0%| | 0.00/2.81G [00:00<?, ?B/s][2023-08-30 09:04:15 +0900] Retrying (Retry(total=19, connect=False, read=9, redirect=None, status=10)) after connection broken by 'RemoteDisconnected('Remote end closed connection without response')': /v2/files/EGAF00001150354?destinationFormat=plain
[2023-08-30 09:04:15 +0900] Retrying (Retry(total=19, connect=False, read=9, redirect=None, status=10)) after connection broken by 'RemoteDisconnected('Remote end closed connection without response')': /v2/files/EGAF00001150354?destinationFormat=plain
[2023-08-30 09:04:15 +0900] Retrying (Retry(total=19, connect=False, read=9, redirect=None, status=10)) after connection broken by 'RemoteDisconnected('Remote end closed connection without response')': /v2/files/EGAF00001150354?destinationFormat=plain
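
On the question of finding files with broken md5 checksums without supervision: a minimal sketch, assuming the expected checksums are already available as a mapping from file name to MD5 hex string. The `expected` mapping and both function names are hypothetical and not part of pyega3. It uses only Python's standard `hashlib` and reads each file in blocks, so memory use stays small even for very large files.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, block_size=1024 * 1024):
    """Return the hex MD5 of a file, reading it block by block."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_bad_files(expected, root):
    """Return relative paths that are missing or whose MD5 does not match.

    `expected` is a hypothetical mapping of relative path -> expected MD5
    hex string; how that mapping is obtained is not covered here.
    """
    bad = []
    for rel_path, want in sorted(expected.items()):
        path = Path(root) / rel_path
        if not path.is_file() or md5_of_file(path) != want.lower():
            bad.append(rel_path)
    return bad
```

`find_bad_files` returns only the names that are missing or mismatched, so a follow-up run would need to fetch just those files again.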

fangfyy commented 1 year ago

We are also having this problem, which I think is a problem with the EGA server.

b-lac commented 1 year ago

It is not a new problem. I was only able to download once they gave me an Aspera box. I think there is a kind of waiting queue for Aspera, and you have to ask both here and at the helpdesk...

harmjanwestra commented 1 year ago

Same here. It is also related to this issue: https://github.com/EGA-archive/ega-download-client/issues/192, which has been open since February.

I've also sent an e-mail to the helpdesk and hope they can give us aspera access. I'd advise you to also send an e-mail to the helpdesk, and mention the github issues, although I'm sure the dev team is aware of the problem.

antonkratz commented 1 year ago

> Same here. It is also related to this issue: #192, which has been open since February.
>
> I've also sent an e-mail to the helpdesk and hope they can give us aspera access. I'd advise you to also send an e-mail to the helpdesk, and mention the github issues, although I'm sure the dev team is aware of the problem.

So far, when I wrote to the helpdesk, I got an email with the message: "This ticket has been untaken by ega-rt-support." without further explanation. I assume "untaken" means that they will not pursue this issue (?!). In any case, yes, I would be happy to get an Aspera link or any response from the EGA that actually lets me download EGAD00001001991. I have already set up Aspera on the cluster on my end here in Japan. Thank you for the comments.

jaflo94 commented 1 year ago

Same problem here:

[2023-08-30 13:08:55 -0400] Authentication success for user 'jaflo'
[2023-08-30 13:12:04 -0400] File Id: 'EGAF00003943225'(62094815119 bytes).
[2023-08-30 13:12:04 -0400] Total space : 2764279.38 GiB
[2023-08-30 13:12:04 -0400] Used space : 2203066.00 GiB
[2023-08-30 13:12:04 -0400] Free space : 561213.38 GiB
[2023-08-30 13:12:04 -0400] Download starting [using 32 connection(s), file size 62094815103 and chunk length 104857600]...
  0%|                                                                                                                                                                    | 0.00/62.1G [00:00<?, ?B/s][2023-08-30 13:18:12 -0400] Retrying (Retry(total=17, connect=False, read=9, redirect=None, status=8)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))': /v2/files/EGAF00003943225?destinationFormat=plain
[2023-08-30 13:18:14 -0400] Retrying (Retry(total=17, connect=False, read=9, redirect=None, status=8)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))': /v2/files/EGAF00003943225?destinationFormat=plain
[2023-08-30 13:19:28 -0400] Retrying (Retry(total=15, connect=False, read=9, redirect=None, status=6)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))': /v2/files/EGAF00003943225?destinationFormat=plain
[2023-08-30 13:20:30 -0400] Retrying (Retry(total=16, connect=False, read=9, redirect=None, status=7)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))': /v2/files/EGAF00003943225?destinationFormat=plain
[2023-08-30 13:30:15 -0400] Retrying (Retry(total=17, connect=False, read=9, redirect=None, status=8)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))': /v2/files/EGAF00003943225?destinationFormat=plain
[2023-08-30 13:37:23 -0400] Retrying (Retry(total=18, connect=False, read=9, redirect=None, status=9)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))': /v2/files/EGAF00003943225?destinationFormat=plain
[2023-08-30 13:40:11 -0400] Retrying (Retry(total=16, connect=False, read=9, redirect=None, status=7)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))': /v2/files/EGAF00003943225?destinationFormat=plain
[2023-08-30 13:40:18 -0400] Retrying (Retry(total=16, connect=False, read=9, redirect=None, status=7)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))': /v2/files/EGAF00003943225?destinationFormat=plain
[2023-08-30 13:42:24 -0400] Retrying (Retry(total=9, connect=False, read=9, redirect=None, status=0)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))': /v2/files/EGAF00003943225?destinationFormat=plain

Any suggestions? Is this an EGA server problem? Is this a temporary issue?

antonkratz commented 1 year ago

I feel that if research data has been deposited into EGA-Archive, there should be a practical and reasonable method of getting the data out of EGA-Archive.

thomasthtc commented 1 year ago

The server worked last week, but suddenly didn't work this week.

antonkratz commented 1 year ago

> The server worked last week, but suddenly didn't work this week.

Might be, but on my end no download has worked for the past 3 weeks. Fragments of files and some specific individual files, yes, but there has been no way to download an entire microbiome data set from Lifelines DEEP.

LeonHafner commented 1 year ago

For me it worked on Monday this week. I didn't try on Tuesday, but since Wednesday (yesterday) I haven't been able to download a single byte. It shows the progress bar as in the first message from @antonkratz, then times out after a few minutes and tells me that the connection is broken. Definitely an EGA server bug. Please fix this!

agunjur commented 1 year ago

I am also having the same issue. The download hangs at 0.00 and then breaks. I'll watch this thread to see if it gets resolved...

fangfyy commented 1 year ago

The problem still seems to be unresolved and I still can't download any data.

laillern commented 1 year ago

Same problem as everybody else. I contacted the helpdesk, which created a ticket, only to mark it as "untaken" soon after. I guess we are on our own, with no data :(

jaflo94 commented 1 year ago

I found a different problem now, I am not even able to connect to the EGA server for the last 2 days:

`raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='ega.ebi.ac.uk', port=8443): Max retries exceeded with url: /ega-openid-connect-server/token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe968f1df90>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

`

I wonder if anyone else has seen a similar problem, or whether this is a problem on my side.

BrendaLee1 commented 1 year ago

> I found a different problem now, I am not even able to connect to the EGA server for the last 2 days:
>
> `raise MaxRetryError(_pool, url, error or ResponseError(cause))
> urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='ega.ebi.ac.uk', port=8443): Max retries exceeded with url: /ega-openid-connect-server/token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe968f1df90>: Failed to establish a new connection: [Errno 111] Connection refused'))
>
> During handling of the above exception, another exception occurred:
>
> ` I wonder if anyone else found similar problem or this is a problem from my side

Hi, I have the same issue. The error message says that the username or password is invalid. I wrote to the EGA helpdesk and am still waiting for their reply.

LeonHafner commented 1 year ago

Is there any movement in this problem? Maybe a statement from the EGA team?

aravind-ebi commented 1 year ago

Our team is aware of the issues impacting the download of files via the python client currently; we are actively investigating to determine its root cause and implement a resolution as quickly as possible. We apologise for any inconvenience it may be causing. We are committed to resolving this issue promptly and ensuring the continued stability and functionality of the client. We appreciate your patience and understanding during this time. We hope to have a resolution in place soon and will keep you updated on our progress.

In the meantime, please ensure that you are trying to download via the latest version of the python client (v5.0.2). If you have any questions or concerns, please feel free to reach out to our helpdesk team at ega-helpdesk@ebi.ac.uk and if you already have, we will get in touch with you.

SSSJe commented 1 year ago

Hi, I have the same error as you. How can I solve it? I haven't found a similar solution.

HTTPSConnectionPool(host='ega.ebi.ac.uk', port=8443): Max retries exceeded with url: /v2/files/EGAF00002382373?destinationFormat=plain (Caused by ResponseError('too many 504 error responses'))

jaflo94 commented 1 year ago

It is an EGA server problem that EGA is trying to solve. There is still no timeline for when it will be solved.

CsabaHalmagyi commented 1 year ago

Dear @antonkratz, thank you for reporting the issue. I can confirm we had some performance and connection issues, which should be resolved now. Could you try running the download client again?

BrendaLee1 commented 1 year ago

Hi, a new error is reported. Has anyone else seen a similar error message before?

[2023-09-25 16:46:45 +0800] ('Connection broken: IncompleteRead(5921036 bytes read, 98936564 more expected)', IncompleteRead(5921036 bytes read, 98936564 more expected))
Traceback (most recent call last):
  File "/rd1/laixh/soft/anaconda2/envs/pyega3/lib/python3.11/site-packages/urllib3/response.py", line 710, in _error_catcher
    yield
  File "/rd1/laixh/soft/anaconda2/envs/pyega3/lib/python3.11/site-packages/urllib3/response.py", line 835, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
urllib3.exceptions.IncompleteRead: IncompleteRead(5921036 bytes read, 98936564 more expected)
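
One observation on the numbers in this message: 5921036 bytes read plus 98936564 bytes still expected is exactly 104857600, the chunk length printed in the "Download starting" log lines earlier in this thread, so the connection appears to have dropped partway through a single 100 MiB chunk. The sketch below illustrates the arithmetic. `chunk_ranges` is a hypothetical helper, and it is an assumption that pyega3 splits files into contiguous byte ranges in this way.

```python
def chunk_ranges(file_size, chunk_length):
    """Yield inclusive (start, end) byte offsets that cover file_size bytes."""
    for start in range(0, file_size, chunk_length):
        yield start, min(start + chunk_length, file_size) - 1


CHUNK_LENGTH = 104857600  # chunk length reported in the pyega3 logs (100 MiB)

# The two numbers in the IncompleteRead message add up to exactly one chunk.
bytes_read, bytes_missing = 5921036, 98936564
print(bytes_read + bytes_missing == CHUNK_LENGTH)

# Example: the 2813259451-byte file from the first log in this thread.
ranges = list(chunk_ranges(2813259451, CHUNK_LENGTH))
print(len(ranges), ranges[-1])
```

Under that assumption, the 2.81 GB file from the first post would be fetched as 27 chunks, and a single dropped connection costs at most one chunk's worth of progress.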

LeonHafner commented 1 year ago

I get the same error for some files. However, after failing pyega3 automatically restarts the download several times and after a while successfully finishes the download.

laillern commented 1 year ago

I have been able to download 100+ FASTQ files (between 2 and 16 GB in size) in a couple of days. It is still running. As previously mentioned, pyega3 quickly restarts and finishes the download, and it does so much faster than it did a few months back. Very happy the server is back on!!! Thanks!

jaflo94 commented 1 year ago

Yeah, there are no problems downloading files up to 65 GB; larger files throw a checksum error and cannot be downloaded.

BrendaLee1 commented 1 year ago

> I get the same error for some files. However, after failing pyega3 automatically restarts the download several times and after a while successfully finishes the download.

Hi, thank you for your reply. I am trying to download a BAM file (~1 TB in size) and the download speed is slow, at about 2 MB/s. I find that pyega3 always quits after several restarts, with the file still far from complete.
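
For very long downloads where the client gives up after several restarts, one possible workaround is to wrap the command in an outer retry loop. The following is a minimal sketch. `run_until_success` is a hypothetical helper, and it assumes that the client exits with a non-zero status on failure and that re-running it continues rather than starts over, neither of which is confirmed in this thread.

```python
import subprocess
import time


def run_until_success(cmd, max_attempts=5, base_delay=60.0,
                      run=subprocess.run, sleep=time.sleep):
    """Re-run cmd until it exits with status 0 or the attempts run out.

    Waits base_delay * 2**(attempt - 1) seconds between attempts.
    Returns the number of attempts used; raises RuntimeError on failure.
    """
    for attempt in range(1, max_attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            # Exponential backoff: 1x, 2x, 4x, ... the base delay.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"{cmd!r} still failing after {max_attempts} attempts")


# Example call, using only the flags that appear in the first post:
# run_until_success(["pyega3", "-c", "30", "-cf", "ega_credentials.json",
#                    "fetch", "EGAD00001001991", "--output-dir", "out/"])
```

The `run` and `sleep` parameters exist only so the loop can be exercised without launching a real download or actually waiting.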

famosab commented 10 months ago

I have the same problem with version 5.1.0 when trying to download a file of around 150 GB. Is there any news on this?

SSSJe commented 10 months ago

> I have the same problem with version 5.1.0 when trying to download a file with around 150GB. Are there any news concerning this?

I think you could send this message to the author by e-mail. This looks like a network problem, and he may be able to solve it.