facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

400 Bad Request for foldseekdb #518

Open mjoh223 opened 1 year ago

mjoh223 commented 1 year ago

I cannot download the high-quality clust30 foldseek database file: https://dl.fbaipublicfiles.com/esmatlas/v0/highquality_clust30/foldseekdb/highquality_clust30_ca

I'm getting a 400 Bad Request. All other files in this directory can be downloaded.

lelandbr commented 1 year ago

I got the same error today. I'm using the foldseek databases command to download this, which calls aria2c to do the download. All the files in https://dl.fbaipublicfiles.com/esmatlas/v0/highquality_clust30/foldseekdb/ download fine except for highquality_clust30_ca (the largest file). I ran aria2c outside of the foldseek scripts to download just this one file, and I got the same error:

aria2c --log=tmp_aria_dl_log.txt https://dl.fbaipublicfiles.com/esmatlas/v0/highquality_clust30/foldseekdb/highquality_clust30_ca

The log file shows that the connection was made, but that the response from the server was HTTP/1.1 400 Bad Request:

2023-04-04 13:28:15.628084 [DEBUG] [SocketCore.cc:994] Securely connected to dl.fbaipublicfiles.com (108.156.201.129:443) with TLSv1.2
2023-04-04 13:28:15.628132 [INFO] [HttpConnection.cc:129] CUID#7 - Requesting:
GET /esmatlas/v0/highquality_clust30/foldseekdb/highquality_clust30_ca HTTP/1.1
User-Agent: aria2/1.36.0
Accept: */*,application/metalink4+xml,application/metalink+xml
Host: dl.fbaipublicfiles.com
Want-Digest: SHA-512;q=1, SHA-256;q=1, SHA;q=0.1

2023-04-04 13:28:15.824645 [DEBUG] [AbstractCommand.cc:184] CUID#7 - socket: read:1, write:0, hup:0, err:0
2023-04-04 13:28:15.824813 [INFO] [HttpConnection.cc:164] CUID#7 - Response received:
HTTP/1.1 400 Bad Request
Content-Type: text/html
Content-Length: 513
Connection: keep-alive
Server: CloudFront
Date: Tue, 04 Apr 2023 19:28:15 GMT
Expires: Tue, 04 Apr 2023 19:28:15 GMT
X-Cache: Error from cloudfront
Via: 1.1 d7484ceebc64d9cbda81209193465bc8.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: DEN52-P1
X-Amz-Cf-Id: WsBff4LcsbhqNlCig76yK8V0VxgZ-TYptidSFdC8Hkta4uzNRhoZtA==

2023-04-04 13:28:15.824956 [DEBUG] [AbstractCommand.cc:184] CUID#7 - socket: read:0, write:0, hup:0, err:0
2023-04-04 13:28:15.825009 [INFO] [DownloadEngine.cc:315] Pool socket for 108.156.201.129(443)
2023-04-04 13:28:15.825132 [ERROR] [AbstractCommand.cc:351] CUID#7 - Download aborted. URI=https://dl.fbaipublicfiles.com/esmatlas/v0/highquality_clust30/foldseekdb/highquality_clust30_ca
Exception: [AbstractCommand.cc:351] errorCode=22 URI=https://dl.fbaipublicfiles.com/esmatlas/v0/highquality_clust30/foldseekdb/highquality_clust30_ca
  -> [HttpSkipResponseCommand.cc:240] errorCode=22 The response status is not successful. status=400
2023-04-04 13:28:15.825211 [DEBUG] [AbstractCommand.cc:479] CUID#7 - Aborting download
2023-04-04 13:28:15.825231 [DEBUG] [AbstractCommand.cc:426] CUID#7 - Not trying next request. No reserved/pooled request is remaining and total length is still unknown.
2023-04-04 13:28:15.825264 [DEBUG] [RequestGroup.cc:983] GID#29611a9a93c20962 - Request queue check
2023-04-04 13:28:15.825294 [NOTICE] [RequestGroupMan.cc:427] Download GID#29611a9a93c20962 not complete: 
2023-04-04 13:28:15.825340 [DEBUG] [RequestGroup.cc:1173] GID#29611a9a93c20962 - Creating DownloadResult.
2023-04-04 13:28:15.825393 [DEBUG] [RequestGroupMan.cc:482] 1 RequestGroup(s) deleted.
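
For a log this verbose, it can help to pull out only the HTTP status lines. The following is a minimal sketch, assuming the log was written with aria2c's --log option as in the command above. The function name aria2_http_statuses is hypothetical and is not part of aria2 or foldseek.

```shell
# Hypothetical helper: print the line that follows each
# "Response received:" entry in an aria2 log file.
# In the log above, that line is the HTTP status line.
aria2_http_statuses() {
    awk '/Response received:/ {
        # Read the next line; print it only if one exists.
        if ((getline line) > 0) print line
    }' "$1"
}
```

Calling aria2_http_statuses tmp_aria_dl_log.txt on the log shown above should print the single line HTTP/1.1 400 Bad Request.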

Similar download attempts to smaller files in the same dir are successful:

aria2c https://dl.fbaipublicfiles.com/esmatlas/v0/highquality_clust30/foldseekdb/highquality_clust30_h

I was even able to successfully download highquality_clust30_ss, which is 7.4 GB.

I'm using aria2 version 1.36.0. I see the same failure for the large file when I try to download it directly from the link in Firefox.
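
The per-file comparison described above can also be done without downloading anything, by sending a HEAD request to each URL and printing the status code. This is only a sketch: it assumes curl is installed, the function name head_status is hypothetical, and a HEAD request may not reproduce every failure that a full GET download would.

```shell
# Hypothetical helper: print "<status> <url>" for each URL given.
# Assumes curl is available; not part of foldseek or aria2.
head_status() {
    for url in "$@"; do
        # -s: silent, -I: send a HEAD request, -o /dev/null: discard headers,
        # -w '%{http_code}': print only the numeric status code.
        code=$(curl -s -I -o /dev/null -w '%{http_code}' "$url")
        printf '%s %s\n' "$code" "$url"
    done
}

# Example usage with file names mentioned in this thread (not run here):
# base=https://dl.fbaipublicfiles.com/esmatlas/v0/highquality_clust30/foldseekdb
# head_status "$base/highquality_clust30_ca" "$base/highquality_clust30_h" "$base/highquality_clust30_ss"
```

A file that prints 400 while its neighbours print 200 points at a server-side problem with that one object rather than at the client.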

w3ntinglu commented 1 year ago

Hi @mjoh223 and @lelandbr, thanks for reporting the issue with a detailed error log. We just addressed this issue. Would you please try again?

curl -I https://dl.fbaipublicfiles.com/esmatlas/v0/highquality_clust30/foldseekdb/highquality_clust30_ca
HTTP/2 200

lelandbr commented 1 year ago

This was successful for me - thanks!!

I downloaded the file https://dl.fbaipublicfiles.com/esmatlas/v0/highquality_clust30/foldseekdb/highquality_clust30_ca by using the foldseek databases command in foldseek, which calls aria2c via foldseek/data/structdatabases.sh

It also seems like the link works for download directly in Firefox, although I didn't test completely downloading it via other methods.

Thanks again for the help!