IGS / portal_client

Python-based client for downloading data made available through portals powered by the GDC-based portal system.
MIT License

Error: No valid URL found in the manifest file #2

Closed mariel2017 closed 5 years ago

mariel2017 commented 5 years ago

I am running portal_client with a manifest file as input. However, for each URL in the manifest file it gives the following error:

Not all files (total of 5) were downloaded successfully. Number of failures: 5 -- no valid URL in the manifest file

I tried different manifest files and I get the same error. Could you please help?

victor73 commented 5 years ago

Can you please post your exact command and a line or two from the manifest file you have?

mariel2017 commented 5 years ago

Command: portal_client --manifest hmp_cart_11c7f44dd.tsv

Manifest file:

file_id md5 size    urls    sample_id
1419f08f554e0c93f3b62fe90c092ab8    23eb8207442bd452f0d77ecc0fdba7f9    24137   fasp://aspera2.ihmpdcc.org/ibd/genome/microbiome/wgs/analysis/hmscp/CSM5MCXD_taxonomic_profile.biom 1419f08f554e0c93f3b62fe90c08f0f2
1419f08f554e0c93f3b62fe90c0913c6    7306d49f9271980be75ebad5d9c4cc9d    30012348    fasp://aspera2.ihmpdcc.org/ibd/genome/microbiome/wgs/analysis/hmmrc/CSM5MCXD_genefamilies_relab.biom    1419f08f554e0c93f3b62fe90c08f0f2

victor73 commented 5 years ago

Since the manifest seems to have only fasp:// (Aspera) URLs, please try adding "--endpoint-priority=FASP" to your command. Also, to get a little more debugging information, try adding "--debug" as well. A future iteration will do a better job of giving the end user feedback when the default endpoints are not sufficient for the URLs in the manifest...
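To check ahead of time which kinds of URLs a manifest contains, something like the following sketch may help. It is not part of portal_client; the function name `count_url_schemes` is hypothetical, and it assumes the manifest is tab-separated with a `urls` column (as in the sample above) and that multiple URLs for one file, if present, are comma-separated.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlparse


def count_url_schemes(manifest_text):
    """Count the URL schemes (fasp, http, ...) found in the 'urls' column."""
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    schemes = Counter()
    for row in reader:
        # Assumption: several URLs for one file are separated by commas.
        for url in (row.get("urls") or "").split(","):
            url = url.strip()
            if url:
                schemes[urlparse(url).scheme.lower()] += 1
    return schemes


# One header line and one data line, modeled on the manifest posted above.
sample = (
    "file_id\tmd5\tsize\turls\tsample_id\n"
    "1419f08f554e0c93f3b62fe90c092ab8\t23eb8207442bd452f0d77ecc0fdba7f9\t24137\t"
    "fasp://aspera2.ihmpdcc.org/ibd/genome/microbiome/wgs/analysis/hmscp/"
    "CSM5MCXD_taxonomic_profile.biom\t1419f08f554e0c93f3b62fe90c08f0f2\n"
)
print(dict(count_url_schemes(sample)))
```

If the only scheme reported is `fasp`, the manifest can only be downloaded through the Aspera endpoint, which is why "--endpoint-priority=FASP" is suggested here.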

MrDanyBrown commented 5 years ago

Hello Victor, I tried your suggestion and it works for the aspera_manifest.tsv that you provided, but when I downloaded a different TSV, it says "No such file or directory". The different TSV:

[Image: a successful run with your aspera_manifest.tsv]

file_id md5 size    urls    sample_id
1419f08f554e0c93f3b62fe90c0db7cf    1aafd7a4515c22d131351a05b9528d3d    15322124    fasp://aspera2.ihmpdcc.org/ibd/genome/microbiome/wgs/analysis/hmmrc/CSM79HJY_genefamilies_relab.biom    1419f08f554e0c93f3b62fe90c0d9dfd
1419f08f554e0c93f3b62fe90c0db842    eefaffea20601799c1eb28708093dc17    596905  fasp://aspera2.ihmpdcc.org/ibd/genome/microbiome/wgs/analysis/hmmrc/CSM79HJY_ecs_relab.biom 1419f08f554e0c93f3b62fe90c0d9dfd
1419f08f554e0c93f3b62fe90c0dccc1    cd0637cf3284946e583b04a2a48f58d3    16474   fasp://aspera2.ihmpdcc.org/ibd/genome/microbiome/wgs/analysis/hmscp/CSM79HJY_taxonomic_profile.biom 1419f08f554e0c93f3b62fe90c0d9dfd
1419f08f554e0c93f3b62fe90c0dc1c7    feee2bb8eaf7932cbbe5de750b770387    83427   fasp://aspera2.ihmpdcc.org/ibd/genome/microbiome/wgs/analysis/hmmrc/CSM79HJY_pathabundance_relab.biom   1419f08f554e0c93f3b62fe90c0d9dfd

[Image: the error]

victor73 commented 5 years ago

After I copied and pasted the above lines into a new manifest file (and replaced the spaces with tab characters), I ran the portal_client on it and was able to download all 4 files successfully on a Linux machine. Do you get the above error consistently? Perhaps it has something to do with execution on Windows...

victor73 commented 5 years ago

Actually, I think I see the problem. In manifest_processor.py:

file_name = "{0}/{1}".format(destination, url_list[0].split('/')[-1])

This is unlikely to work on Windows because of the hard-coded '/' path separator... So, even though this was never advertised to work on Windows, I think we should be able to fix it shortly.
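For illustration, one portable way to build that file name is sketched below. This is only an assumption about how such a fix could look, not necessarily what was committed to manifest_processor.py; the helper name `local_file_name` and its `path_module` parameter are hypothetical and exist only so both separators can be shown on any OS.

```python
import ntpath
import os
import posixpath
from urllib.parse import urlparse


def local_file_name(destination, url, path_module=os.path):
    """Join the destination directory and the URL's base name portably."""
    # URL paths always use '/', so take the base name with posixpath.
    base_name = posixpath.basename(urlparse(url).path)
    # path_module defaults to os.path, which uses the host OS separator.
    return path_module.join(destination, base_name)


url = (
    "fasp://aspera2.ihmpdcc.org/ibd/genome/microbiome/wgs/analysis/hmmrc/"
    "CSM79HJY_ecs_relab.biom"
)
print(local_file_name("downloads", url, posixpath))  # POSIX-style join
print(local_file_name("downloads", url, ntpath))     # Windows-style join
```

In real code the third argument would simply be omitted, so that `os.path.join` picks the right separator for the machine the client runs on.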

mariel2017 commented 5 years ago

Dear Victor,

I am getting the same error as described by MrDanyBrown, but I am working on a Mac.

The error message is:

2019-02-27 17:58:32,004 - manifest_processor.ManifestProcessor - DEBUG - In checksum_matches. Checking ./CSM5MCXD.tar.partial.
Traceback (most recent call last):
  File "./portal_client.py", line 238, in <module>
    main()
  File "./portal_client.py", line 202, in main
    args.endpoint_priority
  File "/portal_client-master/lib/manifest_processor.py", line 221, in download_manifest
    if self._checksum_matches(tmp_file_name, mfile['md5']):
  File "/portal_client-master/lib/manifest_processor.py", line 274, in _checksum_matches
    with open(file_path, 'rb') as filehandle:
FileNotFoundError: [Errno 2] No such file or directory: './CSM5MCXD.tar.partial'

MB

Thank you for looking into this.
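A note on the traceback above: the FileNotFoundError is raised while opening the .partial file for the MD5 check, which suggests the download step never wrote that file. A minimal sketch of a checksum helper that reports a mismatch instead of crashing in that case is shown below. It is not the project's `_checksum_matches`; the name `checksum_matches` and the `chunk_size` parameter are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def checksum_matches(file_path, expected_md5, chunk_size=1 << 20):
    """Return True only if file_path exists and its MD5 equals expected_md5."""
    if not os.path.isfile(file_path):
        # Nothing was written, so report a failed download rather than crash.
        return False
    digest = hashlib.md5()
    with open(file_path, "rb") as filehandle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: filehandle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()


with tempfile.TemporaryDirectory() as tmp_dir:
    partial = os.path.join(tmp_dir, "example.partial")
    print(checksum_matches(partial, "0" * 32))  # file does not exist yet
    with open(partial, "wb") as handle:
        handle.write(b"hello")
    print(checksum_matches(partial, hashlib.md5(b"hello").hexdigest()))
```

With a guard like this, a failed transfer would be counted as a failed file in the summary rather than ending the run with a traceback.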

victor73 commented 5 years ago

Okay, can you both please try the latest version v1.3.0 and report your results?

mariel2017 commented 5 years ago

Dear Victor,

Thank you for the update.

The URLs in the downloaded manifest file mostly start with fasp://aspera2.ihmpdcc.org/.. whereas, for me, they need to start with fasp://aspera.ihmpdcc.org/ in order to download the files successfully.

The following error message appears, but the client still goes on to download the files.

./portal_client --manifest example_manifests/hmp_cart_61b9b087a.tsv --endpoint-priority=FASP --user=mariel --debug
Enter your password (Don't worry, it's not shown): 
2019-03-05 13:45:34,894 - root - DEBUG - Creating ManifestProcessor.
2019-03-05 13:45:34,910 - boto - DEBUG - Retrieving credentials from metadata server.
2019-03-05 13:45:34,931 - boto - ERROR - Caught exception reading instance data
Traceback (most recent call last):
  File "/anaconda3/lib/python3.7/urllib/request.py", line 1317, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/anaconda3/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/anaconda3/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/anaconda3/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/anaconda3/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
  File "/anaconda3/lib/python3.7/http/client.py", line 956, in send
    self.connect()
  File "/anaconda3/lib/python3.7/http/client.py", line 928, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/anaconda3/lib/python3.7/socket.py", line 727, in create_connection
    raise err
  File "/anaconda3/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/lib/python3.7/site-packages/boto/utils.py", line 217, in retry_url
    r = opener.open(req, timeout=timeout)
  File "/anaconda3/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/anaconda3/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/anaconda3/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/anaconda3/lib/python3.7/urllib/request.py", line 1345, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/anaconda3/lib/python3.7/urllib/request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 61] Connection refused>
2019-03-05 13:45:34,937 - boto - ERROR - Unable to read instance data, giving up

victor73 commented 5 years ago

Those boto-related errors are inconsequential and are the result of the code attempting to detect whether it's running on Amazon AWS or not. What's more important are any Aspera-related messages. Do you see anything like that in your output when you run with --debug?