churchmanlab / genewalk

GeneWalk identifies relevant gene functions for a biological context using network representation learning
https://churchman.med.harvard.edu/genewalk
BSD 2-Clause "Simplified" License

Error while downloading resources - PathwayCommons11.All.hgnc.sif.gz #16

Closed. leiendeckerlu closed 4 years ago

leiendeckerlu commented 4 years ago

Hi there,

I was trying to run genewalk on my data. However, when I run it like this:

genewalk --project test --genes ./input.csv --id_type hgnc_symbol --nproc 4

I'm presented with the following error message(s):

INFO: [2019-10-31 12:37:46] genewalk.cli - Creating project folder at /users/lule/genewalk/test
INFO: [2019-10-31 12:37:46] genewalk.resources - Using /users/lule/genewalk/resources as resource folder.
INFO: [2019-10-31 12:37:46] genewalk.resources - Downloading http://www.pathwaycommons.org/archives/PC2/v11/PathwayCommons11.All.hgnc.sif.gz and extracting into /users/lule/genewalk/resources/PathwayCommons11.All.hgnc.sif
Traceback (most recent call last):
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/urllib/request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/http/client.py", line 936, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/socket.py", line 724, in create_connection
    raise err
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/socket.py", line 713, in create_connection
    sock.connect(sa)
OSError: [Errno 113] No route to host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/users/lule/.local/bin/genewalk", line 11, in <module>
    sys.exit(main())
  File "/users/lule/.local/lib/python3.6/site-packages/genewalk/cli.py", line 145, in main
    rm.download_all()
  File "/users/lule/.local/lib/python3.6/site-packages/genewalk/resources.py", line 53, in download_all
    self.get_pc()
  File "/users/lule/.local/lib/python3.6/site-packages/genewalk/resources.py", line 37, in get_pc
    download_gz(fname, url_pc)
  File "/users/lule/.local/lib/python3.6/site-packages/genewalk/resources.py", line 65, in download_gz
    urllib.request.urlretrieve(url, gz_file)
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/urllib/request.py", line 248, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/urllib/request.py", line 1346, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/software/2020/software/python/3.6.6-foss-2018b/lib/python3.6/urllib/request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 113] No route to host>

Is the PathwayCommons11.All.hgnc.sif.gz file no longer available under the URL?

Thanks, Lukas

bgyori commented 4 years ago

It seems like http://pathwaycommons.org is down currently. The download URL is not expected to have changed otherwise.

bgyori commented 4 years ago

I believe http://pathwaycommons.org is back online again, and the download URL works. Please let us know if you have any remaining issues.

leiendeckerlu commented 4 years ago

yep, that indeed did the trick. thanks for the quick help!

andrewbcaldwell commented 3 years ago

It looks like http://pathwaycommons.org is down again, so I am running into this issue. However, even when I set --network_source indra to specify an alternative database, it still tries to download the pathwaycommons database and won't run. Is there an alternative source for the pathwaycommons database that can be used?

ri23 commented 3 years ago

Hi @andrewbcaldwell, if you upgrade genewalk to the latest version (v1.4.0) with pip install --upgrade genewalk, it should automatically download PathwayCommons 12 instead of 11 from this link: http://www.pathwaycommons.org/archives/PC2/v12/PathwayCommons12.All.hgnc.sif.gz This works for me. Can you please try that and see if your error persists? For now we are limited to using Pathway Commons as a knowledge base, but I'm looping in @bgyori to see when the INDRA database becomes fully available via --network_source indra.

bgyori commented 3 years ago

Hi @andrewbcaldwell, I just checked and the PathwayCommons resource download worked for me so it's possible that this was a temporary network issue. You can also manually download http://www.pathwaycommons.org/archives/PC2/v12/PathwayCommons12.All.hgnc.sif.gz and decompress it into ~/genewalk/resources/PathwayCommons12.All.hgnc.sif. GeneWalk will then not try to re-download it.
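For anyone who would rather script the manual download, here is a minimal sketch of the steps described above. It is not GeneWalk's own code: the function name `fetch_and_decompress` and the `retries` and `wait_seconds` parameters are made up for illustration. It fetches the .gz file, retries a few times in case the server is briefly unreachable, and writes the decompressed file to the resource folder.

```python
import gzip
import shutil
import time
import urllib.error
import urllib.request
from pathlib import Path

# URL and target path as given in this thread.
PC_URL = ("http://www.pathwaycommons.org/archives/PC2/v12/"
          "PathwayCommons12.All.hgnc.sif.gz")
PC_DEST = "~/genewalk/resources/PathwayCommons12.All.hgnc.sif"


def fetch_and_decompress(url, dest, retries=3, wait_seconds=30):
    """Download a gzip file from url and write its decompressed content to dest.

    Hypothetical helper: retries and wait_seconds are illustrative knobs,
    not GeneWalk options.
    """
    dest = Path(dest).expanduser()
    dest.parent.mkdir(parents=True, exist_ok=True)
    gz_path = dest.with_name(dest.name + ".gz")

    for attempt in range(1, retries + 1):
        try:
            urllib.request.urlretrieve(url, str(gz_path))
            break
        except urllib.error.URLError:
            # Raised, for example, as "No route to host" while the site is down.
            if attempt == retries:
                raise
            time.sleep(wait_seconds)

    # Stream-decompress so the whole file is never held in memory.
    with gzip.open(gz_path, "rb") as src, open(dest, "wb") as out:
        shutil.copyfileobj(src, out)
    gz_path.unlink()  # remove the compressed copy once extraction succeeded
    return dest


# Example call (needs network access, so it is left commented out):
# fetch_and_decompress(PC_URL, PC_DEST)
```

If the server stays unreachable after the last attempt, the original `URLError` is re-raised, so the failure looks the same as in the traceback at the top of this issue.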

As for the --network_source indra option, that requires an additional parameter, --network_file [statements.pkl], pointing to a pickle file containing INDRA Statements that were collected independently of GeneWalk (see https://indra.readthedocs.io/). In principle, the PathwayCommons resource file should not be accessed if --network_source is something other than pc; we can look into whether that happens inadvertently.
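To make the pairing of the two flags concrete, here is a small sketch that assembles the command line and refuses to build an indra call without a network file. The helper `build_genewalk_command` is hypothetical and not part of GeneWalk; it only uses the flags mentioned in this thread, and it checks the requirement for indra only, since the thread does not say what other network sources need.

```python
import shlex


def build_genewalk_command(project, genes, id_type,
                           network_source, network_file=None):
    """Return the genewalk command as a list of arguments.

    Hypothetical helper for illustration; it does not run genewalk.
    """
    cmd = ["genewalk", "--project", project, "--genes", genes,
           "--id_type", id_type, "--network_source", network_source]
    if network_source == "indra":
        # Per the comment above, indra needs a pickle of INDRA Statements.
        if network_file is None:
            raise ValueError(
                "--network_source indra also requires --network_file")
        cmd += ["--network_file", network_file]
    return cmd


cmd = build_genewalk_command("test", "./input.csv", "hgnc_symbol",
                             network_source="indra",
                             network_file="statements.pkl")
# Print a shell-ready version of the command for copy and paste.
print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed directly to `subprocess.run` if you want to launch the analysis from a script.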

andrewbcaldwell commented 3 years ago

Thanks for the response and clarification regarding INDRA. After waiting for the PathwayCommons website to come back online, I was able to proceed with the analysis. I think I just happened to try to run the program for the first time when the PC site was down, and I didn't realize that INDRA required an additional parameter.