Knowledge-Graph-Hub / kg-covid-19

An instance of KG Hub to produce a knowledge graph for COVID-19 response.
https://github.com/Knowledge-Graph-Hub/kg-covid-19/wiki
BSD 3-Clause "New" or "Revised" License

Cached inputs are incongruent with what STRING transformer expects -> FileNotFoundError #473

Open caufieldjh opened 11 months ago

caufieldjh commented 11 months ago

In #472, I bumped the STRING input data to v11.5 and changed all references to it... or at least I thought so. Here's one I missed:

10:11:43  Traceback (most recent call last):
10:11:43    File "run.py", line 202, in <module>
10:11:43      cli()
10:11:43    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
10:11:43      return self.main(*args, **kwargs)
10:11:43    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1078, in main
10:11:43      rv = self.invoke(ctx)
10:11:43    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
10:11:43      return _process_result(sub_ctx.command.invoke(sub_ctx))
10:11:43    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
10:11:43      return ctx.invoke(self.callback, **ctx.params)
10:11:43    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 783, in invoke
10:11:43      return __callback(*args, **kwargs)
10:11:43    File "run.py", line 74, in transform
10:11:43      kg_transform(*args, **kwargs)
10:11:43    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/kg_covid_19/transform.py", line 66, in transform
10:11:43      t.run()
10:11:43    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/kg_covid_19/transform_utils/string_ppi/string_ppi.py", line 171, in run
10:11:43      ) as edge, gzip.open(data_file, "rt") as interactions:
10:11:43    File "/usr/lib/python3.8/gzip.py", line 58, in open
10:11:43      binary_file = GzipFile(filename, gz_mode, compresslevel)
10:11:43    File "/usr/lib/python3.8/gzip.py", line 173, in __init__
10:11:43      fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
10:11:43  FileNotFoundError: [Errno 2] No such file or directory: 'data/raw/9606.protein.links.full.v11.5.txt.gz'

caufieldjh commented 11 months ago

Ah, I hadn't actually missed this one. The build fails when it tries to download ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/by_organism/HUMAN_9606_idmapping.dat.gz, so it falls back to the s3 cache. That cache contains older versions of everything, including the STRING data under its previous filename. The transform looks for the newer version, hence the FileNotFoundError. That's probably an argument for a static filename (like "stringppi.txt.gz" or something), but if I had done that I might not have noticed that the new data wasn't being used.
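
One possible middle ground is to keep the versioned filename but check for it before the transform opens it, so a stale cache produces a message that names the mismatch instead of a bare FileNotFoundError from gzip. The following is only a sketch: `resolve_string_input`, `STRING_VERSION`, and `STRING_FILENAME` are hypothetical names and are not part of kg-covid-19; only the filename pattern is taken from the traceback above.

```python
from pathlib import Path

# Hypothetical constants, not from the kg-covid-19 code base.
# The filename pattern mirrors the path shown in the traceback.
STRING_VERSION = "v11.5"
STRING_FILENAME = "9606.protein.links.full.{version}.txt.gz"


def resolve_string_input(raw_dir, version=STRING_VERSION):
    """Return the path to the expected STRING file, or fail with a clear hint.

    If the expected version is absent, list any other versions found in
    raw_dir so it is obvious that an older cache was restored.
    """
    raw_dir = Path(raw_dir)
    expected = raw_dir / STRING_FILENAME.format(version=version)
    if expected.is_file():
        return expected

    # Look for the same file under any other version suffix.
    others = sorted(
        p.name for p in raw_dir.glob(STRING_FILENAME.format(version="v*"))
    )
    hint = f" Found instead: {', '.join(others)}." if others else ""
    raise FileNotFoundError(
        f"Expected STRING input {expected} is missing.{hint} "
        "The download cache may hold an older release."
    )
```

Calling something like `resolve_string_input("data/raw")` at the start of the transform would keep the benefit of versioned names (stale data is noticed) while making the cause visible in the build log.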