TranslatorSRI / Babel

Babel creates cliques of equivalent identifiers across many biomedical vocabularies.
MIT License

Unable to build chemical compendium on missing input file #41

Closed brettasmi closed 1 year ago

brettasmi commented 2 years ago

I tried to build the chemical compendium using the documentation in the README, but it failed as follows:

Babel % snakemake --cores 1 chemical
Building DAG of jobs...
MissingInputException in line 86 of /Users/bsmith/isb/Babel/src/snakefiles/chemical.snakefile:
Missing input files for rule chemical_drugbank_ids:
    output: /Users/bsmith/isb/Babel/babel_downloads/chemicals/ids/DRUGBANK
    affected files:
        /Users/bsmith/isb/Babel/babel_downloads/DrugBank/UC_XREF.srcfiltered.txt

I assume that this may simply be a case where the documentation is out of date, as per #32.

If there are updated instructions, even if it's just a few commands specific to the chemicals, that can be shared here ahead of an update to the README, I'd really appreciate seeing them.

Please let me know if there is anything I can clarify or contribute here. Thanks!

gaurav commented 2 years ago

Thanks for the bug report! As far as I can tell, that's a typo -- line 88 should read "/UNICHEM/UC_XREF.srcfiltered.txt" instead of "DrugBank/UC_XREF.srcfiltered.txt". I'm currently trying to get Babel to run on a new cluster here at RENCI, and have had to make a number of minor changes to the files to get them to run -- you can see my changes in branch add-dockerfile or PR #37. For example, here's the diff of the changes I made to chemical.snakefile. Try making that change on your end and let me know if that fixes the issue!

Note that the changes in branch add-dockerfile have NOT been reviewed by Chris yet and so may be incorrect or introduce new bugs. This branch is also a work in progress, so it might also change in unexpected ways going forward. I think I have every target except for chemical working now, so it's pretty close to being done, but please let me know (or push changes to that branch yourself) if you notice that I've got something wrong. If everything works out, I should have this PR ready for review in a week or so.

gaurav commented 2 years ago

I also have the outputs from all the compendia except for chemical, but if you'd like me to send that to you once I have it, please let me know!

brettasmi commented 2 years ago

Thanks @gaurav. That fix worked to the extent that the build continued a bit further, but quickly failed thereafter on a 404 when trying to download UNII. I switched over to your add-dockerfile branch and made a bit more progress after grabbing those UNII files manually as specified. Eventually, I failed on the following:

[Wed Mar 30 16:29:57 2022]
rule chemical_mesh_ids:
    input: /Users/bsmith/isb/Babel/babel_downloads/MESH/mesh.nt
    output: /Users/bsmith/isb/Babel/babel_downloads/chemicals/ids/MESH
    jobid: 0
    resources: mem_mb=4293, disk_mb=4293, tmpdir=/var/folders/07/pj89k_t935d11c0mbncb959w0000gp/T

loading mesh.nt
[Wed Mar 30 16:29:57 2022]
Error in rule chemical_mesh_ids:
    jobid: 0
    output: /Users/bsmith/isb/Babel/babel_downloads/chemicals/ids/MESH

RuleException:
AttributeError in line 16 of /Users/bsmith/isb/Babel/src/snakefiles/chemical.snakefile:
module 'pyoxigraph' has no attribute 'MemoryStore'
  File "/Users/bsmith/isb/Babel/src/snakefiles/chemical.snakefile", line 16, in __rule_chemical_mesh_ids
  File "/Users/bsmith/isb/Babel/src/createcompendia/chemicals.py", line 99, in write_mesh_ids
  File "/Users/bsmith/isb/Babel/src/datahandlers/mesh.py", line 130, in write_ids
  File "/Users/bsmith/isb/Babel/src/datahandlers/mesh.py", line 16, in __init__
  File "/Users/bsmith/.pyenv/versions/3.7.7/lib/python3.7/concurrent/futures/thread.py", line 57, in run

I think that is probably related to https://github.com/oxigraph/oxigraph/issues/57, so pyoxigraph may need to be pinned to a prior version in the requirements.txt here, or the code could be updated. I'd be happy to submit a PR, but I'm not sure of the best way to do that given the quick iteration on your WIP branch.

Generally speaking, I suspect it's probably going to be best to wait til you've completed your work here instead of trying to get this to work as-is. Also, in exploring the chemical snakefile, I spied a comment stating that it requires a machine with >256G of memory, which I could do but certainly wasn't expecting at this point :)

Please let me know if I can help further in any way.

gaurav commented 2 years ago

Interesting! I was just able to finish my first ever chemical build, and I seem to be on pyoxigraph==0.2.5 from July last year, not the 0.3.0 release. Could you please try that and see if it works? Here are all my other dependencies according to pip freeze: https://github.com/TranslatorSRI/Babel/blob/5216fe95c56928e83064b4db688622b47a53bb52/requirements.lock
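A quick way to confirm which version an environment actually has is sketched below. It assumes Python 3.8 or later for `importlib.metadata`; the helper names `installed_version` and `check_pin` are hypothetical and not part of Babel.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pin(package, pinned):
    """Describe how the installed version compares with the pinned one."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed (expected {pinned})"
    if found != pinned:
        return f"{package} {found} is installed, expected {pinned}"
    return f"{package} {found} matches the pin"


if __name__ == "__main__":
    # 0.2.5 is the version mentioned in this comment.
    print(check_pin("pyoxigraph", "0.2.5"))
```

If the versions differ, `pip install pyoxigraph==0.2.5` should bring the environment in line with the pin.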

> Generally speaking, I suspect it's probably going to be best to wait til you've completed your work here instead of trying to get this to work as-is.

That makes sense -- now that I have chemical working, I'll be working on polishing up this PR and submitting it to Chris for review sometime next week. So hopefully Babel should be fully functional once he's had a chance to make sure I didn't break anything :)

> Also, in exploring the chemical snakefile, I spied a comment stating that it requires a machine with >256G of memory, which I could do but certainly wasn't expecting at this point :)

I was unable to run chemical on a Kubernetes node with less than 500G, so that seems accurate :). I never saw the memory usage go above 48%, though, so I'm still unclear on exactly how much memory it needs -- I think that'll take a few months more to figure out for sure.
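One way to narrow that down would be to record peak memory from inside the Python process. This is only a sketch using the standard library `resource` module (Unix only); the helper name `peak_rss_mb` is hypothetical and nothing like it is in Babel today. It relies on `ru_maxrss` being reported in bytes on macOS and in kilobytes on Linux.

```python
import resource
import sys


def peak_rss_mb(who=resource.RUSAGE_SELF):
    """Return the peak resident set size in MiB for this process.

    Pass resource.RUSAGE_CHILDREN to measure finished child processes.
    """
    peak = resource.getrusage(who).ru_maxrss
    # ru_maxrss is in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    data = [0] * 1_000_000  # allocate something so the peak is non-trivial
    print(f"peak RSS so far: {peak_rss_mb():.1f} MiB")
```

Calling this at the end of a long-running step would log the high-water mark for that process, which is a more direct number than watching node-level utilization.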

If you'd like me to make the chemical outputs available somewhere for you to download, please let me know! I plan to eventually upload them to https://stars.renci.org/var/babel_outputs/, but if you need them in a hurry, I can move that further up my to-do list.

gaurav commented 2 years ago

@brettasmi The PR I linked to previously can now fully build the chemical compendium, and so we've merged it into the master branch. We haven't published the results yet, since we're investigating some odd changes between this version of the Babel outputs and the previous version, but I'm happy to send you a copy of the chemical outputs if you'd like!

gaurav commented 2 years ago

@brettasmi I wanted to check back with you and ask if you were able to get the Chemical compendium working. We've just regenerated a new version of Babel, so you can also message me on Slack if you'd like me to send you a copy of those files!

gaurav commented 1 year ago

@brettasmi I'm going to close this issue, but please do re-open it (or contact me on Slack) if you haven't been able to get the chemical compendium running or if you'd like a copy of our files to use.