franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License
203 stars, 42 forks

bug: problems with GTDBTk, small correction in the snakefile needed #45

Closed by shreyanshumale 3 years ago

shreyanshumale commented 3 years ago

Hello Francisco,

Thanks for making this pipeline!

While running the gtdbtk task, I encountered a 'Missing input error' during the dry run. After a lot of searching, I found that in the config.yaml file the classification folder is called GTDBTk, whereas in the Snakefile it is called GTDBtk.
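A case-only mismatch like this is easy to miss by eye. Below is a minimal sketch of how such pairs could be flagged automatically. The function name `case_only_mismatches` and the idea of passing in two plain lists of folder names are assumptions for illustration; this is not part of metaGEM, and it does not parse config.yaml or the Snakefile itself.

```python
def case_only_mismatches(config_names, snakefile_names):
    """Return (config_name, snakefile_name) pairs that differ only in letter case.

    Both arguments are plain lists of folder names collected by hand
    (hypothetical inputs; nothing here reads the real files).
    """
    # Group the Snakefile names by their lowercase form for quick lookup.
    by_lower = {}
    for name in snakefile_names:
        by_lower.setdefault(name.lower(), []).append(name)

    mismatches = []
    for name in config_names:
        for other in by_lower.get(name.lower(), []):
            # Same letters, different capitalisation: a likely typo.
            if other != name:
                mismatches.append((name, other))
    return mismatches


if __name__ == "__main__":
    # The two spellings reported in this issue.
    print(case_only_mismatches(["GTDBTk"], ["GTDBtk"]))
```

On a case-sensitive file system the two spellings are different folders, which would explain why Snakemake reports a missing input during the dry run.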

So I changed the config.yaml entry to match. The dry run then succeeded, but the actual execution failed because it couldn't find the ID folder. To get around this, I changed '$(basename $(dirname {input}))' to '{input}' on line 1330 of the Snakefile.

The run has not thrown any errors so far. I just wanted to notify you of this small mistake so that others don't have to face it in the future!

Please let me know if I made an error in any of the steps!

Thanks and regards, Shreyansh

franciscozorrilla commented 3 years ago

Hi Shreyansh,

Thank you for bringing this bug to my attention! The issue should be fixed now with the latest commit. I opted to modify the output line of the GTDBTk rule instead of the config file.

Please let me know if you run into any further issues with this or any other rule in the pipeline.

Best wishes, Francisco

franciscozorrilla commented 3 years ago

I noticed that the bug was not fully fixed. GTDB-Tk expects an input folder named according to the sample ID, yet the input folder with bins is named reassembled_bins. This is now fixed by modifying the GTDB-Tk call to reflect the basename of the input for the rule.
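To make the path handling discussed in this thread concrete, here is a small sketch of what the shell expression '$(basename $(dirname {input}))' evaluates to compared with taking the basename of the input directly. The example path `reassembled/sample1/reassembled_bins` and the helper names are hypothetical; the real directory layout and the final form of the rule should be checked in the Snakefile.

```python
import posixpath


def parent_folder_name(input_path):
    """Mimic the shell expression $(basename $(dirname PATH))."""
    # Drop any trailing slash first, as the shell's dirname does.
    trimmed = input_path.rstrip("/")
    return posixpath.basename(posixpath.dirname(trimmed))


def own_folder_name(input_path):
    """Mimic the shell expression $(basename PATH)."""
    return posixpath.basename(input_path.rstrip("/"))


if __name__ == "__main__":
    # Hypothetical layout: bins for one sample sit in a folder called
    # reassembled_bins inside a folder named after the sample ID.
    path = "reassembled/sample1/reassembled_bins"
    print(parent_folder_name(path))  # name of the enclosing folder
    print(own_folder_name(path))     # name of the bins folder itself
```

Under that assumed layout, the enclosing folder carries the sample ID while the bins folder itself is always called reassembled_bins, which is the distinction the comment above describes.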