franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License

Error in GTDB-Tk step: extractDnaBins command #118

Closed: aksbiome closed this issue 1 year ago

aksbiome commented 1 year ago

Hi Francisco,

When I use metaGEM, everything works well up to the metaWRAP step. However, when I ran the "bash metGEM.sh -t extractDnaBins" command in step 5 (GTDB-Tk), I got the error "No such file or directory" and could not complete the step. I then skipped it and moved on to the following step, "Run GTDB-tk for taxonomic classification": bash metaGEM.sh -t gtdbtk -j 2 -c 24 -m 80 -h 12. Surprisingly, that seemed to work. Could you look into this issue? Please find a screenshot of the error attached.

Another issue that needs to be resolved in the metaGEM pipeline: when we run the maxbin2 command, the generated bins are not saved in the expected folder. We have to add the bins to that folder manually before the next metaWRAP command will work.

Thanks!

[Attached screenshot: "New Microsoft PowerPoint Presentation"]

franciscozorrilla commented 1 year ago

Hey @aksbiome,

Thanks for reporting the buggy behavior of the maxbin Snakefile rule implementation. Could you specify if you used the maxbin or maxbinCross method?

https://github.com/franciscozorrilla/metaGEM/blob/ec01c24d4e7f9e2e1b12d09ee7f9b0ac1ddd0a5c/Snakefile#L797-L886

As you can see, one of them uses crossmapping information for binning, while the other does not. Note that maxbinCross is the default. Could you also run the command tree -d in your metaGEM folder to show me where/how the maxbin bins are incorrectly being stored? e.g.

$ tree -d
.
├── metaGEM
│   ├── colab
│   │   └── assemblies
│   │       ├── sample1
│   │       ├── sample2
│   │       └── sample3
│   ├── envs
│   ├── extra
│   └── scripts
└── models
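If the directory listing alone does not make it obvious where the bins ended up, a small script can count FASTA-like files per directory. This is only a rough sketch: the function name count_bins_per_dir and the suffix list are assumptions for illustration, not part of metaGEM, so adjust them to whatever your binner actually writes.

```python
from collections import Counter
from pathlib import Path

# Assumed FASTA suffixes for bin files; change these to match your outputs.
BIN_SUFFIXES = {".fa", ".fasta", ".fna"}


def count_bins_per_dir(root="."):
    """Return {directory: number of FASTA-like files} for directories under root."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        # Only regular files with one of the assumed suffixes are counted.
        if path.is_file() and path.suffix in BIN_SUFFIXES:
            counts[str(path.parent)] += 1
    return dict(counts)
```

Calling count_bins_per_dir("metaGEM") and printing the resulting dictionary should show which folders hold bins and how many, which makes a misplaced maxbin output easier to spot.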

By the way, note that the extractDnaBins rule simply goes into the bin_reassembly output and copies out the MAGs into a new folder while reformatting the name:

https://github.com/franciscozorrilla/metaGEM/blob/ec01c24d4e7f9e2e1b12d09ee7f9b0ac1ddd0a5c/Snakefile#L1699-L1726
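To illustrate the idea (not the actual rule, which is at the link above), here is a minimal Python sketch of "copy the MAGs out and rename them". The directory layout (sample/reassembled_bins/*.fa), the sample_bin.fa naming scheme, and the function name extract_dna_bins are all assumptions made for this example. It also checks that the input folder exists first, since a missing reassembly folder is one plausible source of a "No such file or directory" error.

```python
import shutil
from pathlib import Path


def extract_dna_bins(reassembly_dir, out_dir):
    """Copy bin FASTA files out of a per-sample reassembly tree into out_dir.

    Hypothetical layout: <reassembly_dir>/<sample>/reassembled_bins/<bin>.fa
    Hypothetical naming: <out_dir>/<sample>_<bin>.fa
    """
    src = Path(reassembly_dir)
    if not src.is_dir():
        # Fail early and name the missing path explicitly.
        raise FileNotFoundError(f"reassembly output not found: {src}")

    dst = Path(out_dir)
    dst.mkdir(parents=True, exist_ok=True)

    copied = []
    for fasta in sorted(src.glob("*/reassembled_bins/*.fa")):
        sample = fasta.parent.parent.name  # folder two levels above the file
        target = dst / f"{sample}_{fasta.name}"
        shutil.copy2(fasta, target)
        copied.append(target)
    return copied
```

If the function raises FileNotFoundError, the reassembly step probably did not write its output where the copy step expects it, and the path in the message is the first thing to check.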

It is actually not surprising that GTDBTk ran successfully, since it just runs on the metaWRAP bin reassembly output, so I imagine that metaWRAP was able to produce some output.

https://github.com/franciscozorrilla/metaGEM/blob/ec01c24d4e7f9e2e1b12d09ee7f9b0ac1ddd0a5c/Snakefile#L1315-L1330

Best wishes, Francisco