franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License
189 stars 41 forks source link

Composition Vis is summarising strangly #110

Closed Xentrics closed 1 year ago

Xentrics commented 1 year ago

Hello :) I am very pleased with the pipeline so far, but I have trouble understanding one step.

When I want to compile the contig abundances from scratch, I do:

However, in the compositionViz script, I see what might be an error.

So in the join step of the script, any bin from GTDBTK is used multiple times with different samples. In other words: the sample source of the contig is ignored in the merge step.

Is this really as intended? If not, the snakemake rule for compositionViz should probably adjust the contig names as they are in abundance.stats.

franciscozorrilla commented 1 year ago

Hey @Xentrics glad that you are finding metaGEM useful, and sorry to hear you were having trouble with this.

I had a look at the compositionVis rule and indeed it appears that there is a small bug in the for loop starting in line 1387

https://github.com/franciscozorrilla/metaGEM/blob/19398d3262c055a5a7b1ef65e47004507fecc178/Snakefile#L1356-L1400

As you can see, it creates the variable $samp with the sample information but then it is not being appended to the bin IDs. Your abundance file does have the sample info because the for loop summarizing the abudance file starting on line 1375 is correctly appending the sample ID to the bin IDs.

I have made the appropriate changes in this commit 22ad184d8e3b24e7591473cba347bb9b9adbefe6. Could you please, delete your GTDBTk.stats file, make the same modification to your Snakefile and re-run the rule?

Best wishes, Francisco

Xentrics commented 1 year ago

I actually had to adjust a bit more, because now the header of the stats/GTDBTK.stats contains the first samples' name. So instead of header=$(head -n 1 GTDBTk.stats) I need header=$(head -n 1 GTDBTk.stats | sed 's/^.*\.//g')

franciscozorrilla commented 1 year ago

Thanks, just pushed the changes through 💎