Composition Vis is summarising strangly

Xentrics commented 1 year ago

Hello :) I am very pleased with the pipeline so far, but I have trouble understanding one step.

When I want to compile the contig abundances from scratch, I do:

take abundances from each sample
take GTDK annotations from each sample
merge this information per sample

However, in the compositionViz script, I see what might be an error.

the bins in stats/abundance.stats are unique (sample + bin)
the bins in stats/GTDBTK.stats are not unique (bin)

So in the join step of the script, any bin from GTDBTK is used multiple times with different samples. In other words: the sample source of the contig is ignored in the merge step.

Is this really as intended? If not, the snakemake rule for compositionViz should probably adjust the contig names as they are in abundance.stats.

franciscozorrilla commented 1 year ago

Hey @Xentrics glad that you are finding metaGEM useful, and sorry to hear you were having trouble with this.

I had a look at the compositionVis rule and indeed it appears that there is a small bug in the for loop starting in line 1387

https://github.com/franciscozorrilla/metaGEM/blob/19398d3262c055a5a7b1ef65e47004507fecc178/Snakefile#L1356-L1400

As you can see, it creates the variable $samp with the sample information but then it is not being appended to the bin IDs. Your abundance file does have the sample info because the for loop summarizing the abudance file starting on line 1375 is correctly appending the sample ID to the bin IDs.

I have made the appropriate changes in this commit 22ad184d8e3b24e7591473cba347bb9b9adbefe6. Could you please, delete your GTDBTk.stats file, make the same modification to your Snakefile and re-run the rule?

Best wishes, Francisco

Xentrics commented 1 year ago

I actually had to adjust a bit more, because now the header of the stats/GTDBTK.stats contains the first samples' name. So instead of header=$(head -n 1 GTDBTk.stats) I need header=$(head -n 1 GTDBTk.stats | sed 's/^.*\.//g')

franciscozorrilla commented 1 year ago

Thanks, just pushed the changes through 💎

franciscozorrilla / metaGEM

Composition Vis is summarising strangly #110