MGXlab / social_niche_breadth_SNB

Calculate the Social Niche Breadth (SNB) score of all taxonomic lineages in a set of microbiomes.
MIT License

Filter data and use relative abundance #2

Closed: wqssf102 closed this issue 1 year ago

wqssf102 commented 1 year ago

Hi, thank you for developing the SNB algorithm. Would it be possible to add support for relative abundance input when calculating SNB? Our data may come from metagenomics or other sources and may not be count data but, for example, relative abundance or TPM. Looking forward to your help.

Thanks, Qiusheng WU

bastiaanvonmeijenfeldt commented 1 year ago

Hi @wqssf102,

Thanks for your question! Currently we do not allow this, because in the paper we used an absolute abundance cut-off (of 5 reads) for the pairwise dissimilarity calculations. However, the code already includes an option for a relative abundance cut-off for the pairwise dissimilarity calculations (by setting --c2 / --pairwise-comparisson-cutoff to a value < 1), so I can add support for relative abundance tables as well. I have put it on the to-do list.

In the meantime, note that our data is also metagenomic, and the 'absolute' count refers to the number of sequenced reads. So even though this is compositional data, what the script needs is read counts, with the total number of taxonomically annotated reads per sample in the header. This is also what, for example, TPM is calculated from. Could you structure your tables in that way?
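To illustrate why read counts plus a per-sample total carry the information behind relative measures, here is a minimal Python sketch. It is not part of the SNB script: the function name `relative_abundance`, the dictionary layout, and the sample and taxon names are all made up for illustration, and the real table format should be taken from the repository documentation.

```python
# Hypothetical illustration only: none of these names come from the SNB script.

def relative_abundance(counts, totals):
    """Convert read counts to relative abundances.

    counts: {sample: {taxon: number of reads assigned to the taxon}}
    totals: {sample: total number of taxonomically annotated reads}
    """
    rel = {}
    for sample, taxa in counts.items():
        total = totals[sample]
        if total <= 0:
            raise ValueError(f"total for {sample} must be positive")
        # Each relative abundance is the taxon's reads divided by the sample total.
        rel[sample] = {taxon: reads / total for taxon, reads in taxa.items()}
    return rel


# Made-up example data.
counts = {
    "sample_A": {"taxon_1": 50, "taxon_2": 150},
    "sample_B": {"taxon_1": 5, "taxon_2": 20},
}
totals = {"sample_A": 1000, "sample_B": 100}

rel = relative_abundance(counts, totals)
```

The reverse direction (from relative abundance back to read counts) is only possible when the per-sample totals are known, which is why the totals are needed alongside the counts.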

As a side note, I have briefly tested a relative --c2 cut-off, and the results seem to be quantitatively very similar to those obtained with an absolute abundance cut-off, at least for our dataset.

Best wishes,

Bastiaan

wqssf102 commented 1 year ago

OK, I get it, thank you.

bastiaanvonmeijenfeldt commented 1 year ago

I have uploaded a new version of the script (v0.2) that allows relative abundance tables as input. The script infers whether the input file is a count table or a relative abundance table based on the header. Read count tables should have the total number of taxonomically annotated reads in the header, and relative abundance tables just the sample names. The script throws an error if the content does not match the header.
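As a rough idea of what a content check against the table type could look like, here is a minimal Python sketch. It is an assumption-laden illustration, not the script's actual logic: the function `validate_values` and the labels `"counts"` and `"relative"` are hypothetical, and the sketch does not parse any header.

```python
# Hypothetical illustration only: this is not the SNB script's validation code.

def validate_values(kind, values):
    """Raise ValueError if the values do not fit the declared table type.

    kind: "counts" (non-negative whole numbers expected) or
          "relative" (values between 0 and 1 expected)
    """
    if kind == "counts":
        bad = [v for v in values if v < 0 or v != int(v)]
    elif kind == "relative":
        bad = [v for v in values if not 0 <= v <= 1]
    else:
        raise ValueError(f"unknown table type: {kind}")
    if bad:
        raise ValueError(f"{len(bad)} value(s) do not fit a {kind} table")


validate_values("counts", [0, 12, 345])        # passes silently
validate_values("relative", [0.0, 0.25, 0.5])  # passes silently
```

Passing fractional values with `kind="counts"`, or values above 1 with `kind="relative"`, raises a `ValueError` in this sketch.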

If a relative abundance table is supplied, --c2 / --pairwise_comparisson_cutoff has to be set to a value < 1. With a read count table, --c2 / --pairwise_comparisson_cutoff can be set to any value. Note that in the paper, we used a cut-off of 5 reads.
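The rule above can be summarised in a few lines of Python. This is only a sketch of the constraint as described in this comment, not code from the script; the function `check_cutoff` and the `"counts"` / `"relative"` labels are hypothetical, and rejecting negative cut-offs is an extra assumption.

```python
# Hypothetical illustration only: this is not the SNB script's argument handling.

def check_cutoff(kind, cutoff):
    """Return the cut-off if it is allowed for the given table type.

    kind: "counts" or "relative" (hypothetical labels)
    cutoff: value intended for --c2
    """
    if cutoff < 0:
        # Assumption: a negative cut-off is never meaningful.
        raise ValueError("cut-off must not be negative")
    if kind == "relative" and cutoff >= 1:
        # Relative abundance tables require a cut-off below 1.
        raise ValueError("relative abundance tables need a cut-off < 1")
    return cutoff


paper_cutoff = check_cutoff("counts", 5)           # the 5-read cut-off from the paper
relative_cutoff = check_cutoff("relative", 0.001)  # made-up relative value
```

In this sketch, calling `check_cutoff("relative", 5)` raises a `ValueError`, while any non-negative value is accepted for a read count table.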

wqssf102 commented 1 year ago

Great, thank you very much!