biocore / DEICODE

Robust Aitchison PCA from sparse count data

Applying it on shot-gun metagenome data without microbial phylogeny info #47

Closed Jigyasa3 closed 4 years ago

Jigyasa3 commented 4 years ago

I would like to apply this software to shotgun metagenome count data, and I have two questions:

a) Is it feasible to apply this software to shotgun metagenome count data? I have the host tree for downstream statistical analysis, but not the microbe phylogeny. According to the tutorial, the robust clr transformation is independent of microbial phylogeny. Is that correct?

b) The tutorial mentions that the bacteria should not be clustered by taxonomic levels. Does that also apply to shotgun sequencing data? In QIIME, the trait remains the same (it's the 16S rRNA gene), but if I want to analyze multiple traits, should each trait be transformed individually or together?

As in, is it possible to use a matrix like the following?

Matrix 1:

Trait1\tHost1\tbacteria1_class\tbacteria1_order\tbacteria1_genus
Trait1\tHost1\tbacteria2_class\tbacteria2_order\tbacteria2_genus
...
Trait1\tHost5\tbacteria1_class\tbacteria1_order\tbacteria1_genus

OR

Matrix 2 (for multiple traits):

Trait1\tHost1\tbacteria1_class\tbacteria1_order\tbacteria1_genus
Trait1\tHost1\tbacteria2_class\tbacteria2_order\tbacteria2_genus
Trait2\tHost1\tbacteria1_class\tbacteria1_order\tbacteria1_genus
Trait2\tHost1\tbacteria2_class\tbacteria2_order\tbacteria2_genus
...
Trait5\tHost5\tbacteria1_class\tbacteria1_order\tbacteria1_genus

Looking forward to your reply!

cameronmartino commented 4 years ago

Hi @Jigyasa3,

a) The answer to both your questions is yes.

b) I am not sure that I fully understand what you mean by trait. There is a great preprint available that uses DEICODE for this purpose (https://www.biorxiv.org/content/10.1101/804443v2.full). I think your questions may be answered by taking a look through it. That being said, do you mean that each trait is a different data modality (e.g., 16S, shotgun, transcriptomics), or that each trait is a separate functional grouping of shotgun reads classified for taxonomy?

Thanks!
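For readers following point a), here is a minimal sketch of the idea behind the robust clr transformation: each sample is log-transformed and centered using only its non-zero counts, so no microbial phylogeny enters the calculation. The function name `rclr`, the table layout, and the sample names are illustrative assumptions, not DEICODE's own code, and the matrix completion step that DEICODE applies afterwards is not shown.

```python
import math


def rclr(counts):
    """Robust clr of one sample (illustrative sketch, not DEICODE's code).

    Takes the log of each non-zero count and centers it on the mean log of
    the non-zero counts. Zeros are left as missing values (None).
    """
    logs = [math.log(c) for c in counts if c > 0]
    if not logs:
        raise ValueError("sample has no non-zero counts")
    center = sum(logs) / len(logs)
    return [math.log(c) - center if c > 0 else None for c in counts]


# Hypothetical count table: one row per sample, one column per feature.
table = {
    "Host1": [10, 0, 100, 1000],
    "Host2": [0, 5, 50, 500],
}

# Each sample is transformed on its own; only the counts are needed.
transformed = {sample: rclr(row) for sample, row in table.items()}
```

Because the centering uses only the observed (non-zero) entries of each sample, the features can be any count-based units, for example taxa or functional groups from shotgun data.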

Jigyasa3 commented 4 years ago

Dear @cameronmartino

Thank you for replying! I will check the preprint and get back to you if I have any questions.

Sorry if I wasn't clear before. By trait, I mean a separate functional group of shotgun reads classified for taxonomy.

Jigyasa3 commented 4 years ago

Hey @cameronmartino

From what I understand from the tutorials and the paper that you referred to (thanks for that!), each sample is considered to be independent. But if my samples are phylogenetically related, should I account for that after the log-ratio transformation?

cameronmartino commented 4 years ago

@Jigyasa3 Indeed, accounting for phylogenetic relationships would need to be done downstream of this dimensionality reduction method. Adding the option to integrate phylogenetics directly into the method is an active area of research.

cameronmartino commented 4 years ago

Closing this issue - please reopen it if you have more questions.

Jigyasa3 commented 4 years ago

Visualizing the data by clustering and heatmaps

Hey @cameronmartino

Thank you so much for replying to my questions! I was wondering if we can visualize the distance matrix obtained from DEICODE via commonly used methods of visualization? (I am looking into Qurro and Emperor ordination plots also)

But is it feasible to average the beta-diversity distances among samples per group, so that we can visualize them with heatmaps or hierarchical clustering?

Calculating the per-group mean of log-ratio-transformed data has been done before (though not on distance matrices): http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.876.3979&rep=rep1&type=pdf

cameronmartino commented 4 years ago

Hi @Jigyasa3

DEICODE (both standalone and in QIIME2) outputs a skbio distance matrix (see: http://scikit-bio.org/docs/0.5.1/generated/generated/skbio.stats.distance.DistanceMatrix.html).

This distance matrix can be read as a Python object, plotted, and exported as a pandas data frame, among many other helpful functions.

From there you should be able to do the things you are looking for. Does this answer your question?
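As a rough illustration of the per-group averaging asked about above, the sketch below collapses a sample-by-sample distance matrix into a group-by-group table of mean pairwise distances. It works on plain Python lists so that it is self-contained; the function name `group_mean_distances`, the sample ids, the distances, and the group labels are all made up for the example.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def group_mean_distances(ids, dist, groups):
    """Average pairwise distances between (and within) groups of samples.

    ids    -- sample ids, in the row/column order of dist
    dist   -- square symmetric matrix (list of lists) of sample distances
    groups -- dict mapping sample id -> group label
    """
    index = defaultdict(list)
    for i, sample in enumerate(ids):
        index[groups[sample]].append(i)
    labels = sorted(index)
    means = {a: {} for a in labels}
    for a, b in combinations_with_replacement(labels, 2):
        # Skip self-distances (always 0) when a group is compared to itself.
        pairs = [dist[i][j] for i in index[a] for j in index[b] if i != j]
        mean = sum(pairs) / len(pairs) if pairs else 0.0
        means[a][b] = means[b][a] = mean
    return labels, means


# Hypothetical data: four samples in two groups.
ids = ["s1", "s2", "s3", "s4"]
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 6.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
]
groups = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}

labels, means = group_mean_distances(ids, dist, groups)
```

If the matrix comes from scikit-bio, `ids` and `dist` could be taken from the `ids` and `data` attributes of the `DistanceMatrix` object. The resulting group-by-group table can then be passed to a heatmap or hierarchical clustering routine. Note that a mean of pairwise distances is a summary statistic, not the distance between group centroids.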

Jigyasa3 commented 4 years ago

Thank you it does answer my question!

cameronmartino commented 4 years ago

Awesome!

Jigyasa3 commented 4 years ago

Sorry to keep reopening this issue, but I have a follow-up question about the metagenome paper you linked. I went through the whole HUMAnN2 tutorial, and their data processing doesn't involve a log-ratio transformation to account for the compositionality of the data. They use RPK and RPKM for the between-sample variation, and a number of internal (per-sample) cutoffs to account for the within-sample variation.

So technically, log-ratio transformations have not been applied to metagenome analysis yet?

The paper does a downstream conversion of pathways' relative abundance to beta-diversity using DEICODE, but that is for PCoA beta-diversity measurements.

Analyses by HUMAnN2 and Songbird do not employ log-ratio transformations at all. Is that a correct assessment?
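For context on the normalizations mentioned in this comment, here is a small sketch of the standard textbook definitions of RPK and RPKM. The gene names, read counts, lengths, and sequencing depth are hypothetical, and this is not a reproduction of HUMAnN2's internal processing.

```python
def rpk(reads, length_bp):
    """Reads per kilobase: read count divided by feature length in kb."""
    return reads / (length_bp / 1_000)


def rpkm(reads, length_bp, total_reads):
    """Reads per kilobase per million mapped reads in the sample."""
    return rpk(reads, length_bp) / (total_reads / 1_000_000)


# Hypothetical sample: gene -> (mapped reads, gene length in bp).
sample = {"geneA": (200, 2_000), "geneB": (300, 1_000)}
total_reads = 2_000_000  # assumed total mapped reads in this sample

rpk_values = {g: rpk(n, length) for g, (n, length) in sample.items()}
rpkm_values = {g: rpkm(n, length, total_reads) for g, (n, length) in sample.items()}
```

Under these definitions, both quantities rescale each count by feature length and sequencing depth; neither involves taking logarithms or ratios between features.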