Closed Jigyasa3 closed 5 years ago
Hi @Jigyasa3,
a) The answer to both your questions is yes.
b) I am not sure that I fully understand what you mean by trait. There is a great preprint available using DEICODE for this purpose (https://www.biorxiv.org/content/10.1101/804443v2.full). I think your questions may be answered by taking a look through it. That being said, do you mean that each trait is a different data modality (e.g. 16S, shotgun, transcriptomics), or do you mean each trait is a separate functional grouping of shotgun reads classified for taxonomy?
Thanks!
Dear @cameronmartino
Thank you for replying! I will check the preprint and get back to you if I have any questions.
Sorry if I wasn't clear before. A trait is a separate functional group of shotgun reads classified for taxonomy.
Hey @cameronmartino
From what I understand from the tutorials and the paper that you referred to (thanks for that!), each sample is considered to be independent. But if my samples are phylogenetically related, would I account for that after the log-ratio transformation?
@Jigyasa3 Indeed accounting for phylogenetic relationships would need to be done downstream of this method of dimensionality reduction. Adding the option of directly integrating phylogenetics in this method is an active area of research.
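To make the point about sample independence concrete, here is a minimal sketch of a robust centered log-ratio (rclr) step, assuming the usual definition: take the log of the nonzero counts in a sample and center them on the mean log of those nonzero counts. This is an illustration only, not DEICODE's actual implementation; the function name `rclr` and the toy table are hypothetical. Each sample is transformed using only its own counts, so no phylogeny (host or microbial) enters at this stage.

```python
import math

def rclr(counts):
    """Robust clr for one sample (illustrative sketch, not DEICODE's code).

    Logs are taken over nonzero counts only and centered on their mean.
    Zeros are left as missing values (None) rather than imputed.
    """
    logs = [math.log(c) for c in counts if c > 0]
    if not logs:
        raise ValueError("sample has no nonzero counts")
    center = sum(logs) / len(logs)
    return [math.log(c) - center if c > 0 else None for c in counts]

# Hypothetical toy table: keys are samples, values are counts per feature.
table = {
    "Host1": [10, 0, 100],
    "Host2": [5, 50, 0],
}

# Each sample is transformed on its own; no other sample is consulted.
transformed = {sample: rclr(row) for sample, row in table.items()}
```

Any phylogenetic structure among hosts would then be handled in whatever downstream statistics are applied to the ordination or distances.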
Closing this issue - please reopen it if you have more questions.
Visualizing the data by clustering and heatmaps
Hey @cameronmartino
Thank you so much for replying to my questions! I was wondering if we can visualize the distance matrix obtained from DEICODE via commonly used methods of visualization? (I am looking into Qurro and Emperor ordination plots also)
But is it feasible to convert the beta-diversity distances among samples to a mean per group, so that we can visualize them with heatmaps or hierarchical clustering?
Calculating the mean of log-ratio transformed data per group has been done before (though not on distance matrices): http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.876.3979&rep=rep1&type=pdf
Hi @Jigyasa3
The output of DEICODE (standalone and QIIME2) gives a skbio distance matrix (see: http://scikit-bio.org/docs/0.5.1/generated/generated/skbio.stats.distance.DistanceMatrix.html).
This distance matrix can be read as a Python object, plotted, and exported as a pandas data frame, among many other helpful features.
From there you should be able to do the things you are looking for. Does this answer your question?
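As a rough illustration of the per-group averaging asked about above, the sketch below collapses a square sample-by-sample distance matrix into mean distances between groups. It is written in plain Python with a hypothetical toy matrix and hypothetical group labels; with scikit-bio, the sample IDs and values could instead be taken from the `ids` and `data` attributes of a loaded `DistanceMatrix`. Whether averaged distances are statistically appropriate for a given study is a separate question that this sketch does not address.

```python
from itertools import product

def group_mean_distances(ids, dist, groups):
    """Average pairwise sample distances into a group-by-group table.

    ids:    sample IDs, in the same order as the matrix rows/columns
    dist:   square matrix of distances (list of lists or similar)
    groups: mapping of sample ID -> group label
    """
    sums, counts = {}, {}
    for (i, a), (j, b) in product(enumerate(ids), repeat=2):
        if i == j:
            continue  # skip self-distances, which are always zero
        key = (groups[a], groups[b])
        sums[key] = sums.get(key, 0.0) + dist[i][j]
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical toy data standing in for a real distance matrix.
ids = ["s1", "s2", "s3", "s4"]
dist = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
groups = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}

means = group_mean_distances(ids, dist, groups)
```

The resulting group-by-group means can be arranged into a small table and passed to any heatmap or hierarchical clustering routine.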
Thank you it does answer my question!
Awesome!
Sorry, I keep reopening the issue, but I do have a follow-up question about the metagenome paper you linked. I went through the whole HUMAnN2 tutorial, and their data processing doesn't involve a log-ratio transformation to account for the compositionality of the data. They use RPK and RPKM for the between-sample variation, and a number of internal cutoffs (per sample) to account for the within-sample variation.
So technically, log-ratio transformations have not been applied to metagenome analysis yet?
The paper does a downstream conversion of the pathways' relative abundances to beta-diversity using DEICODE, but that is for PCoA beta-diversity measurements.
Analyses by HUMAnN2 and Songbird do not employ log-ratio transformations at all. Is that a correct assessment?
I wanted to apply this software to shotgun metagenome count data. I was wondering:
a) Is it feasible to apply this software to shotgun metagenome count data? I have the host tree for downstream statistical analysis, but not the microbe phylogeny. According to the tutorial, the robust clr transformation is independent of microbial phylogeny. Is that correct?
b) The tutorial mentions that the bacteria should not be clustered by taxonomic levels. Does that also apply to shotgun sequencing data? In QIIME, the trait remains the same (it's the 16S rRNA gene), but if I want to analyze multiple traits, should each trait be transformed individually or together?
As in, is it possible to use a matrix like the following?
Matrix 1:
Trait1\tHost1\tbacteria1_class\tbacteria1_order\tbacteria1_genus
Trait1\tHost1\tbacteria2_class\tbacteria2_order\tbacteria2_genus
...
...
Trait1\tHost5\tbacteria1_class\tbacteria1_order\tbacteria1_genus
OR
Matrix 2 (for multiple traits):
Trait1\tHost1\tbacteria1_class\tbacteria1_order\tbacteria1_genus
Trait1\tHost1\tbacteria2_class\tbacteria2_order\tbacteria2_genus
Trait2\tHost1\tbacteria1_class\tbacteria1_order\tbacteria1_genus
Trait2\tHost1\tbacteria2_class\tbacteria2_order\tbacteria2_genus
...
...
Trait5\tHost5\tbacteria1_class\tbacteria1_order\tbacteria1_genus
Looking forward to your reply!