cafferychen777 / ggpicrust2

Make Picrust2 Output Analysis and Visualization Easier
https://cafferychen777.github.io/ggpicrust2/
MIT License
102 stars 11 forks

What is the statistical test applied by metagenomeSeq? #29

Closed Teo2091 closed 1 year ago

Teo2091 commented 1 year ago

Hello everyone,

I would like to know which statistical test metagenomeSeq applies. In the pathway_daa function output, only the method name (metagenomeSeq) is reported.

Thank you, Matteo

cafferychen777 commented 1 year ago

Dear Matteo,

Thank you for reaching out and for your interest in metagenomeSeq.

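To answer your question, metagenomeSeq uses a zero-inflated log-normal (ZILN) model, which the metagenomeSeq package fits with its `fitFeatureModel` function. The model treats each feature as a mixture of two parts: a point mass at zero and a log-normal distribution for the positive values. This suits metagenomics data, which are typically sparse (many zeros), non-negative, and right-skewed.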
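To make the two-part idea concrete, here is a minimal Python sketch that summarizes a single feature by its fraction of zeros and by the mean and standard deviation of its positive values on the log scale. It is only an illustration of the zero-inflated log-normal structure, not metagenomeSeq's estimation procedure; the function name `ziln_summary` and the example values are hypothetical.

```python
import math

def ziln_summary(values):
    """Two-part summary of one feature.

    Part 1: fraction of exact zeros (the zero-inflation component).
    Part 2: mean and sample standard deviation of log(positive values)
            (the log-normal component).
    """
    positives = [v for v in values if v > 0]
    zero_fraction = 1 - len(positives) / len(values)
    if not positives:
        # Feature is zero everywhere: the log-normal part is undefined.
        return {"zero_fraction": zero_fraction, "log_mean": None, "log_sd": None}
    logs = [math.log(v) for v in positives]
    log_mean = sum(logs) / len(logs)
    if len(logs) > 1:
        variance = sum((x - log_mean) ** 2 for x in logs) / (len(logs) - 1)
        log_sd = math.sqrt(variance)
    else:
        log_sd = None  # a single positive value gives no spread estimate
    return {"zero_fraction": zero_fraction, "log_mean": log_mean, "log_sd": log_sd}

# Hypothetical abundances of one pathway across five samples.
example = [0.5, 0.0, 0.2, 0.0, 0.1]
summary = ziln_summary(example)
print(summary)
```

A differential abundance test built on this structure would compare such parameters between groups; the sketch stops at the per-feature summary.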

Please let me know if you need further clarification or have any other questions about metagenomeSeq.

Best regards,


Teo2091 commented 1 year ago

Hi @cafferychen777,

Thank you so much for your support.

I have another question about the choice of statistical model. For context, the pathway predictions were computed from a BIOM file that had been CSS-normalized, and the abundances were then converted to relative abundances (compositional data).

PICRUSt2 returned the following table:


pathway | IBD_10 | IBD_17 | IBD_18 | IBD_19 | IBD_2
-- | -- | -- | -- | -- | --
1CMET2-PWY | 0.5465172882153324 | 0.3377042200619037 | 0.3480050708737901 | 0.37868503235301076 | 0.5083044705245738
ALL-CHORISMATE-PWY | 0.04787401491196773 | 0.0 | 0.16109123022225882 | 0.0674150650166517 | 0.0
ANAEROFRUCAT-PWY | 0.5434450173880835 | 0.2113320353615017 | 0.5184411936816795 | 0.3951961385198568 | 0.5823554962659186
ANAGLYCOLYSIS-PWY | 0.7538277178508417 | 0.47010082509198325 | 0.4990015807337455 | 0.5348488909894545 | 0.6612861296934521
ARG+POLYAMINE-SYN | 0.09654822975725606 | 0.0 | 0.18278652325147193 | 0.1118830385053522 | 0.032285184934744145
ARGDEG-PWY | 0.02698526039802911 | 0.0 | 0.13125028626535068 | 0.042193467093528 | 0.0
ARGORNPROST-PWY | 0.057156353830507094 | 0.015941504629641053 | 0.18942165025741362 | 0.149949814982684 | 0.18162227360883218
ARGSYN-PWY | 0.5956135158560697 | 0.3071337208305732 | 0.24849882084202063 | 0.33723439410332456 | 0.4386114736779196
ARGSYNBSUB-PWY | 0.6672876975133343 | 0.3635510647118449 | 0.24907095286223332 | 0.3706586620950412 | 0.4176076103169751
ARO-PWY | 0.7732196274665936 | 0.47486507817115475 | 0.3871948003949325 | 0.4765944125060461 | 0.6649511011599307
ASPASN-PWY | 0.5099864226673008 | 0.3313803219988792 | 0.23833688894023294 | 0.31962705700866617 | 0.3857929734198072
AST-PWY | 0.023076923076923075 | 0.0 | 0.11538461538461538 | 0.03461538461538461 | 0.0
BIOTIN-BIOSYNTHESIS-PWY | 0.051862186985359576 | 0.0 | 0.16597112157576788 | 0.061828808052803476 | 0.03542774507624584
BRANCHED-CHAIN-AA-SYN-PWY | 0.809119460940484 | 0.48742970351634946 | 0.31026460726705557 | 0.45112014745075385 | 0.533223246144144
CALVIN-PWY | 0.7471856063348943 | 0.43539068809626824 | 0.5017490980464553 | 0.558853576717294 | 0.7335935875854149
CENTBENZCOA-PWY | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
CENTFERM-PWY | 0.10084227280343115 | 0.004444222095217799 | 0.0 | 0.08962097946255058 | 0.00441333658115254
COA-PWY | 0.5453051217028345 | 0.3518796414366713 | 0.344076152890069 | 0.37632569091459495 | 0.602617761047426
COBALSYN-PWY | 0.5667093955412954 | 0.33121562231953955 | 0.13833285548273205 | 0.2358862633207529 | 0.5792234415938627
CODH-PWY | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
COLANSYN-PWY | 0.03020029891826877 | 0.023168873943814977 | 0.22240290933212184 | 0.008753927046590228 | 0.06327963037493521
COMPLETE-ARO-PWY | 0.7980034079041006 | 0.48753676656174344 | 0.39119291357316593 | 0.48629642816901375 | 0.7028157460472372
DAPLYSINESYN-PWY | 0.6703393472166221 | 0.4299659400849499 | 0.3272614253100503 | 0.3726560584902267 | 0.5231532428654286
DENITRIFICATION-PWY | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
DENOVOPURINE2-PWY | 0.13661965229461256 | 0.093778418536116 | 0.42490701397830993 | 0.35011587756452256 | 0.4261284606944567
DTDPRHAMSYN-PWY | 0.7336082448891426 | 0.43600840206699604 | 0.44384054089679187 | 0.5529230588790547 | 0.740065661700592
ECASYN-PWY | 0.030186334566619684 | 0.0 | 0.130730569634781 | 0.05191429272823931 | 0.0
ENTBACSYN-PWY | 0.06457087428537162 | 0.0 | 0.23592197493758077 | 0.10957497549137835 | 0.0
FAO-PWY | 0.0571322203586654 | 4.791657928562685e-07 | 0.21694134380892882 | 0.07621951797554785 | 4.791660644959697e-07
FASYN-ELONG-PWY | 0.2681395105494251 | 0.08166754471846616 | 0.3899945097217771 | 0.20262857699489337 | 0.3711544908857412

This output contains zeros, but the values describe pathway expression, not bacterial abundances. Given these values, is the model applied by metagenomeSeq appropriate, or would you recommend a different statistical model?

Thanks again, Matteo

cafferychen777 commented 1 year ago

Dear Matteo,

Thank you for reaching out and providing such detailed information.

Based on the data you've shared, it is indeed appropriate to use metagenomeSeq. This model is designed to handle the kind of compositional data you're working with, and it can effectively manage the zeros in your dataset, which, as you've noted, represent pathway expressions rather than bacterial abundances.

However, it's also worth considering other statistical models that are well-suited to microbiome data analysis. Two such models are LinDA and ALDEx2.

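LinDA (linear models for differential abundance analysis) fits linear regression models to centered log-ratio transformed abundances and corrects the bias that compositional effects introduce. It can help you identify features (like pathways) that differ between your groups, and it is particularly useful when you're dealing with high-dimensional data.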

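ALDEx2, on the other hand, is designed to handle the compositional nature of high-throughput sequencing data. It draws Monte Carlo samples from a Dirichlet distribution to estimate the technical variation within each sample, applies a centered log-ratio transformation to those samples, and then tests for differences between groups, which can provide robust differential abundance analysis.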
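Both LinDA and ALDEx2 rely on the centered log-ratio (CLR) transformation, so a small Python sketch of it may help. This is an illustration only, not the code of either package: the function name `clr`, the example values, and the fixed pseudocount used to handle zeros are assumptions made for the sketch, and each package has its own way of dealing with zeros.

```python
import math

def clr(sample, pseudocount=1e-6):
    """Centered log-ratio transform of one sample.

    Each value is log-transformed and then centered by subtracting the
    mean of the logs (the log of the geometric mean). The pseudocount is
    an illustrative choice so that zeros have a finite logarithm.
    """
    logs = [math.log(v + pseudocount) for v in sample]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

# Hypothetical relative abundances of four pathways in one sample.
sample = [0.5, 0.0, 0.2, 0.3]
transformed = clr(sample)
print(transformed)
```

CLR values always sum to zero within a sample, and without a pseudocount they are unchanged when every value in the sample is multiplied by the same constant. That scale invariance is what makes the transformation suitable for compositional data.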

In conclusion, while metagenomeSeq is a good choice, exploring other models like LinDA and ALDEx2 could provide additional insights into your data.

I hope this helps! Please feel free to reach out if you have any further questions.

Best regards, Chen YANG
