Error in if (dispersion == 1) "LRT" else "scaled dev." : missing value where TRUE/FALSE needed

cafferychen777 / ggpicrust2

Make Picrust2 Output Analysis and Visualization Easier

https://cafferychen777.github.io/ggpicrust2/

MIT License


Error in if (dispersion == 1) "LRT" else "scaled dev." : missing value where TRUE/FALSE needed #78

Open Selucote8223 opened 12 months ago

Selucote8223 commented 12 months ago

Dear Caffery Yang,

I followed the directions described in the workflow, but I got errors for both the ALDEx2 and LinDA methods. My metadata file is as follows:

sample-id mat-colour year month day
0913CV_3C Yellow 2021 Dic Unknown
0913CV_5E Red 2021 Dic Unknown
0913CV_7G Yellow 2021 Dic Unknown
0913CV_11K White 2021 Dic Unknown

I set "sample-id" as the "group" option when running the differential abundance analysis.

For ALDEx2 method the error was as described in title:

Error in if (dispersion == 1) "LRT" else "scaled dev." : missing value where TRUE/FALSE needed

The error showed with LinDA was the following:

Error in [.data.frame(LinDA_metadata_df, , matching_columns) : undefined columns selected

Could you help me, please?

Thank you!!

cafferychen777 commented 12 months ago

Dear @Selucote8223,

Thank you for reaching out with your query regarding the ALDEx2 and LinDA methods errors. I appreciate the details you have provided.

I noticed that your metadata file contains only four samples. Could you please confirm if these four samples are the entirety of your dataset? In statistical terms, this might pose a challenge. Many differential abundance (DA) methods, including those you are trying to use, typically require a minimum of two samples per group to perform valid comparisons and analyses. This requirement is crucial for statistical validity and to ensure reliable results.

If your dataset indeed comprises only these four samples, this might be the root cause of the issues you are encountering. In such a scenario, adding more samples to each group or considering a method suitable for very small sample sizes might be necessary.
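One way to check this before running a DA method is to count the samples in each level of the grouping column. The sketch below retypes two columns of the metadata posted above by hand; `count_replicates` is a hypothetical helper name used only for illustration and is not part of ggpicrust2.

```r
# Metadata as posted in this issue, typed in by hand for illustration.
metadata <- data.frame(
  `sample-id` = c("0913CV_3C", "0913CV_5E", "0913CV_7G", "0913CV_11K"),
  `mat-colour` = c("Yellow", "Red", "Yellow", "White"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of samples in each level of a grouping column.
count_replicates <- function(metadata, group) {
  table(metadata[[group]])
}

by_sample <- count_replicates(metadata, "sample-id")   # grouping by sample ID
by_colour <- count_replicates(metadata, "mat-colour")  # grouping by mat colour

# Levels with fewer than two samples cannot supply a within-group variance.
under_replicated <- names(by_colour)[as.vector(by_colour) < 2]
under_replicated
```

With `group = "sample-id"` every level holds exactly one sample, and even with `mat-colour` as the grouping column, Red and White are left with a single sample each.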

Looking forward to your confirmation and any further details you can provide, so we can assist you more effectively.

Best regards, Caffery Yang

Selucote8223 commented 12 months ago

Dear Caffery Yang,

Indeed, this seems to be the problem, as you highlighted. I'm going to try an alternative analysis that suits my data.

Thank you!

SoniramDK commented 2 months ago

I get the same error when trying to run the LinDA DAA pathway analysis. My metadata file is the following:

SampleID,Group,Condition
bc_92,Group A,Test 1
bc_93,Group B,Test 2
bc_94,Group C,Test 3
bc_95,Group D,Test 4
bc_96,Group E,Test 5

I loaded this file using:

metadata <- read.csv("C://Users//karad//OneDrive//Documents//picrust2-2.5.3//THIS_metadata_chicken.csv", stringsAsFactor = FALSE)

and after running

daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = "SampleID", daa_method = "LinDA", select = NULL, reference = NULL)

I get the following error:

Running LinDA analysis...
Error in [.data.frame(LinDA_metadata_df, , matching_columns) : undefined columns selected

I cannot see the reason why. Can anybody help me fix this error, or suggest a solution that worked for you?

Thanks!

cafferychen777 commented 2 months ago

Dear @SoniramDK,

Thank you for reaching out with your question regarding the error you're encountering while running the LinDA pathway analysis using ggpicrust2.

Looking at the metadata you provided:

SampleID,Group,Condition
bc_92,Group A,Test 1
bc_93,Group B,Test 2
bc_94,Group C,Test 3
bc_95,Group D,Test 4
bc_96,Group E,Test 5

Could you confirm if this is the complete metadata file? If it is, there's a statistical concern I need to bring to your attention.

It appears that each group in your metadata contains only a single sample. Performing differential abundance analysis with only one sample per group is statistically problematic because it doesn't provide any variability within each group. This lack of replication within groups prevents the analysis from estimating variance accurately, which is crucial for determining whether observed differences in abundance are significant or just due to random variation.

Without replicates, the analysis cannot effectively distinguish between true biological differences and noise, leading to unreliable results. To obtain meaningful and robust results, it's important to have multiple samples within each group, allowing for a more accurate estimation of the within-group variance and thus more reliable differential abundance analysis.
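A small base-R sketch can illustrate the point. The abundance values below are invented toy numbers, and the link to the ALDEx2 error is an assumption: the text `if (dispersion == 1) "LRT" else "scaled dev."` appears to come from base R's GLM term-testing code, but this thread does not confirm how ALDEx2 reaches it.

```r
# Toy data: one abundance value per group, mirroring a design with no replicates.
# The numbers are made up for illustration.
toy <- data.frame(
  abundance = c(10.2, 11.5, 9.8, 12.1, 10.9),
  group = factor(c("Group A", "Group B", "Group C", "Group D", "Group E"))
)

# The sample variance of a single observation is undefined.
single_var <- var(toy$abundance[toy$group == "Group A"])

# A Gaussian GLM with one observation per group spends every degree of freedom
# on the group means, so nothing is left to estimate the dispersion.
fit <- glm(abundance ~ group, data = toy)
resid_df <- df.residual(fit)
dispersion <- summary(fit)$dispersion

# Comparing a missing dispersion with 1 inside if() fails in the same way
# as the error in the issue title.
msg <- tryCatch(
  if (dispersion == 1) "LRT" else "scaled dev.",
  error = function(e) conditionMessage(e)
)
msg
```

Here `single_var` is `NA`, `resid_df` is 0, and `dispersion` is `NaN`, so the `if()` condition has no TRUE/FALSE value to test.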

Please let me know if there's anything further I can assist you with.

Best regards,
Caffery Yang

SoniramDK commented 2 months ago

Dear @cafferychen777,

Thanks for the quick reply!

Indeed, that file contains the complete metadata. Let me explain how my metadata was made. I study the alterations in the chicken gut microbiome caused by a certain disease. For that study, we had 5 groups of chickens, each group treated in a different way regarding the disease. For each group, we collected the gut contents of all 12 chickens and mixed them into one sample. So we have 5 samples in total, with each sample containing the gut contents of 12 chickens.

So each sample, let's say, represents one group.

Do you have any suggestions on how I can alter my metadata in order to make LinDA work?

cafferychen777 commented 2 months ago

Dear @SoniramDK,

Thank you for providing further details about your experiment and the metadata.

After reviewing your experimental design, I must point out that the current approach poses significant challenges for conducting reliable differential abundance analysis using LinDA or any other statistical method. Mixing the gut samples from 12 chickens into one pooled sample for each group eliminates the within-group variability that is essential for robust statistical analysis. This approach violates several fundamental principles in both statistics and omics studies.

When samples are pooled in this way, it becomes impossible to capture the natural biological variation within each group. This variation is crucial for the statistical models to distinguish between true biological differences and random noise. Without individual replicates within each group, the analysis lacks the necessary information to estimate variability accurately, leading to results that are likely to be unreliable.

For a more statistically sound and biologically meaningful analysis, I strongly recommend reconsidering the experimental design. Ideally, you should aim to collect individual samples from each chicken within the groups rather than pooling them. This will allow you to capture the within-group variability and perform a proper differential abundance analysis that can yield valid and interpretable results.
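As a rough sketch of what the metadata could look like under such a design, assuming one sequenced sample per chicken: the sample IDs below are invented placeholders, and the group labels are taken from the metadata posted above.

```r
# Hypothetical layout: 5 groups x 12 individually sequenced chickens.
# Sample IDs are invented placeholders, not real data.
groups <- paste("Group", LETTERS[1:5])
metadata <- data.frame(
  SampleID = sprintf("chicken_%02d", 1:60),
  Group = rep(groups, each = 12),
  stringsAsFactors = FALSE
)

samples_per_group <- table(metadata$Group)
samples_per_group

# The grouping column would then be "Group" rather than "SampleID".
# kegg_abundance is the object from the call quoted earlier in this thread and
# would presumably need one column per SampleID; the call is left commented out
# because no such abundance table exists here.
# daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata,
#                               group = "Group", daa_method = "LinDA")
```

With this layout each group contributes 12 rows, so the within-group variance can be estimated from the data.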

Please feel free to reach out if you need further guidance on experimental design or if there’s anything else I can assist you with.

Best regards,
Caffery Yang