Open WillErickson1 opened 8 months ago
Hi @WillErickson1,
Thank you for providing a detailed overview of your issue. Based on the error you encountered and the provided data structure, I suggest the following:
Metadata Modification: Try removing the second and third columns from your metadata
file. It appears there might be discrepancies between your dataset and the expected format for the ggpicrust2
function, and this step might help align your data more closely with what the function anticipates.
Sample Size Concern: I'd like to point out a statistical consideration. With only 3 samples in your dataset, the statistical power and interpretability of the results may be limited. Even if you obtain a visualization, deriving a statistically meaningful interpretation could be challenging. It's crucial to ensure that your results can be supported by a robust statistical analysis. If possible, consider increasing your sample size to bolster the validity of your conclusions.
Please give the above suggestion a try and let me know if it resolves the issue or if you have any other questions!
Best regards, Chen YANG
thank you for responding so fast I runned this script now : # Load necessary data: abundance data and metadata abundance_file <- "pred_metagenome_unstrat.tsv" metadata_2 <- read_delim( "metadata.txt", delim = "\t", escape_double = FALSE, trim_ws = TRUE ) rm(metadata_2)
abundance_data <- read_delim(abundance_file, delim = "\t", col_names = TRUE, trim_ws = TRUE)
metadata_2 <- metadata_2[, -c(2, 3)]
results_data_input <- ggpicrust2(data = abundance_data, metadata = metadata_2, group = "Traitement", # For example dataset, group = "Environment" pathway = "KO", daa_method = "LinDA", ko_to_kegg = TRUE, order = "pathway_class", p_values_bar = TRUE, x_lab = "pathway_name")
with input file :
I have this error now : Starting the ggpicrust2 analysis... Converting KO to KEGG... Processing provided data frame... Loading KEGG reference data. This might take a while... Performing KO to KEGG conversion. Please be patient, this might take a while... |============================================================================| 100% KO to KEGG conversion completed. Time elapsed: 14.2 seconds. Removing KEGG pathways with zero abundance across all samples... KEGG abundance calculation completed successfully. Performing pathway differential abundance analysis... Sample names extracted. Identifying matching columns in metadata... Matching columns identified: sample-id . This is important for ensuring data consistency. Using all columns in abundance. Converting abundance to a matrix... Reordering metadata... Converting metadata to a matrix and data frame... Extracting group information... Running LinDA analysis... Performing LinDA analysis... 0 features are filtered! The filtered data has 3 samples and 197 features will be tested! Fit linear models ... Completed. Processing LinDA results... LinDA analysis is complete. Success: Found 1 statistically significant biomarker(s) in the dataset. Annotating pathways... Starting pathway annotation... DAA results data frame is not null. Proceeding... KO to KEGG is set to TRUE. Proceeding with KEGG pathway annotations... We are connecting to the KEGG database to get the latest results, please wait patiently.
Processing pathways individually...
| | 0% Beginning annotation for pathway 1 of 1...
Annotated pathway 1 of 1. Time taken: 2.24 seconds.
|============================================================================| 100%
Pathway annotation completed.
Returning DAA results filtered annotation data frame...
Creating pathway error bar plots...
Error in $<-.data.frame
(*tmp*
, "group", value = c(2L, 2L, 1L)) :
le tableau de remplacement a 3 lignes, le tableau remplacé en a 1
In addition: Warning message:
In cbind(sample = colnames(sub_relative_abundance_mat), group = Group, :
number of rows of result is not a multiple of vector length (arg 2)
Hi @WillErickson1,
Thank you for sharing the details.
I see that you've run the script to load the necessary abundance data and metadata. Following that, you have run the ggpicrust2
function with the provided input.
From the provided error:
Error in `$<-.data.frame`(`*tmp*`, "group", value = c(2L, 2L, 1L)) :
replacement has 3 rows, data has 1
In addition: Warning message:
In cbind(sample = colnames(sub_relative_abundance_mat), group = Group, :
number of rows of result is not a multiple of vector length (arg 2)
It seems like there's an inconsistency with the data dimensions. This might be due to only one statistically significant result being found. Such a result can possibly affect the visualization, making it hard for the visualization to proceed.
To troubleshoot this:
Let me know how you'd like to proceed or if there's any other way I can assist.
Best regards,
Chen YANG
thanks what's the difference between ko_abundance and kegg_abundance kegg_abundance <- ko2kegg_abundance("path/to/your/pred_metagenome_unstrat.tsv") ko_abundance <- read.delim("path/to/your/pred_metagenome_unstrat.tsv") what does these 2 inputs refers to in picrust2 pipeline : https://github.com/picrust/picrust2/wiki/Full-pipeline-script
Hi @WillErickson1 ,
The difference between ko_abundance
and kegg_abundance
lies in their origins and how they represent the biological data.
KO (Kegg Orthology) Abundance: ko_abundance
refers to the abundance of specific gene orthologs in your sample. In the context of your script, it's being directly loaded from a .tsv
file:
ko_abundance <- read.delim("path/to/your/pred_metagenome_unstrat.tsv")
KEGG Abundance: kegg_abundance
, on the other hand, is derived from the KO abundances by converting them into pathway abundances. This means that kegg_abundance
represents the predicted abundance of entire metabolic pathways, which are composed of multiple KO groups. Your script uses the ko2kegg_abundance
function to achieve this conversion:
kegg_abundance <- ko2kegg_abundance("path/to/your/pred_metagenome_unstrat.tsv")
In the context of the PICRUSt2 pipeline:
ko_abundance
is a direct result from the prediction step. It gives you the abundance of different gene orthologs in your microbial community.
kegg_abundance
is more of a downstream analysis that allows you to understand the functional potential of your microbial community at the pathway level. This is more interpretable because it groups gene orthologs (KOs) into broader functional categories or pathways.
I hope this clarifies the difference between the two. Let me know if you have any further questions or if there's anything else I can assist with.
Best regards,
Chen YANG
Sorry I'm new in this field. Could you tell me exactly what they refers to. this is the architecture of output folder of picrust2 pipeline
-I tested the pipeline on the dataset suggested by picrust2 in their tutorial and I used edgeR as daa_method. and it has worked my wondering why it didn't work with limma or deseq2 ?
Hello @WillErickson1,
Regarding "exactly what they refers to":
The reference is to the pred_metagenome_unstrat.tsv.gz
file within the KO_metagenome_out
folder.
On why edgeR works but limma and deseq2 don't:
When testing with the tutorial dataset provided by picrust2, edgeR, when used as the differential abundance analysis method (daa_method
), functions as expected. However, limma and deseq2 do not. There could be various reasons for this, including issues with the format of the input data, software dependencies, or compatibility between versions.
About the most appropriate input for downstream analysis:
When moving to downstream analysis, the decision between using ko_abundance
or kegg_abundance
to generate pathway_errorbar()
, pathway_heatmap()
, and pathway_pca()
depends on the specific objectives of your analysis. The term ko_abundance
refers to the abundance of KEGG Orthology (KO), which represents genes or gene clusters. On the other hand, kegg_abundance
may relate to specific KEGG pathways.
ko_abundance
might be more relevant. kegg_abundance
could be the better choice.Regarding the interest in analyzing metacyc_abundance
:
MetaCyc is a database of non-redundant, experimentally validated metabolic pathways, distinct from KEGG. Analyzing metacyc_abundance
offers an alternative perspective on the metabolic pathway activity in the samples. By analyzing both KEGG and MetaCyc abundances simultaneously, you can achieve a more comprehensive and multi-layered biological interpretation. For further clarity on the second part of the tutorial, you might want to refer to this section on the ggpicrust2 GitHub page.
Best regards, Chen YANG
@cafferychen777 thank you for your time
Where can I find MetaCyc abundance results in the PICRUSt2 output folder ? does it in pathways_out ? ko abundance is used only for metacyc data that's why we set ko_to_kegg to FALSE ?? because when I tried to run this script results_file_input <- ggpicrust2(data = ko_abundance, metadata = metadata, group = "Environment", pathway = "KO", daa_method = "edgeR", ko_to_kegg = FALSE, order = "pathway_class", p_values_bar = TRUE, x_lab = "pathway_name") it didn't work -pathway_pca and heatmap_heatmap are only performed on metacyc data ? we can perform them on ko abundance ? if it's the case how I should modify in the code ?
Hello @WillErickson1 ,
MetaCyc Abundance Results Location:
pathways_out
folder in the PICRUSt2 output directory.KO Abundance and ko_to_kegg
Setting:
ko_to_kegg
parameter is likely a setting to determine whether to convert KO identifiers to KEGG identifiers. It's set to FALSE when you are working with MetaCyc data since MetaCyc and KEGG are different databases with different identifiers. Thus, KO (KEGG Orthology) to KEGG conversion setting doesn't pertain when working with MetaCyc data.pathway_pca
and heatmap_heatmap
Execution:
pathway_pca
and heatmap_heatmap
on KO abundance data. While these analyses may typically be performed on MetaCyc data, they can also be executed on KO abundance data. Some users in the issue section have shared their code on how to do this, and you can refer to those examples to understand how to modify your code accordingly.Statistical Tests in daa_method
:
daa_method
specifies a different statistical analysis method. You should refer to the respective papers or documentation for each method to understand the statistical tests being performed.Best regards,
Chen YANG
Hello!
Thank you for the opportunity to get the visual results of Picrust! I have a similar problem with the one described above. I've done a step-by-step workflow : library(reader) library(ggpicrust2) library (Bible) library (neat verse) library(ggprism) library(patchwork)
kegg_abundance <- ko2kegg_abundance("E:/Озера метаданные/R/Picrust/pred_metagenome_unstrat.tsv") Loading data from file... Rows: 7066 Columns: 61
── Column specification ──────────────────────────────────────────────────────────────────────────────────── Delimiter: "\t" chr (1): function dbl (60): BEl1, BEl2, BEl3, Ber1, Ber2, Ber3, Bez1, Bez2, Bez3, BKa1, BKa2, BKa3, BKu1, BKu2, BKu3, Che1...
ℹ Use spec()
to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE
to quiet this message.
Loading KEGG reference data. This might take a while...
Performing KO to KEGG conversion. Please be patient, this might take a while...
|==================================================================================================| 100%
KO to KEGG conversion completed. Time elapsed: 1.55 seconds.
Removing KEGG pathways with zero abundance across all samples...
KEGG abundance calculation completed successfully.
metadata <- read.csv("D:/CCA/forCCAchem.csv", header=TRUE, sep = ";") str(metadata) 'data.frame': 60 obs. of 7 variables: $ sample.id: chr "Kuch1" "Kuch2" "Kuch3" "Che1" ... $ location : chr "pelag" "litor" "zarosli" "pelag" ... $ area : chr "niznetavd" "niznetavd" "niznetavd" "niznetavd" ... $ climZone : chr "subtaiga" "subtaiga" "subtaiga" "subtaiga" ... $ Area : chr "small" "small" "small" "verysmall" ... $ Salin : chr "freshwater" "freshwater" "freshwater" "freshwater" ... $ soil : chr "pdggTn" "pdggTn" "pdggTn" "pdggTn" ... daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = "climZone", daa_method = "ALDEx2", select = NULL, reference = NULL) Converting metadata to tibble... Sample names extracted. Identifying matching columns in metadata... Matching columns identified: sample.id . This is important for ensuring data consistency. Using all columns in abundance. Converting abundance to a matrix... Reordering metadata... Converting metadata to a matrix and data frame... Extracting group information... Running ALDEx2 with two groups. Performing t-test... Error in Math.factor(c(2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, : ‘round’ not meaningful for factors
Please, help me! With gratitude, Maria
I runned this script: library(readr) library(ggpicrust2) library(tibble) library(tidyverse) library(ggprism) library(patchwork)
Load necessary data: abundance data and metadata
abundance_file <- "pred_metagenome_unstrat.tsv" metadata_2 <- read_delim( "metadata.txt", delim = "\t", escape_double = FALSE, trim_ws = TRUE ) rm(metadata_2)
Run ggpicrust2 with imported data.frame
abundance_data <- read_delim(abundance_file, delim = "\t", col_names = TRUE, trim_ws = TRUE)
Run ggpicrust2 with input data
results_data_input <- ggpicrust2(data = abundance_data, metadata = metadata, group = "Traitement", # For example dataset, group = "Environment" pathway = "KO", daa_method = "LinDA", ko_to_kegg = TRUE, order = "pathway_class", p_values_bar = TRUE, x_lab = "pathway_name")
but I got this error : Starting the ggpicrust2 analysis... Converting KO to KEGG... Processing provided data frame... Loading KEGG reference data. This might take a while... Performing KO to KEGG conversion. Please be patient, this might take a while... |============================================================================| 100% KO to KEGG conversion completed. Time elapsed: 10.02 seconds. Removing KEGG pathways with zero abundance across all samples... KEGG abundance calculation completed successfully. Performing pathway differential abundance analysis... Sample names extracted. Identifying matching columns in metadata... Matching columns identified: NA . This is important for ensuring data consistency. Using all columns in abundance. Converting abundance to a matrix... Reordering metadata... Error in
metadata[, matching_columns]
: ! Can't subset columns withmatching_columns
. ✖ Subscriptmatching_columns
can't contain missing values. ✖ It has a missing value at location 1. Runrlang::last_trace()
to see where the error occurred.this is how my input files looks like: metadata_2:
abundance_data
PS: I executed this example that u shared and it has worked
data(ko_abundance) data(metadata) results_file_input <- ggpicrust2(data = ko_abundance, metadata = metadata, group = "Environment", pathway = "KO", daa_method = "LinDA", ko_to_kegg = TRUE, order = "pathway_class", p_values_bar = TRUE, x_lab = "pathway_name")