cafferychen777 / ggpicrust2

Make Picrust2 Output Analysis and Visualization Easier
https://cafferychen777.github.io/ggpicrust2/
MIT License

ggpicrust on ITS : Error in `$<-.data.frame`(`*tmp*`, "description", value = character(0)) : replacement has 0 rows, data has 1 #81

Open SueFletcher opened 10 months ago

SueFletcher commented 10 months ago

I want to thank you for your interactive responses to all the published issues and the various codes that you provided. Thanks to your assistance, I successfully ran the pipeline on 16S data.

Currently, I am working with ITS, and I ran PICRUSt on ITS data. However, I encountered an error when attempting to test ggPICRUSt.

########### Fungal
metadata <- read_delim( "fungal_metdddddd.txt", delim = "\t", escape_double = FALSE, trim_ws = TRUE )

metacyc_abundance <- read_delim( "fungal_pred_metagenome_unstrat.tsv", delim = "\t", escape_double = FALSE, trim_ws = TRUE, show_col_types = FALSE )

metacyc_daa_results_df <- pathway_daa(abundance = metacyc_abundance %>% column_to_rownames("function"), metadata = metadata, group = "envi", daa_method = "LinDA")

metacyc_daa_annotated_results_df <- pathway_annotation(pathway = "MetaCyc", daa_results_df = metacyc_daa_results_df, ko_to_kegg = FALSE)

My question is twofold:

1/ how can I resolve this error?

Starting pathway annotation...
DAA results data frame is not null. Proceeding...
KO to KEGG is set to FALSE. Proceeding with standard workflow...
Loading MetaCyc reference data...
Error in `$<-.data.frame`(`*tmp*`, "description", value = character(0)) : replacement has 0 rows, data has 1

2/ Additionally, how can I convert EC numbers to pathways, as in your example data (metacyc_abundance)? My data: [image]. The example that you provided: [image].

cafferychen777 commented 10 months ago

Dear User,

Thank you for reaching out and for using ggPICRUSt2. I'm glad to hear that you've successfully used the pipeline for 16S data.

Regarding your queries:

  1. Error Resolution: The error you're encountering, Error in '$<-.data.frame'('*tmp*', "description", value = character(0)) : replacement has 0 rows, data has 1, suggests an issue with data frame manipulation. This typically occurs when a function expects a column that either doesn't exist or has a different format than anticipated. To resolve this, ensure that your input data frames have the correct structure and that all required columns are present and correctly formatted.

  2. EC Numbers to Pathways Conversion: As per your second question about converting EC numbers to pathways as shown in the example data (metacyc_abundance), it's important to note that direct conversion between the two data sets you mentioned is not feasible. The MetaCyc data you refer to can be found in the outputs of PICRUSt2. This means that to obtain similar data, you would need to process your ITS data through the PICRUSt2 pipeline, ensuring it includes the relevant steps to generate pathway abundance data based on EC numbers.
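As a minimal base R sketch of how an error with this message can arise, consider a lookup that finds no match for a feature ID. The tables and IDs below are hypothetical and are not taken from the ggpicrust2 source; they only illustrate the mechanism, and the last step shows one way to make such a lookup return NA instead of failing.

```r
# Hypothetical tables for illustration; not ggpicrust2's internal data.
daa_results <- data.frame(feature = "ID_not_in_reference",
                          stringsAsFactors = FALSE)
reference <- data.frame(
  id = c("PWY-A", "PWY-B"),
  description = c("hypothetical pathway A", "hypothetical pathway B"),
  stringsAsFactors = FALSE
)

# A logical-subset lookup with no match returns character(0).
hit <- reference$description[reference$id == daa_results$feature]

# Assigning a zero-length vector to a column of a one-row data frame fails
# with the "replacement has 0 rows, data has 1" error.
msg <- tryCatch({
  daa_results$description <- hit
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# match() yields NA for unmatched IDs, so there is still one value per row.
daa_results$description <-
  reference$description[match(daa_results$feature, reference$id)]
print(daa_results)
```

If the mechanism is the same in your case, the feature IDs in the DAA results would not be present in the reference data for the chosen pathway type.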

It's crucial to align your data processing steps with the specifics of the PICRUSt2 pipeline to achieve the desired outputs. If you encounter any specific issues or errors during this process, feel free to share them, and I'll be happy to assist further.

Best regards, Chen YANG

SueFletcher commented 10 months ago

@cafferychen777, I resolved the issue by focusing on pathways instead of functions. It's strange, but when I worked with pathways, I didn't encounter the same problem.
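This is consistent with the function table holding EC numbers while pathway = "MetaCyc" was requested. A small sketch of a check that could be run on the feature IDs before annotation is given below; guess_id_type is a hypothetical helper, not part of ggpicrust2, and the patterns are simple assumptions about how EC and KO identifiers usually look.

```r
# Hypothetical helper: guess the identifier type from the feature IDs.
guess_id_type <- function(ids) {
  # EC numbers, written either "EC:1.1.1.1" or "EC.1.1.1.1"
  if (all(grepl("^EC[:.][0-9]", ids))) return("EC")
  # KEGG orthologs such as "K00001"
  if (all(grepl("^K[0-9]{5}$", ids))) return("KO")
  # Anything else, for example MetaCyc pathway IDs such as "PWY-7111"
  "other"
}

print(guess_id_type(c("EC:1.1.1.1", "EC:2.7.7.7")))
print(guess_id_type(c("K00001", "K00002")))
print(guess_id_type(c("PWY-7111", "1CMET2-PWY")))
```

If the IDs look like EC numbers, the pathway argument would presumably need to be "EC" rather than "MetaCyc".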

SueFletcher commented 10 months ago

@cafferychen777 When I applied PICRUSt2 to my ITS data, the 'ko_metagenome_out' folder was not generated; only the ec_ITS_counts.txt_metagenome_out folder was. Is this normal? Additionally, does it make sense to use MetaCyc pathway abundance (as in your example) for functional prediction from ITS data, instead of KEGG pathways (as in your first tutorial)? What is the difference between these two databases for exploratory prediction analysis of ITS data?

cafferychen777 commented 10 months ago

Dear @SueFletcher,

Thank you for reaching out with your questions regarding the use of PICRUSt2 with ITS data. For your first concern about not generating the 'ko_metagenome_out' folder and only obtaining the 'ec_ITS_counts.txt_metagenome_out' folder, this might require specific insights from the PICRUSt2 development team. They would be best equipped to clarify whether this behavior is expected with ITS data.

Regarding your second question about using MetaCyc pathway abundance for ITS functional analysis versus KEGG pathways, the difference in the databases and their application in exploratory prediction analysis is quite nuanced. The PICRUSt2 developers and the community around it would be able to provide a more detailed and accurate explanation.

I recommend you direct these inquiries to the PICRUSt2 GitHub repository, where the developers and other experienced users can assist you further. You can find the repository and issue tracker here: PICRUSt2 GitHub Repository.

Best regards, Chen YANG

MizaR108 commented 9 months ago

Dear cafferychen777, I have an issue similar to @SueFletcher's. It would be very helpful if you could comment on it. Thank you so much.

str(EC_daa_results_df)
'data.frame': 2045 obs. of 7 variables:
 $ feature   : chr "EC.1.14.13.83" "EC.2.1.1.152" "EC.1.16.1.1" "EC.1.7.2.5" ...
 $ method    : chr "Maaslin2" "Maaslin2" "Maaslin2" "Maaslin2" ...
 $ group1    : chr "diatom" "diatom" "diatom" "diatom" ...
 $ group2    : chr "dinoflagellate" "dinoflagellate" "dinoflagellate" "dinoflagellate" ...
 $ p_values  : chr "6.47292750749603e-44" "8.07867809749021e-44" "4.58658004347928e-40" "7.06734591041652e-40" ...
 $ adj_method: chr "BH" "BH" "BH" "BH" ...
 $ p_adjust  : num 8.26e-41 8.26e-41 3.13e-37 3.61e-37 8.36e-36 ...

EC_daa_annotate_results_df <- pathway_annotation(pathway = "EC", daa_results_df = EC_daa_results_df, ko_to_kegg = FALSE)
Starting pathway annotation...
DAA results data frame is not null. Proceeding...
KO to KEGG is set to FALSE. Proceeding with standard workflow...
Loading EC reference data...
Error in `$<-.data.frame`(`*tmp*`, "description", value = character(0)) : replacement has 0 rows, data has 1

Also, I found that there is no error when I change "ko_to_kegg" from FALSE to TRUE, although all pathway information (name, description, class, map) is then NA, like this:

EC_daa_annotate_results_df <- pathway_annotation(pathway = "EC", daa_results_df = EC_daa_results_df, ko_to_kegg = TRUE)
Starting pathway annotation...
DAA results data frame is not null. Proceeding...
KO to KEGG is set to TRUE. Proceeding with KEGG pathway annotations...
We are connecting to the KEGG database to get the latest results, please wait patiently.

The number of statistically significant pathways exceeds the database's query limit. Please consider breaking down the analysis into smaller queries or selecting a subset of pathways for further investigation.

Returning DAA results filtered annotation data frame...

I understand 'ko_to_kegg = TRUE' is not appropriate when using EC abundance data, but I just tried to test.
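One thing that may be worth checking for the failing call with ko_to_kegg = FALSE: the feature IDs in the str() output use a dot after "EC" (for example "EC.1.14.13.83"), whereas EC identifiers are commonly written with a colon ("EC:1.14.13.83"). R's make.names(), which is applied to column names when a table is read with check.names = TRUE, turns the colon into a dot. The sketch below shows that conversion and one way to restore the colon; whether the ggpicrust2 EC reference data actually uses the colon form is an assumption that has not been verified here.

```r
# Feature IDs as they appear in the str() output above.
features <- c("EC.1.14.13.83", "EC.2.1.1.152", "EC.1.16.1.1", "EC.1.7.2.5")

# make.names() replaces characters that are invalid in R names, such as ":".
print(make.names("EC:1.14.13.83"))

# Restore the colon after the "EC" prefix only; the other dots are untouched.
restored <- sub("^EC\\.", "EC:", features)
print(restored)
```

If the reference data does use the colon form, assigning the restored IDs back to the feature column before calling pathway_annotation might let the lookup find matches.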