cafferychen777 / ggpicrust2

Make Picrust2 Output Analysis and Visualization Easier
https://cafferychen777.github.io/ggpicrust2/

ggpicrust2(): Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length #24

mbrose2022 closed this issue 8 months ago

mbrose2022 commented 1 year ago

Hi there,

I'm trying out ggpicrust2 on a small subset of data (eight samples) to iron out any problems before using the entire dataset. It wouldn't read my .tsv, so I created the metadata tibble instead. Could you provide me with more information on the following error?

Calculation may take a long time, please be patient.
The kegg pathway with zero abundance in all the different samples has been removed.
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length

If you want to analysis kegg pathway abundance instead of ko within the pathway. You should turn ko_to_kegg to TRUE.

The kegg pathway typically have the more explainable description.

library(readr)
library(ggpicrust2)
library(tibble)
library(tidyverse)
library(ggprism)
library(patchwork)

metadata <- tibble(
  sample_name = c("lessQ1", "lessQ2", "lessQ3", "lessQ4", "moreQ1", "moreQ2", "moreQ3", "moreQ4"),
  group = c("less", "less", "less", "less", "more", "more", "more", "more")
)

daa_results_list <- ggpicrust2(
  file = "h:/picrust2_out/KO_metagenome_out/pred_metagenome_unstrat.tsv",
  metadata = metadata,
  group = "group",
  pathway = "KO",
  daa_method = "LinDA",
  p_values_bar = TRUE,
  p.adjust = "BH",
  ko_to_kegg = TRUE,
  order = "pathway_class",
  select = NULL,
  reference = NULL # If your metadata[,group] has more than two levels, please specify a reference.
)

cafferychen777 commented 1 year ago

Dear @mbrose2022 ,

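Thank you for reaching out to me regarding the error you encountered when using ggpicrust2. Based on the error message you provided, it seems that there is an issue with the row names in your data: R raises this error when a data frame is given a row-names vector whose length differs from its number of rows.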
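For reference, the message itself comes from base R rather than from ggpicrust2. The minimal reproduction below uses a hypothetical toy table; it does not show where ggpicrust2 performs the row-name assignment, only what the error means.

```r
# Hypothetical abundance table with three rows
abundance <- data.frame(K00001 = c(1.5, 2.0, 3.5))

# Assigning only two row names to three rows triggers the same base R error
msg <- tryCatch(
  {
    rownames(abundance) <- c("S0.1", "S0.10")
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
# [1] "invalid 'row.names' length"
```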

Regarding your question about using a subset of metadata with ggpicrust2, it is possible to do so, but it may cause issues if the metadata does not represent the full dataset accurately. In your case, it seems that you have encountered some problems while using a subset of metadata.

Before anything else, you could try removing the columns from "h:/picrust2_out/KO_metagenome_out/pred_metagenome_unstrat.tsv" that are not included in your metadata tibble, so that the sample columns of the abundance table match the samples listed in the metadata. This narrows down the scope of your analysis and may also help you identify any other issues causing the error.
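A minimal base-R sketch of that step, assuming the abundance table has a `function` column followed by one column per sample (the layout PICRUSt2 writes, as in the readr output later in this thread) and that the metadata keeps sample names in a `sample_name` column. The values and the extra sample `otherQ1` are hypothetical; with a real file the table would first be read in, for example with readr::read_tsv().

```r
# Toy abundance table (hypothetical values): 'function' column + one column per sample.
# check.names = FALSE keeps the reserved word "function" as the column name.
abundance <- data.frame(
  `function` = c("K00001", "K00003"),
  lessQ1 = c(10, 20),
  otherQ1 = c(1, 2),  # hypothetical sample that is not in the metadata
  moreQ1 = c(30, 40),
  lessQ2 = c(50, 60),
  check.names = FALSE
)

metadata <- data.frame(
  sample_name = c("lessQ1", "lessQ2", "moreQ1"),
  group = c("less", "less", "more")
)

# Samples present in one table but not the other
missing_in_abundance <- setdiff(metadata$sample_name, colnames(abundance))
extra_in_abundance <- setdiff(colnames(abundance)[-1], metadata$sample_name)

if (length(missing_in_abundance) > 0) {
  stop("Samples missing from the abundance table: ",
       paste(missing_in_abundance, collapse = ", "))
}

# Keep the 'function' column plus only the metadata samples, in metadata order
abundance_sub <- abundance[, c("function", metadata$sample_name)]
```

Checking both set differences first makes it easy to see whether the mismatch comes from extra samples in the file or from names in the metadata that the file does not contain.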

If this does not resolve the issue, you could try the alternative workflow described in the ggpicrust2 tutorial.

I hope this helps, and please feel free to reach out if you have any further questions or concerns.

Best regards, Chen YANG

gcuster1991 commented 1 year ago

Hi, I am having a similar issue. Any ideas what might be causing this issue?

When I try to run the following:

metadata<-data.frame(sample_data(phyloseq_merged_saline_indicators_deseq))
metadata$Salinity <- as.factor(metadata$Salinity)

results <- ggpicrust2(file = "/Users/gordoncuster/Desktop/Git_Projects/salinity_atriplex/data/Picrust/picrust_out_pipeline/KO_metagenome_out/pred_metagenome_unstrat.tsv",
                      metadata = metadata,
                      group = "Salinity",
                      pathway = "KO",
                      daa_method = "LinDA",
                      ko_to_kegg = TRUE,
                      order = "pathway_class",
                      p_values_bar = TRUE,
                      x_lab = "pathway_name"
                      )

I get the following message and error:

Rows: 5198 Columns: 31
── Column specification ────────────────────────────────────────
Delimiter: "\t"
chr  (1): function
dbl (30): S0.1, S0.10, S0.3, S0.5, S0.8, S10.2, S10.3, S10.7, S10.8, S10.9, S20.1, S20.2, S20.5, S20.8, S20.9, C0.2, C0.4, C0.6, C0.7, C0.8, C10.2, C10.3, C10.5, C10.7, C10.8, C20.2, C20.4,...
Use `spec()` to retrieve the full column specification for this data.
Specify the column types or set `show_col_types = FALSE` to quiet this message.
Calculation may take a long time, please be patient.
The kegg pathway with zero abundance in all the different samples has been removed.
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length

My metadata looks like this:

       Salinity Treatment soil
S0.1   0        N         S
S0.10  0        N         S
S0.3   0        N         S
S0.5   0        N         S
S0.8   0        N         S
S10.2  10       Y         S

My KO table looks like this:

  function S0.1    S0.10   S0.3    S0.5   S0.8   S10.2  S10.3   S10.7   S10.8  S10.9  S20.1   S20.2   S20.5  S20.8   S20.9   C0.2   C0.4 C0.6   C0.7   C0.8   C10.2   C10.3  C10.5
1 K00001   570.62  712.50  579.38  769.00 448.88 817.62 1246.25 1359.62 593.75 941.00 3140.50 4610.00 3331.0 3605.62 2783.50 843.33 790  674.33 724.00 637.33 1159.33 886.33 686.00
2 K00003   3189.50 3790.88 3021.88

cafferychen777 commented 1 year ago

Hi @gcuster1991 ,

Thank you for reaching out. It seems like you are encountering an error when running the ggpicrust2 function due to an issue with your metadata.

Based on the code you provided, I would suggest using the tibble function to create your metadata table rather than the data.frame function. tibble is a more modern implementation of data frames; the practical difference here is that a tibble does not keep row names, so your sample names need to be stored in an explicit column of the metadata.
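In this thread the sample names from sample_data() end up as row names (see the metadata printout above), while the working code later in the thread stores them in a column via rownames_to_column("sample_name"). A base-R sketch of that conversion, assuming ggpicrust2 looks up sample names in a metadata column rather than in row names; the values are hypothetical and follow the layout shown above.

```r
# Metadata with sample names stored only as row names (hypothetical values)
metadata <- data.frame(
  Salinity = factor(c(0, 0, 10)),
  Treatment = c("N", "N", "Y"),
  row.names = c("S0.1", "S0.10", "S10.2")
)

# Move the row names into an explicit 'sample_name' column ...
metadata$sample_name <- rownames(metadata)
rownames(metadata) <- NULL

# ... and place it first
metadata <- metadata[, c("sample_name", setdiff(names(metadata), "sample_name"))]

# With the tidyverse loaded, tibble::rownames_to_column(metadata, "sample_name")
# followed by tibble::as_tibble() achieves the same in one pipeline.
```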

I hope this helps! Let me know if you have any further questions.


gcuster1991 commented 1 year ago

Hi @cafferychen777, I created the metadata as a tibble and added a "SampleID" column to hold my sample names. However, I now receive the following error.

The kegg pathway with zero abundance in all the different samples has been removed.
0  features are filtered!
The filtered data has  30  samples and  62  features will be tested!
Fit linear models ...
Completed.
We are connecting to the KEGG database to get the latest results, please wait patiently.
Registered S3 method overwritten by 'httr':
  method         from  
  print.response rmutil
Warning: cannot xtfrm data frames
Warning: ‘>’ not meaningful for factors
Error in if (xi > xj) 1L else -1L : missing value where TRUE/FALSE needed

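The last warning and the error can be reproduced in base R: comparing two values of an unordered factor with '>' returns NA (with a "not meaningful for factors" warning), and if() then fails on that NA. Where exactly ggpicrust2 performs such a comparison is not established in this thread; this only illustrates the mechanism, using hypothetical factor values.

```r
# Unordered factor, e.g. a grouping column converted with as.factor()
salinity <- factor(c("0", "10", "20"))

# '>' is not defined for unordered factors: R warns and returns NA
cmp_warned <- tryCatch(
  salinity[1] > salinity[2],
  warning = function(w) TRUE
)

# Using that NA as an if() condition raises the reported error
err <- tryCatch(
  suppressWarnings(if (salinity[1] > salinity[2]) 1L else -1L),
  error = function(e) conditionMessage(e)
)

print(err)
# [1] "missing value where TRUE/FALSE needed"
```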
cafferychen777 commented 1 year ago

Hi @gcuster1991 ,

Regarding the error message you received, it seems that there might be an issue with the data or the workflow you are using. Without having access to the data or the specific details of your workflow, it's difficult for me to pinpoint the exact problem.

However, I suggest the following steps to troubleshoot the issue:

  1. Share data: If possible, could you share the data with me? Having access to the actual data would help me better understand and diagnose the problem.

  2. Try an alternative workflow: If the issue persists, you can try the alternative workflow in the tutorial to determine whether the problem lies with the current approach.

Additionally, if you can provide more details, I can offer more targeted assistance. Please let me know if there's anything else I can help you with.

Best regards, Chen YANG

gcuster1991 commented 1 year ago

Hi @cafferychen777, Thanks for your help on this. What would be the best way to share the data? Cheers, Gordon

cafferychen777 commented 1 year ago

Maybe you can upload it to Google Drive and send it to @.*** .


gcuster1991 commented 1 year ago

Your email address did not come through. It is hidden.

cafferychen777 commented 1 year ago

cafferychen7850atgmai.com

cafferychen777 commented 1 year ago
[Screenshots, 2023-05-11 21:32:18, 21:32:39, 21:33:00]

Hi @mbrose2022 ,

It works well on my MacBook. You can use the following code:

library(readr)
#Update to the newest version 1.6.4
devtools::install_github('cafferychen777/ggpicrust2')
library(ggpicrust2)
library(tidyverse)
library(ggprism)
library(patchwork)
load("/Users/apple/Microbiome/ggpicrust2总/ggpicrust2测试/ggpicrust2_test/Gordon/ggpicrust_troubleshooting.RData")

kegg_abundance <-
  ko2kegg_abundance(
    "/Users/apple/Microbiome/ggpicrust2总/ggpicrust2测试/ggpicrust2_test/Gordon/pred_metagenome_unstrat_subset.tsv"
  )

metadata <- as.data.frame(as.matrix(phyloseq_merged_saline_indicators_deseq@sam_data)) %>% rownames_to_column("sample_name") %>% as_tibble()

daa_results_df <-
  pathway_daa(
    abundance = kegg_abundance,
    metadata = metadata,
    group = "Salinity",
    daa_method = "ALDEx2",
    select = NULL,
    reference = NULL
  )

daa_sub_method_results_df <-
  daa_results_df[daa_results_df$method == "ALDEx2_Kruskal-Wallace test", ]

daa_annotated_sub_method_results_df <-
  pathway_annotation(pathway = "KO",
                     daa_results_df = daa_sub_method_results_df,
                     ko_to_kegg = TRUE)

Group <-
  metadata$Salinity 

Group <- factor(Group)

daa_annotated_sub_method_results_df <- daa_annotated_sub_method_results_df[!is.na(daa_annotated_sub_method_results_df$pathway_name),]

# head() avoids NA entries if fewer than 29 features remain after filtering
low_p_feature <- head(daa_annotated_sub_method_results_df[order(daa_annotated_sub_method_results_df$p_adjust), ]$feature, 29)
# select parameter format in pathway_errorbar() is c("ko00562", "ko00440", "ko04111", "ko05412", "ko00310", "ko04146", "ko00600", "ko04142", "ko00604", "ko04260", "ko04110", "ko04976", "ko05222", "ko05416", "ko00380", "ko05322", "ko00625", "ko00624", "ko00626", "ko00621")

p <-
  pathway_errorbar(
    abundance = kegg_abundance,
    daa_results_df = daa_annotated_sub_method_results_df,
    Group = Group,
    p_values_threshold = 0.05,
    order = "pathway_class",
    select = low_p_feature,
    ko_to_kegg = TRUE,
    p_value_bar = FALSE,
    colors = NULL,
    x_lab = "pathway_name"
  )

pathway_pca(kegg_abundance,metadata, "Salinity")

sub_kegg_abundance <- kegg_abundance[rownames(kegg_abundance) %in% low_p_feature[1:10],]
pathway_heatmap(sub_kegg_abundance,metadata, "Salinity")

Best regards,

cafferychen777 commented 1 year ago

Version 1.6.4 had a problem. If you installed it more than five minutes before this comment, please reinstall it.

Best regards,

jsevereyn commented 11 months ago

Hello @cafferychen777, I followed your post of May 11 and it worked for me as well.

However, I'm wondering why the part of the pathway_errorbar plot that should show the fold changes is missing, and why my plotted p-values appear altered (shown as large numbers).

[Screenshot: V2]

cafferychen777 commented 11 months ago

Hello @jsevereyn ,

For the fold-change problem, you can set the p_value_bar parameter to TRUE; for the p-value problem, you can round the adjusted p-values (the p_adjust column) to 3 decimals with round().
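A minimal sketch of the rounding step, assuming the results data frame stores adjusted p-values in a p_adjust column (the column used in the code posted on May 11); the feature IDs and values are hypothetical, and how the rounded values are passed back into the plot is not covered here.

```r
# Hypothetical DAA results
daa_results <- data.frame(
  feature = c("ko00010", "ko00020", "ko00030"),
  p_adjust = c(0.0123456, 0.000042, 0.4567891)
)

# Round adjusted p-values to 3 decimal places for display
daa_results$p_adjust_rounded <- round(daa_results$p_adjust, 3)

# Values below 0.0005 round to 0; signif() keeps 3 significant digits instead
daa_results$p_adjust_signif <- signif(daa_results$p_adjust, 3)
```

round() gives a uniform number of decimals, which suits a table or plot label, but very small p-values collapse to 0; signif() is the alternative when those need to stay distinguishable.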

Best regards, Chen YANG