read_gmt failed - Githubissues

masato-ogishi commented 4 years ago

Hi,

I installed the package from the Bioconductor and tried the example script.

However, the following script failed. gmt_fname <- system.file("extdata", "pathways.gmt", package="CEMiTool") gmt_in <- read_gmt(gmt_fname)

The error message said: Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 1, 0 In addition: Warning message: In strsplit(x, split = "\t") : input string 1 is invalid in this locale

Can you kindly deal with this bug?

Best, Masato

pedrostrusso commented 4 years ago

Hi @masato-ogishi, thanks for trying CEMiTool. I was unable to reproduce your error. Could you please provide the output of the sessionInfo() function?

masato-ogishi commented 4 years ago

Hi @pedrostrusso

Thanks for the quick response. Session in fo is as follows.

R version 3.6.1 (2019-07-05) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale: [1] LC_COLLATE=Japanese_Japan.932 LC_CTYPE=Japanese_Japan.932 LC_MONETARY=Japanese_Japan.932 [4] LC_NUMERIC=C LC_TIME=Japanese_Japan.932

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] CEMiTool_1.8.3

loaded via [1] fgsea_1.10.0 [5] qvalue_2.16.0 [9] rstudioapi_0.10 [13] DT_0.7 [17] xml2_1.2.1 [21] GOSemSim_2.10.0 [25] polyclip_1.10-0 [29] WGCNA_1.68 [33] ggforce_0.2.2 [37] rvcheck_0.1.3 [41] lazyeval_0.2.2 [45] prettyunits_1.0.2 [49] gtable_0.3.0 [53] dplyr_0.8.3 [57] enrichplot_1.4.0 [61] preprocessCore_1.46.0 [65] fastcluster_1.1.25 [69] DOSE_3.10.2 [73] scales_1.0.0 [77] memoise_1.1.0 [81] ggpmisc_0.3.1 [85] stringi_1.4.3 [89] foreach_1.4.7 [93] rlang_0.4.0 [97] pracma_2.2.5 [101] cowplot_1.0.0 [105] plyr_1.8.4 [109] Hmisc_4.2-0 [113] pillar_1.4.2 [117] tibble_2.1.3 [121] progress_1.2.2 [125] digest_0.6.20 [129] munsell_0.5.0 a namespace (and not attached): colorspace_1.4-1 ggridges_0.5.1 dynamicTreeCut_1.63-1 htmlTable_1.13.1 base64enc_0.1-3 ggdendro_0.1-20
farver_1.1.0 urltools_1.7.3 ggrepel_0.8.1
bit64_0.9-7 AnnotationDbi_1.46.0 mvtnorm_1.0-11
codetools_0.2-16 splines_3.6.1 doParallel_1.0.14
impute_1.58.0 robustbase_0.93-5 knitr_1.23
zeallot_0.1.0 Formula_1.2-3 jsonlite_1.6
cluster_2.1.0 GO.db_3.8.2 intergraph_2.0-2
rrcov_1.4-7 compiler_3.6.1 httr_1.4.0
backports_1.1.4 assertthat_0.2.1 Matrix_1.2-17
tweenr_1.0.1 acepack_1.4.1 htmltools_0.3.6
tools_3.6.1 igraph_1.2.4.1 coda_0.19-3
glue_1.3.1 reshape2_1.4.3 DO.db_2.9
ggthemes_4.2.0 fastmatch_1.1-0 Rcpp_1.0.2
Biobase_2.44.0 statnet.common_4.3.0 vctrs_0.2.0
iterators_1.0.12 ggraph_1.0.2 xfun_0.8
stringr_1.4.0 network_1.15 clusterProfiler_3.12.0 DEoptimR_1.0-8 europepmc_0.3 MASS_7.3-51.4
hms_0.5.0 parallel_3.6.1 RColorBrewer_1.1-2
gridExtra_2.3 ggplot2_3.2.0 UpSetR_1.4.0
triebeard_0.3.0 rpart_4.1-15 latticeExtra_0.6-28
RSQLite_2.1.2 S4Vectors_0.22.0 pcaPP_1.9-73
checkmate_1.9.4 BiocGenerics_0.30.0 BiocParallel_1.18.0
pkgconfig_2.0.2 matrixStats_0.54.0 evaluate_0.14
lattice_0.20-38 purrr_0.3.2 htmlwidgets_1.3
bit_1.1-14 tidyselect_0.2.5 robust_0.4-18.1
magrittr_1.5 R6_2.4.0 IRanges_2.18.1
fit.models_0.5-14 sna_2.4 DBI_1.0.0
foreign_0.8-71 survival_2.44-1.1 nnet_7.3-12
crayon_1.3.4 rmarkdown_1.14 viridis_0.5.1
grid_3.6.1 data.table_1.12.2 blob_1.2.0
tidyr_0.8.3 gridGraphics_0.4-1 stats4_3.6.1
viridisLite_0.3.0 ggplotify_0.0.3

masato-ogishi commented 4 years ago

Hi again,

I've inspected the code and modified it as follows. read_gmt_2 <- function(fname){ res <- list(genes = list(), desc = list()) gmt <- file(fname) gmt_lines <- readLines(gmt) close(gmt) gmt_list <- lapply(gmt_lines, function(x) unlist(strsplit(x, "\t", fixed=T))) gmt_names <- sapply(gmt_list, "[", 1) gmt_desc <- lapply(gmt_list, "[", 2) gmt_genes <- lapply(gmt_list, function(x) { x[3:length(x)] }) names(gmt_desc) <- names(gmt_genes) <- gmt_names res <- lapply(1:length(gmt_genes), function(i){ data.table::data.table(term=names(gmt_genes)[i], gene=gmt_genes[[i]]) }) res <- data.table::rbindlist(res) res$term <- factor(res$term, levels=unique(names(gmt_genes))) return(res) } This function generates the following data.frame. Is this data.frame correct? gmt_in.txt

Then I've run the cemitool function, which generated some additional warnings. The generated plots looked like the ones in the vignette. Are these warnings expected?

cem <- cemitool(expr0, annot=sample_annot, gmt=gmt_in, interactions=int_df,

filter=T, plot=T, verbose=T) Including sample annotation ... Plotting diagnostic plots ... ...Plotting mean and variance scatterplot ... ...Plotting expression histogram ... ...Plotting qq plot ... ...Plotting sample tree ... Finding modules ... Selecting Beta pickSoftThreshold: will use block size 257. pickSoftThreshold: calculating connectivity for given powers... ..working on genes 1 through 257 of 257 Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k. Density Centralization 1 1 0.228 0.936 0.721 69.800 68.900 104.00 0.27300 0.1340 2 2 0.017 -0.131 0.585 31.500 31.000 61.50 0.12300 0.1180 3 3 0.392 -0.486 0.817 18.300 17.300 43.10 0.07130 0.0977 4 4 0.596 -0.625 0.868 12.100 10.900 32.50 0.04740 0.0804 5 5 0.730 -0.700 0.902 8.730 7.440 25.60 0.03410 0.0666 6 6 0.734 -0.745 0.902 6.600 5.730 21.00 0.02580 0.0568 7 7 0.778 -0.799 0.917 5.170 4.300 17.90 0.02020 0.0503 8 8 0.803 -0.864 0.892 4.150 3.380 15.50 0.01620 0.0448 9 9 0.791 -0.952 0.839 3.400 2.560 13.60 0.01330 0.0400 10 10 0.838 -1.000 0.879 2.820 2.020 11.90 0.01100 0.0359 11 12 0.894 -0.952 0.924 2.020 1.200 9.44 0.00789 0.0292 12 14 0.916 -0.999 0.914 1.500 0.806 7.59 0.00587 0.0240 13 16 0.924 -0.977 0.902 1.150 0.526 6.20 0.00450 0.0199 14 18 0.894 -0.981 0.866 0.904 0.381 5.12 0.00353 0.0166 15 20 0.902 -0.955 0.897 0.724 0.269 4.26 0.00283 0.0139 Heterogeneity 1 0.240 2 0.422 3 0.555 4 0.655 5 0.735 6 0.802 7 0.861 8 0.916 9 0.968 10 1.020 11 1.110 12 1.200 13 1.290 14 1.370 15 1.450 ..connectivity.. ..matrix multiplication (system BLAS).. ..normalization.. ..done. ..cutHeight not given, setting it to 0.995 ===> 99% of the (truncated) height range in dendro. ..done. Merging modules based on eigengene similarity mergeCloseModules: Merging modules whose distance is less than 0.2 Calculating new MEs... Plotting beta x R squared curve ... Plotting mean connectivity curve ... Including interactions ... Running Gene Set Enrichment Analysis ... Running GSEA Calculating modules enrichment analysis for class g0 Calculating modules enrichment analysis for class g3 Calculating modules enrichment analysis for class g7 Running over representation analysis ... Running ORA Using all genes in GMT file as universe. No gene set have size > 10 ... --> return NULL... No gene set have size > 10 ... --> return NULL... No gene set have size > 10 ... --> return NULL... No gene set have size > 10 ... --> return NULL... Generating profile plots ... Plotting Enrichment Scores ... Plotting interaction network ... Plotting beta x R squared curve ... Plotting mean connectivity curve ... Warning messages: 1: executing %dopar% sequentially: no parallel backend registered 2: In FUN(X[[i]], ...) : Enrichment for module M1 is NULL 3: In FUN(X[[i]], ...) : Enrichment for module M2 is NULL 4: In FUN(X[[i]], ...) : Enrichment for module M3 is NULL 5: In .local(cem, ...) : Enrichment is NULL. Either your gmt file is inadequate or your modules really aren't enriched for any of the pathways in the gmt file.

pedrostrusso commented 4 years ago

Hi @masato-ogishi, yes, that is the correct format for the file. The only thing missing is that your read_gmt_2 function is outputting a data.table, and CEMiTool expects a regular data.frame as input for the gmt argument.

Also, I noticed from your sessionInfo() output that your session is in Japanese. Maybe this is causing an encoding problem when you try to read from the file: Perhaps if you just changed file(fname) to file(fname, encoding="UTF-8") it would have worked as intended. Anyway, at least you got it to work for you :)

Please let me know if you have additional problems!

csbl-usp / CEMiTool

read_gmt failed #37