Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

`Error in startsWith(path, "https://gwas.mrcieu.ac.uk")` #53

Closed by bschilder 3 years ago

bschilder commented 3 years ago

I had a check built into read_sumstats to see whether the path was a remote URL. Now that this function can also take data.tables directly, the check needs adjusting to avoid an error:

sumstats <- MungeSumstats::read_sumstats(path = sumstats)
Reading header.
Error in startsWith(path, "https://gwas.mrcieu.ac.uk") : 
  non-character object(s)
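
The error comes from base R's `startsWith()`, which only accepts character vectors. As a minimal sketch, the snippet below reproduces it without MungeSumstats; the small `data.frame` is an assumed stand-in for in-memory summary statistics, and the type guard at the end is one possible way to avoid the error, not necessarily the fix used in the package.

```r
# Assumed stand-in for summary statistics already loaded into R.
path <- data.frame(SNP = "rs1", P = 0.05)

# startsWith() rejects anything that is not a character vector,
# so capture the error message instead of stopping the script.
res <- tryCatch(
  startsWith(path, "https://gwas.mrcieu.ac.uk"),
  error = function(e) conditionMessage(e)
)
print(res)

# Checking the type first short-circuits before startsWith() is reached.
is_remote <- is.character(path) &&
  startsWith(path, "https://gwas.mrcieu.ac.uk")
print(is_remote)
```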

Session Info

```
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg19_1.4.3
 [2] VariantAnnotation_1.39.0
 [3] Rsamtools_2.9.1
 [4] SummarizedExperiment_1.23.1
 [5] MatrixGenerics_1.5.3
 [6] matrixStats_0.60.0
 [7] snpStats_1.43.1
 [8] Matrix_1.3-4
 [9] survival_3.2-12
[10] martini_1.13.0
[11] org.Hs.eg.db_3.13.0
[12] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[13] GenomicFeatures_1.45.1
[14] AnnotationDbi_1.55.1
[15] Biobase_2.53.0
[16] SNPlocs.Hsapiens.dbSNP144.GRCh37_0.99.20
[17] BSgenome_1.61.0
[18] rtracklayer_1.53.1
[19] Biostrings_2.61.2
[20] XVector_0.33.0
[21] GenomicRanges_1.45.0
[22] GenomeInfoDb_1.29.3
[23] IRanges_2.27.0
[24] S4Vectors_0.31.0
[25] BiocGenerics_0.39.1
[26] dplyr_1.0.7
[27] ggplot2_3.3.5
[28] MungeSumstats_1.1.14
[29] MAGMA.Celltyping_2.0.0
[30] phenomix_0.2.0

loaded via a namespace (and not attached):
  [1] utf8_1.2.2
  [2] R.utils_2.10.1
  [3] tidyselect_1.1.1
  [4] lme4_1.1-27.1
  [5] RSQLite_2.2.7
  [6] htmlwidgets_1.5.3
  [7] grid_4.1.0
  [8] BiocParallel_1.27.3
  [9] munsell_0.5.0
 [10] codetools_0.2-18
 [11] future_1.21.0
 [12] withr_2.4.2
 [13] colorspace_2.0-2
 [14] OrganismDbi_1.35.0
 [15] filelock_1.0.2
 [16] knitr_1.33
 [17] rstudioapi_0.13
 [18] listenv_0.8.0
 [19] GenomeInfoDbData_1.2.6
 [20] bit64_4.0.5
 [21] parallelly_1.27.0
 [22] vctrs_0.3.8
 [23] generics_0.1.0
 [24] xfun_0.25
 [25] BiocFileCache_2.1.1
 [26] R6_2.5.0
 [27] doParallel_1.0.16
 [28] AnnotationFilter_1.17.1
 [29] bitops_1.0-7
 [30] cachem_1.0.5
 [31] DelayedArray_0.19.1
 [32] assertthat_0.2.1
 [33] promises_1.2.0.1
 [34] BiocIO_1.3.0
 [35] scales_1.1.1
 [36] gtable_0.3.0
 [37] globals_0.14.0
 [38] ensembldb_2.17.4
 [39] rlang_0.4.11
 [40] splines_4.1.0
 [41] lazyeval_0.2.2
 [42] gargle_1.2.0
 [43] broom_0.7.9
 [44] BiocManager_1.30.16
 [45] yaml_2.2.1
 [46] reshape2_1.4.4
 [47] backports_1.2.1
 [48] httpuv_1.6.1
 [49] RBGL_1.69.0
 [50] tools_4.1.0
 [51] usethis_2.0.1
 [52] ellipsis_0.3.2
 [53] gplots_3.1.1
 [54] RColorBrewer_1.1-2
 [55] ggdendro_0.1.22
 [56] Rcpp_1.0.7
 [57] plyr_1.8.6
 [58] progress_1.2.2
 [59] zlibbioc_1.39.0
 [60] purrr_0.3.4
 [61] RCurl_1.98-1.4
 [62] prettyunits_1.1.1
 [63] cowplot_1.1.1
 [64] fs_1.5.0
 [65] variancePartition_1.23.1
 [66] magrittr_2.0.1
 [67] data.table_1.14.0
 [68] ProtGenerics_1.25.1
 [69] hms_1.1.0
 [70] mime_0.11
 [71] evaluate_0.14
 [72] xtable_1.8-4
 [73] pbkrtest_0.5.1
 [74] XML_3.99-0.7
 [75] EWCE_1.1.1
 [76] gridExtra_2.3
 [77] compiler_4.1.0
 [78] biomaRt_2.49.4
 [79] tibble_3.1.3
 [80] KernSmooth_2.23-20
 [81] crayon_1.4.1
 [82] minqa_1.2.4
 [83] R.oo_1.24.0
 [84] htmltools_0.5.1.1
 [85] later_1.2.0
 [86] tidyr_1.1.3
 [87] DBI_1.1.1
 [88] ExperimentHub_2.1.4
 [89] gprofiler2_0.2.0
 [90] dbplyr_2.1.1
 [91] MASS_7.3-54
 [92] rappdirs_0.3.3
 [93] boot_1.3-28
 [94] BiocStyle_2.21.3
 [95] EnsDb.Hsapiens.v75_2.99.0
 [96] cli_3.0.1
 [97] R.methodsS3_1.8.1
 [98] igraph_1.2.6
 [99] parallel_4.1.0
[100] pkgconfig_2.0.3
[101] SNPlocs.Hsapiens.dbSNP144.GRCh38_0.99.20
[102] GenomicAlignments_1.29.0
[103] One2One_0.1.1
[104] plotly_4.9.4.9000
[105] xml2_1.3.2
[106] foreach_1.5.1
[107] GeneOverlap_1.29.0
[108] stringr_1.4.0
[109] digest_0.6.27
[110] sctransform_0.3.2
[111] graph_1.71.2
[112] rmarkdown_2.10
[113] HGNChelper_0.8.1
[114] restfulr_0.0.13
[115] curl_4.3.2
[116] shiny_1.6.0
[117] gtools_3.9.2
[118] rjson_0.2.20
[119] nloptr_1.2.2.2
[120] lifecycle_1.0.0
[121] nlme_3.1-152
[122] jsonlite_1.7.2
[123] viridisLite_0.4.0
[124] limma_3.49.4
[125] fansi_0.5.0
[126] pillar_1.6.2
[127] lattice_0.20-44
[128] homologene_1.4.68.19.3.27
[129] KEGGREST_1.33.0
[130] fastmap_1.1.0
[131] httr_1.4.2
[132] googleAuthR_1.4.0
[133] interactiveDisplayBase_1.31.2
[134] glue_1.4.2
[135] RNOmni_1.0.0
[136] png_0.1-7
[137] iterators_1.0.13
[138] ewceData_1.1.0
[139] BiocVersion_3.14.0
[140] bit_4.0.4
[141] stringi_1.7.3
[142] blob_1.2.2
[143] AnnotationHub_3.1.5
[144] caTools_1.18.2
[145] memoise_2.0.0
[146] future.apply_1.8.1
```

Al-Murphy commented 3 years ago

Ah, that makes sense. I only adjusted the format_sumstats function; I'll update now.

Al-Murphy commented 3 years ago

Actually, what's the use case here? read_sumstats only reads the data into R, so you could simply skip calling it when a data.frame/data.table is passed. Can you give an example of where the issue would happen so I can correct it?

Al-Murphy commented 3 years ago

I've just added a check to read_sumstats anyway, just to be safe: it returns the summary statistics unchanged if they are already in memory as a data.frame/data.table object. Again, I'll push this change today:

> sumstats <- data.table::fread(system.file("extdata","eduAttainOkbay.txt",
+                                         package="MungeSumstats"))
> sumstats <- MungeSumstats::read_sumstats(path = sumstats)
Summary statistics passed as R object.
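
For readers who want to see the shape of such a guard, here is a rough sketch under stated assumptions. `read_sumstats_sketch()` is a hypothetical helper written for illustration, not the actual MungeSumstats code; the tab-delimited fallback via `utils::read.delim()` and the example columns are also assumptions.

```r
# Hypothetical helper illustrating the early-return idea; NOT the package's code.
read_sumstats_sketch <- function(path) {
  # data.table objects inherit from data.frame, so this covers both.
  if (is.data.frame(path)) {
    message("Summary statistics passed as R object.")
    return(path)
  }
  # From here on, `path` must be a single character string.
  stopifnot(is.character(path), length(path) == 1)
  if (startsWith(path, "https://gwas.mrcieu.ac.uk")) {
    message("Remote path detected.")
  }
  # Assumption: the file is tab-delimited with a header row.
  utils::read.delim(path)
}

# Usage: an in-memory object is returned as-is ...
df <- data.frame(SNP = c("rs1", "rs2"), P = c(0.05, 0.5))
from_memory <- read_sumstats_sketch(df)

# ... while a file path is still read from disk.
tmp <- tempfile(fileext = ".tsv")
write.table(df, tmp, sep = "\t", row.names = FALSE, quote = FALSE)
from_file <- read_sumstats_sketch(tmp)
```

Putting the `is.data.frame()` check first means the `startsWith()` URL test only ever sees character input, which is what the original error was about.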