Closed laurahspencer closed 1 year ago
Hi @laurahspencer,
Thanks for reporting your issue. Unfortunately, I am not able to reproduce your issue on my machine ( see attached output ).
Could you please give me more information about your session, by posting the sessionInfo()
? Maybe you are using an outdated version of R or methylKit which is causing the issue.
Best, Alex
library(methylKit)
load(file = "~/R-dev/methylkit-dev/issue283/meth_10x")
getSampleID(meth_10x)
#> [1] "1" "2" "3" "4" "7" "8" "10" "11" "12" "15" "17" "18" "20"
getSampleID(unite(meth_10x))
#> uniting...
#> [1] "1" "2" "3" "4" "7" "8" "10" "11" "12" "15" "17" "18" "20"
getSampleID(unite(meth_10x, destrand = TRUE))
#> destranding...
#> uniting...
#> [1] "1" "2" "3" "4" "7" "8" "10" "11" "12" "15" "17" "18" "20"
sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: CentOS Linux 7 (Core)
#>
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C
#> [4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C
#> [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] methylKit_1.22.0 GenomicRanges_1.48.0 GenomeInfoDb_1.32.2
#> [4] IRanges_2.30.0 S4Vectors_0.34.0 BiocGenerics_0.42.0
#>
#> loaded via a namespace (and not attached):
#> [1] MatrixGenerics_1.8.1 Biobase_2.56.0
#> [3] splines_4.2.0 R.utils_2.12.0
#> [5] gtools_3.9.3 assertthat_0.2.1
#> [7] highr_0.9 GenomeInfoDbData_1.2.8
#> [9] Rsamtools_2.12.0 yaml_2.3.5
#> [11] numDeriv_2016.8-1.1 pillar_1.7.0
#> [13] lattice_0.20-45 glue_1.6.2
#> [15] limma_3.52.2 bbmle_1.0.25
#> [17] digest_0.6.29 XVector_0.36.0
#> [19] qvalue_2.28.0 colorspace_2.0-3
#> [21] htmltools_0.5.2 Matrix_1.4-1
#> [23] R.oo_1.25.0 plyr_1.8.7
#> [25] XML_3.99-0.10 pkgconfig_2.0.3
#> [27] emdbook_1.3.12 zlibbioc_1.42.0
#> [29] purrr_0.3.4 mvtnorm_1.1-3
#> [31] scales_1.2.0 BiocParallel_1.30.3
#> [33] tibble_3.1.7 styler_1.8.1
#> [35] generics_0.1.3 ggplot2_3.4.0
#> [37] ellipsis_0.3.2 SummarizedExperiment_1.26.1
#> [39] withr_2.5.0 cli_3.4.1
#> [41] magrittr_2.0.3 crayon_1.5.2
#> [43] mclust_5.4.10 evaluate_0.15
#> [45] R.methodsS3_1.8.2 fs_1.5.2
#> [47] fansi_1.0.3 R.cache_0.16.0
#> [49] MASS_7.3-56 tools_4.2.0
#> [51] data.table_1.14.2 matrixStats_0.62.0
#> [53] BiocIO_1.6.0 lifecycle_1.0.3
#> [55] stringr_1.4.0 munsell_0.5.0
#> [57] reprex_2.0.2 DelayedArray_0.22.0
#> [59] Biostrings_2.64.0 compiler_4.2.0
#> [61] fastseg_1.42.0 rlang_1.0.6
#> [63] grid_4.2.0 RCurl_1.98-1.7
#> [65] rstudioapi_0.14 rjson_0.2.21
#> [67] bitops_1.0-7 rmarkdown_2.14
#> [69] restfulr_0.0.15 gtable_0.3.0
#> [71] codetools_0.2-18 DBI_1.1.3
#> [73] reshape2_1.4.4 R6_2.5.1
#> [75] GenomicAlignments_1.32.1 knitr_1.39
#> [77] dplyr_1.0.9 rtracklayer_1.56.1
#> [79] bdsmatrix_1.3-6 fastmap_1.1.0
#> [81] utf8_1.2.2 stringi_1.7.8
#> [83] parallel_4.2.0 Rcpp_1.0.9
#> [85] vctrs_0.5.1 tidyselect_1.2.0
#> [87] xfun_0.36 coda_0.19-4
Created on 2023-05-16 with reprex v2.0.2
Interesting! Here is my session info:
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] clipr_0.8.0 BiocManager_1.30.19 ggrepel_0.9.2 googlesheets4_1.0.1 janitor_2.1.0
[6] corrplot_0.92 PerformanceAnalytics_2.0.4 xts_0.12.2 zoo_1.8-11 factoextra_1.0.7
[11] vegan_2.6-4 lattice_0.20-45 permute_0.9-7 viridis_0.6.2 viridisLite_0.4.1
[16] Pstat_1.2 matrixStats_0.62.0 ggforce_0.4.1 methylKit_1.24.0 GenomicRanges_1.49.0
[21] GenomeInfoDb_1.34.2 IRanges_2.32.0 S4Vectors_0.36.0 BiocGenerics_0.44.0 here_1.0.1
[26] reshape2_1.4.4 forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10 purrr_0.3.5
[31] readr_2.1.3 tidyr_1.2.1 tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2
loaded via a namespace (and not attached):
[1] Rsamtools_2.14.0 rprojroot_2.0.3 crayon_1.5.2 MASS_7.3-58.1 nlme_3.1-160
[6] backports_1.4.1 reprex_2.0.2 rlang_1.0.6 XVector_0.38.0 readxl_1.4.1
[11] limma_3.54.0 BiocParallel_1.32.1 rjson_0.2.21 bit64_4.0.5 glue_1.6.2
[16] parallel_4.2.2 haven_2.5.1 tidyselect_1.2.0 SummarizedExperiment_1.28.0 XML_3.99-0.12
[21] GenomicAlignments_1.34.0 xtable_1.8-4 magrittr_2.0.3 evaluate_0.18 cli_3.4.1
[26] zlibbioc_1.44.0 rstudioapi_0.14 xfun_0.34 askpass_1.1 cluster_2.1.4
[31] Biostrings_2.66.0 withr_2.5.0 bitops_1.0-7 plyr_1.8.7 cellranger_1.1.0
[36] coda_0.19-4 pillar_1.8.1 fs_1.5.2 scatterplot3d_0.3-42 vctrs_0.5.0
[41] ellipsis_0.3.2 generics_0.1.3 tools_4.2.2 munsell_0.5.0 tweenr_2.0.2
[46] emmeans_1.8.2 DelayedArray_0.23.2 fastmap_1.1.0 compiler_4.2.2 abind_1.4-5
[51] rtracklayer_1.58.0 GenomeInfoDbData_1.2.9 gridExtra_2.3 utf8_1.2.2 jsonlite_1.8.3
[56] scales_1.2.1 carData_3.0-5 estimability_1.4.1 car_3.1-1 R.utils_2.12.2
[61] rmarkdown_2.17 Biobase_2.58.0 numDeriv_2016.8-1.1 yaml_2.3.6 htmltools_0.5.3
[66] BiocIO_1.8.0 quadprog_1.5-8 digest_0.6.30 assertthat_0.2.1 leaps_3.1
[71] rappdirs_0.3.3 emdbook_1.3.12 data.table_1.14.4 R.oo_1.25.0 splines_4.2.2
[76] labeling_0.4.2 googledrive_2.0.0 RCurl_1.98-1.9 broom_1.0.1 hms_1.1.2
[81] modelr_0.1.9 colorspace_2.0-3 Rcpp_1.0.9 mclust_6.0.0 mvtnorm_1.1-3
[86] multcompView_0.1-8 fansi_1.0.3 tzdb_0.3.0 R6_2.5.1 grid_4.2.2
[91] lifecycle_1.0.3 curl_4.3.3 snakecase_0.11.0 Matrix_1.5-1 qvalue_2.30.0
[96] fastseg_1.44.0 polyclip_1.10-4 timechange_0.1.1 rvest_1.0.3 mgcv_1.8-41
[101] openssl_2.0.4 flashClust_1.01-2 bdsmatrix_1.3-6 codetools_0.2-18 lubridate_1.9.0
[106] gtools_3.9.3 dbplyr_2.2.1 R.methodsS3_1.8.2 gtable_0.3.1 DBI_1.1.3
[111] httr_1.4.4 KernSmooth_2.23-20 stringi_1.7.8 vroom_1.6.0 farver_2.1.1
[116] xml2_1.3.3 bbmle_1.0.25 restfulr_0.0.15 bit_4.0.4 MatrixGenerics_1.10.0
[121] pkgconfig_2.0.3 gargle_1.2.1 knitr_1.40
When I run getSampleID()
on the united object meth_unite_10x
it provides the correct sample IDs, but the actual column names are incorrect:
colnames(meth_unite_10x)
[1] "chr" "start" "end" "strand" "coverage1" "numCs1" "numTs1" "coverage2" "numCs2" "numTs2" "coverage3"
[12] "numCs3" "numTs3" "coverage4" "numCs4" "numTs4" "coverage5" "numCs5" "numTs5" "coverage6" "numCs6" "numTs6"
[23] "coverage7" "numCs7" "numTs7" "coverage8" "numCs8" "numTs8" "coverage9" "numCs9" "numTs9" "coverage10" "numCs10"
[34] "numTs10" "coverage11" "numCs11" "numTs11" "coverage12" "numCs12" "numTs12" "coverage13" "numCs13" "numTs13"
And similarly,
colnames(unite(meth_10x, destrand = TRUE))
destranding...
uniting...
[1] "chr" "start" "end" "strand" "coverage1" "numCs1" "numTs1" "coverage2" "numCs2" "numTs2" "coverage3"
[12] "numCs3" "numTs3" "coverage4" "numCs4" "numTs4" "coverage5" "numCs5" "numTs5" "coverage6" "numCs6" "numTs6"
[23] "coverage7" "numCs7" "numTs7" "coverage8" "numCs8" "numTs8" "coverage9" "numCs9" "numTs9" "coverage10" "numCs10"
[34] "numTs10" "coverage11" "numCs11" "numTs11" "coverage12" "numCs12" "numTs12" "coverage13" "numCs13" "numTs13"
Ah okay, now I got what you were referring to.
These are actually the column names of the internal dataframe. They are not generated based on the given sample ids, nor the treatment vector. They are simply referring to the n-th sample in the object, or the n-th position of the sample-id vector.
I hope this clears things up.
Best, Alex
Laura H Spencer @.***> schrieb am Di., 16. Mai 2023, 20:56:
And similarly,
colnames(unite(meth_10x, destrand = TRUE))
destranding... uniting... [1] "chr" "start" "end" "strand" "coverage1" "numCs1" "numTs1" "coverage2" "numCs2" "numTs2" "coverage3" [12] "numCs3" "numTs3" "coverage4" "numCs4" "numTs4" "coverage5" "numCs5" "numTs5" "coverage6" "numCs6" "numTs6" [23] "coverage7" "numCs7" "numTs7" "coverage8" "numCs8" "numTs8" "coverage9" "numCs9" "numTs9" "coverage10" "numCs10" [34] "numTs10" "coverage11" "numCs11" "numTs11" "coverage12" "numCs12" "numTs12" "coverage13" "numCs13" "numTs13"
— Reply to this email directly, view it on GitHub https://github.com/al2na/methylKit/issues/283#issuecomment-1550197079, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADK7JD2NVLA3AGAFMGVJ3H3XGPEUXANCNFSM6AAAAAAYCT7DYE . You are receiving this because you commented.Message ID: @.***>
That makes sense- so when I run getData(meth_unite_10x)
the resulting dataframe's columns containing the methylation data (coverage#, numCs#, numTs#) can safely be renamed using the vector of sampleIDs that I extract using getSampleID(meth_unite_10x)
? Or is there a methylKit function that does that for me that I've overlooked?
Yes, this should work. Currently the only helper that we provide would be percMethylation() to extract methylation per region per sample.
Laura H Spencer @.***> schrieb am Mi., 17. Mai 2023, 23:06:
That makes sense- so when I run getData(meth_unite_10x) the resulting dataframe's columns containing the methylation data (coverage#, numCs#, numTs#) can safely be renamed using the vector of sampleIDs that I extract using getSampleID(meth_unite_10x)? Or is there a methylKit function that does that for me that I've overlooked?
— Reply to this email directly, view it on GitHub https://github.com/al2na/methylKit/issues/283#issuecomment-1552084464, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADK7JDZEPA6MBH6YN7K5T2DXGU4WJANCNFSM6AAAAAAYCT7DYE . You are receiving this because you commented.Message ID: @.***>
When running the unite() function in methylKit, the sample IDs specified in the methylRaw object are not correctly being assigned. I have 13 samples, with the IDs 1, 2, 3, 4, 7, 8, 10, 11, 12, 15, 17, 18, 20 (i.e. not all consecutive numbers). When I run unite() the resulting columns of data in the methylBase object contain sample IDs #1-13. Thinking that unite() renamed them consecutively I tried to spot-check the data to see if I can confidently rename the columns using my original IDs, but the data doesn't line up (possibly because I've destranded, but I can't be sure). How can I make sure that the correct sample IDs are transferred over to the methylBase object?
Here is a directory containing the methylRaw object
meth_10x
, and using the codemethylKit::unite(meth_10x, destrand=TRUE)
I created the methylBase objectmeth_unite_10x
, which has the incorrect sample IDs.