bernatgel / karyoploteR

karyoploteR - An R/Bioconductor package to plot arbitrary data along the genome
https://bernatgel.github.io/karyoploter_tutorial/

makeGenesDataFromTxDb did not pick up gene in some region #152

Open h20gg702 opened 6 months ago

h20gg702 commented 6 months ago

Hi, thank you for developing such a nice tool. I ran into a problem with the makeGenesDataFromTxDb function: it does not pick up genes in some regions.

grW <- toGRanges("chr17:1715523-1739600")
kp <- plotKaryotype(zoom = grW, cex=1, plot.type=2)
genes.data <- makeGenesDataFromTxDb(TxDb.Hsapiens.UCSC.hg38.knownGene, karyoplot=kp, plot.transcripts = TRUE, plot.transcripts.structure = TRUE)

But I couldn't see any genes, just character(0), as shown below.

genes.data[["genes"]]@ranges@NAMES
character(0)

However, when I used the trackViewer package, I could see that the WDR81 gene lies in the region I passed to plotKaryotype, so the TxDb.Hsapiens.UCSC.hg38.knownGene package itself seems to be fine. Do you have any idea what causes this problem?

genes <- geneTrack("124997", TxDb.Hsapiens.UCSC.hg38.knownGene, "WDR81", asList=FALSE)
genes@dat@ranges
IRanges object with 47 ranges and 0 metadata columns:
                   start       end     width
  124997.WDR81   1716523   1716535        13
  124997.WDR81   1716536   1716546        11
  124997.WDR81   1716547   1716571        25
  124997.WDR81   1716572   1716575         4
  124997.WDR81   1716576   1716600        25
           ...       ...       ...       ...
  124997.WDR81   1737686   1738488       803
  124997.WDR81   1738489   1738584        96
  124997.WDR81   1738585   1738585         1
  124997.WDR81   1738586   1738594         9
  124997.WDR81   1738595   1738599         5
GRealesM commented 5 months ago

Hi Bernat and maintainers,

I was about to open an issue with a similar problem. I'll post it here, hoping it helps. In my case, I'm trying to plot a region at the beginning of chr11 using this dataset. I noticed that important genes like IRF7 were missing. After looking at the UCSC browser, I realised the missing region corresponds perfectly to the bit where "chr11_KI270832v1_alt" is annotated. From this 8-year-old question in Bioconductor, I realised that the problem comes when trying to filter things that appear in more than one place.
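A minimal sketch of how one could check whether a gene is being filtered out for this reason. It assumes that the `genes()` extractor from GenomicFeatures, called with its default `single.strand.genes.only = TRUE`, drops genes that cannot be represented by a single range (for example because they are annotated on both a primary chromosome and an alt contig). Whether karyoploteR relies on this call internally is an assumption, not something confirmed in this thread.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# Default behaviour: one range per gene. Genes that cannot be described by a
# single range (several sequences or both strands) are left out.
genes.single <- suppressMessages(genes(txdb))

# Keep every gene; the result is a GRangesList with one element per gene.
genes.all <- genes(txdb, single.strand.genes.only = FALSE)

# Gene IDs present in the full list but missing from the default result.
dropped <- setdiff(names(genes.all), names(genes.single))
length(dropped)

# "124997" is the Entrez ID used for WDR81 in the geneTrack() call earlier
# in this thread. TRUE here would mean the gene is lost by the default call.
"124997" %in% dropped

# Show where the gene is annotated; more than one sequence name in the
# output would be consistent with the explanation above.
if ("124997" %in% names(genes.all)) genes.all[["124997"]]
```

If the gene shows up in `dropped`, the annotation package itself is fine and the gene is lost at the filtering step, which would match what trackViewer shows.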

I tried following their advice and used keepStandardChromosomes(TxDb.Hsapiens.UCSC.hg38.knownGene), and the resulting object recovers more genes, including the ones I was missing. The problem is that the code to get the transcripts (e.g. transcriptsBy()) doesn't recover the transcripts for the previously missing genes, which makes their code fail. I also tried simply applying this to the code suggested in the tutorial (below), but it fails too.

tx <- keepStandardChromosomes(TxDb.Hsapiens.UCSC.hg38.knownGene)
genes.data <- makeGenesDataFromTxDb(txdb = tx, karyoplot = kp)
genes.data <- addGeneNames(genes.data)
genes.data <- mergeTranscripts(genes.data)

Maybe this has a very easy solution, like setting a specific parameter, but I haven't found it yet. Again, any advice is appreciated.

Guillermo

======================================
`sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Rocky Linux 8.9 (Green Obsidian)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblaso-r0.3.15.so; LAPACK version 3.9.0

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

time zone: GB
tzcode source: system (glibc)

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] org.Hs.eg.db_3.18.0 TxDb.Hsapiens.UCSC.hg38.knownGene_3.18.0 GenomicFeatures_1.54.4 AnnotationDbi_1.64.1
[5] Biobase_2.62.0 karyoploteR_1.28.0 regioneR_1.34.0 GenomicRanges_1.54.1
[9] GenomeInfoDb_1.38.8 IRanges_2.36.0 S4Vectors_0.40.2 BiocGenerics_0.48.1
[13] magrittr_2.0.3 data.table_1.15.4

loaded via a namespace (and not attached):
[1] DBI_1.2.2 bitops_1.0-7 gridExtra_2.3 biomaRt_2.58.2 rlang_1.1.3
[6] biovizBase_1.50.0 matrixStats_1.3.0 compiler_4.3.3 RSQLite_2.3.6 png_0.1-8
[11] vctrs_0.6.5 ProtGenerics_1.34.0 stringr_1.5.1 pkgconfig_2.0.3 crayon_1.5.2
[16] fastmap_1.2.0 backports_1.5.0 dbplyr_2.5.0 XVector_0.42.0 utf8_1.2.4
[21] Rsamtools_2.18.0 rmarkdown_2.27 bit_4.0.5 xfun_0.44 zlibbioc_1.48.2
[26] cachem_1.1.0 jsonlite_1.8.8 progress_1.2.3 blob_1.2.4 DelayedArray_0.28.0
[31] BiocParallel_1.36.0 parallel_4.3.3 prettyunits_1.2.0 cluster_2.1.6 R6_2.5.1
[36] VariantAnnotation_1.48.1 stringi_1.8.4 RColorBrewer_1.1-3 bezier_1.1.2 rtracklayer_1.62.0
[41] rpart_4.1.23 knitr_1.46 Rcpp_1.0.12 SummarizedExperiment_1.32.0 R.utils_2.12.3
[46] base64enc_0.1-3 Matrix_1.6-5 nnet_7.3-19 tidyselect_1.2.1 dichromat_2.0-0.1
[51] rstudioapi_0.16.0 abind_1.4-5 yaml_2.3.8 codetools_0.2-20 curl_5.2.1
[56] lattice_0.22-6 tibble_3.2.1 KEGGREST_1.42.0 evaluate_0.23 foreign_0.8-86
[61] BiocFileCache_2.10.2 xml2_1.3.6 Biostrings_2.70.3 pillar_1.9.0 filelock_1.0.3
[66] MatrixGenerics_1.14.0 checkmate_2.3.1 generics_0.1.3 RCurl_1.98-1.14 ensembldb_2.26.0
[71] hms_1.1.3 ggplot2_3.5.1 munsell_0.5.1 scales_1.3.0 glue_1.7.0
[76] lazyeval_0.2.2 Hmisc_5.1-2 tools_4.3.3 BiocIO_1.12.0 BSgenome_1.70.2
[81] GenomicAlignments_1.38.2 XML_3.99-0.16.1 grid_4.3.3 colorspace_2.1-0 GenomeInfoDbData_1.2.11
[86] htmlTable_2.4.2 restfulr_0.0.15 Formula_1.2-5 cli_3.6.2 rappdirs_0.3.3
[91] fansi_1.0.6 S4Arrays_1.2.1 dplyr_1.1.4 AnnotationFilter_1.26.0 gtable_0.3.5
[96] R.methodsS3_1.8.2 digest_0.6.35 SparseArray_1.2.4 rjson_0.2.21 htmlwidgets_1.6.4
[101] R.oo_1.26.0 memoise_2.0.1 htmltools_0.5.8.1 lifecycle_1.0.4 httr_1.4.7
[106] bit64_4.0.5 bamsignals_1.34.0 `

bernatgel commented 5 months ago

Hi @h20gg702 and Guillermo @GRealesM

Thanks for pointing this out and for the additional information provided by Guillermo.

It seems like a bug in karyoploteR, so I'll have to take a look at it.

I'll get back to you as soon as I have some more info.

Bernat