Closed by jambler24 2 years ago
I only see a warning here. Is that what is bothering you?
Other than that, it seems to be working just fine:
library(GenomicFeatures)
txdb <- makeTxDbFromGFF("GCA_000195955.2_ASM19595v2_genomic.gff.gz")
# Import genomic features from the file as a GRanges object ... OK
# Prepare the 'metadata' data frame ... OK
# Make the TxDb object ... OK
# Warning message:
# In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID, :
# The following transcripts have multiple parts that were merged:
# gene-Rv3216
txdb
# TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: GCA_000195955.2_ASM19595v2_genomic.gff.gz
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# Nb of transcripts: 4111
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2021-02-04 10:25:43 -0800 (Thu, 04 Feb 2021)
# GenomicFeatures version at creation time: 1.42.1
# RSQLite version at creation time: 2.2.3
# DBSCHEMAVERSION: 1.2
head(transcriptLengths(txdb))
# tx_id tx_name gene_id nexon tx_len
# 1 1 dnaA dnaA 1 1524
# 2 2 dnaN dnaN 1 1209
# 3 3 recF recF 1 1158
# 4 4 Rv0004 Rv0004 1 564
# 5 5 gyrB gyrB 1 2028
# 6 6 gyrA gyrA 1 2517
If it didn't work for you, please share the details.
Thanks, H.
sessionInfo():
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.10
Matrix products: default
BLAS: /home/hpages/R/R-4.0.3/lib/libRblas.so
LAPACK: /home/hpages/R/R-4.0.3/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GenomicFeatures_1.42.1 AnnotationDbi_1.52.0 Biobase_2.50.0
[4] GenomicRanges_1.42.0 GenomeInfoDb_1.26.2 IRanges_2.24.1
[7] S4Vectors_0.28.1 BiocGenerics_0.36.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 lattice_0.20-41
[3] prettyunits_1.1.1 Rsamtools_2.6.0
[5] Biostrings_2.58.0 assertthat_0.2.1
[7] BiocFileCache_1.14.0 R6_2.5.0
[9] RSQLite_2.2.3 httr_1.4.2
[11] pillar_1.4.7 zlibbioc_1.36.0
[13] rlang_0.4.10 progress_1.2.2
[15] curl_4.3 rstudioapi_0.13
[17] blob_1.2.1 Matrix_1.3-2
[19] BiocParallel_1.24.1 stringr_1.4.0
[21] RCurl_1.98-1.2 bit_4.0.4
[23] biomaRt_2.46.2 DelayedArray_0.16.1
[25] compiler_4.0.3 rtracklayer_1.50.0
[27] pkgconfig_2.0.3 askpass_1.1
[29] openssl_1.4.3 tidyselect_1.1.0
[31] SummarizedExperiment_1.20.0 tibble_3.0.6
[33] GenomeInfoDbData_1.2.4 matrixStats_0.58.0
[35] XML_3.99-0.5 crayon_1.4.0
[37] dplyr_1.0.4 dbplyr_2.1.0
[39] GenomicAlignments_1.26.0 bitops_1.0-6
[41] rappdirs_0.3.3 grid_4.0.3
[43] lifecycle_0.2.0 DBI_1.1.1
[45] magrittr_2.0.1 stringi_1.5.3
[47] cachem_1.0.2 XVector_0.30.0
[49] xml2_1.3.2 ellipsis_0.3.1
[51] generics_0.1.0 vctrs_0.3.6
[53] tools_4.0.3 bit64_4.0.5
[55] glue_1.4.2 purrr_0.3.4
[57] hms_1.0.0 MatrixGenerics_1.2.1
[59] fastmap_1.1.0 memoise_2.0.0
When trying to import a GFF file for Mycobacterium tuberculosis from NCBI, I get the following error:
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID, :
The following transcripts have multiple parts that were merged:
gene-Rv3216
Code:
library(GenomicFeatures)
txdb <- makeTxDbFromGFF("/path/to/file/GCA_000195955.2_ASM19595v2_genomic.gff", organism="Mycobacterium tuberculosis")
Link to annotation and genome:
https://www.ncbi.nlm.nih.gov/assembly/GCF_000195955.2/
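A note on the message quoted in the report: it is printed under "Warning message:", and in R a warning does not stop the call, so the returned object is still assigned. The sketch below uses only base R to show that behaviour. `make_object()` is a hypothetical stand-in for a call such as `makeTxDbFromGFF()`, and its return value is a placeholder string, not a real TxDb object.

```r
# Hypothetical stand-in for a call such as makeTxDbFromGFF(): it signals a
# warning but still returns a value. Not part of GenomicFeatures.
make_object <- function() {
  warning("The following transcripts have multiple parts that were merged: gene-Rv3216")
  "txdb-placeholder"
}

# Record any warnings while letting the call run to completion.
collected <- character(0)
result <- withCallingHandlers(
  make_object(),
  warning = function(w) {
    collected <<- c(collected, conditionMessage(w))
    invokeRestart("muffleWarning")  # resume the call after recording the warning
  }
)

print(result)     # the value is still returned despite the warning
print(collected)  # the warning text that was recorded
```

Wrapping the real call the same way should make it possible to log which transcripts were merged while still keeping the resulting object, though that has not been run against the file in this thread.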