Closed biakemi closed 5 years ago
Hi @biakemi,
Have you tried `remove.unusual=FALSE`? It looks like your chromosome names are not standard.
> library(genomation)
Loading required package: grid
> bed=readTableFast("CPI_annotation.bed.txt",header=FALSE,skip="auto")
> head(bed)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 NW_007282003.1 72352 91883 rna32495 . + 72352 91883 0 7
2 NW_007282003.1 72347 91883 rna32494 . + 72347 91883 0 6
3 NW_007282003.1 85319 91883 rna32497 . + 85319 91883 0 5
4 NW_007282003.1 72353 91883 rna32496 . + 72353 91883 0 8
5 NW_007282003.1 87211 91883 rna32498 . + 87211 91883 0 5
6 NW_007282003.1 72346 91883 rna32493 . + 72346 91883 0 6
V11 V12
1 82,65,5,78,757,39,564, 0,9257,13363,16279,16834,18805,18967,
2 87,75,78,757,39,564, 0,4248,16284,16839,18810,18972,
3 125,78,757,39,564, 0,3312,3867,5838,6000,
4 81,75,65,5,78,757,39,564, 0,4242,9256,13362,16278,16833,18804,18966,
5 39,78,757,39,564, 0,1420,1975,3946,4108,
6 88,65,78,757,39,564, 0,9263,16285,16840,18811,18973,
> tmp=readTranscriptFeatures("CPI_annotation.bed.txt",remove.unusual=FALSE)
Reading the table...
Calculating intron coordinates...
Calculating exon coordinates...
Calculating TSS coordinates...
Calculating promoter coordinates...
Outputting the final GRangesList...
> head(tmp)
GRangesList object of length 4:
$exons
GRanges object with 577702 ranges and 2 metadata columns:
seqnames ranges strand | score name
<Rle> <IRanges> <Rle> | <numeric> <character>
[1] NW_007282003.1 72353-72434 + | 1 rna32495
[2] NW_007282003.1 81610-81674 + | 2 rna32495
[3] NW_007282003.1 85716-85720 + | 3 rna32495
[4] NW_007282003.1 88632-88709 + | 4 rna32495
[5] NW_007282003.1 89187-89943 + | 5 rna32495
... ... ... ... . ... ...
[577698] NW_007359896.1 3166108-3166257 + | 2 rna43523
[577699] NW_007359896.1 3168342-3168524 + | 3 rna43523
[577700] NW_007359896.1 3173217-3173369 + | 4 rna43523
[577701] NW_007359896.1 3175414-3175504 + | 5 rna43523
[577702] NW_007359896.1 3193974-3197617 + | 6 rna43523
...
<3 more elements>
-------
seqinfo: 2628 sequences from an unspecified genome; no seqlengths
Cheers, Kasia
That worked perfectly! I had tried using `remove.unusual=FALSE`, but I believe my data at the time was not correct. Thank you!
No problem! Ah, then maybe your chromosome names got replaced or removed during the conversion from a GTF file to a BED12 file. Since this doesn't look like a genomation issue, I'm closing it now.
Kasia
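For anyone landing here with similar sequence names: a quick way to see whether the names could be tripping the filter is to count how many contain an underscore. This is a minimal base R sketch, assuming (it is not confirmed in this thread) that `remove.unusual=TRUE` discards sequence names containing `_`. The helper name `count_underscore_names` and the `chr1` entry are made up for illustration.

```r
# Hypothetical helper: count sequence names that contain an underscore.
# Assumption: such names are what remove.unusual=TRUE treats as "unusual".
count_underscore_names <- function(seqnames) {
  sum(grepl("_", seqnames, fixed = TRUE))
}

# Two sequence names from the output above, plus one UCSC-style name
# ("chr1" is an illustrative addition, not taken from the attached file).
seqs <- c("NW_007282003.1", "NW_007359896.1", "chr1")
n_flagged <- count_underscore_names(seqs)
cat(n_flagged, "of", length(seqs), "names contain an underscore\n")
```

If every name in the first column is flagged, a filter of this kind would leave no rows to process, so passing `remove.unusual=FALSE` is the first thing to try.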
Hi,
I have the same issue with my gtf2bed-converted file. I don't know why my file is so different from biakemi's, but in any case using `remove.unusual=FALSE` doesn't solve the problem. Below are the first four lines of the file. Do you know what the problem might be?
CADCXH010000001.1 23532 23537 G32900 . - ROSLIN_INST three_prime_utr . gene_id "G32900"; transcript_id "G32900.1"; gene_source "ROSLIN_INST"; gene_biotype "protein_coding"; transcript_source "ROSLIN_INST"; transcript_biotype "protein_coding";
CADCXH010000001.1 23532 23723 G32900 . - ROSLIN_INST exon . gene_id "G32900"; transcript_id "G32900.1"; exon_number "10"; gene_source "ROSLIN_INST"; gene_biotype "protein_coding"; transcript_source "ROSLIN_INST"; transcript_biotype "protein_coding"; exon_id "G32900.1-E10";
CADCXH010000001.1 23532 35416 G32900 . - ROSLIN_INST gene . gene_id "G32900"; gene_source "ROSLIN_INST"; gene_biotype "protein_coding"; transcript_id "";
CADCXH010000001.1 23532 35416 G32900 . - ROSLIN_INST transcript . gene_id "G32900"; transcript_id "G32900.1"; gene_source "ROSLIN_INST"; gene_biotype "protein_coding"; transcript_source "ROSLIN_INST"; transcript_biotype "protein_coding";
Thanks for your help!
Regards, Thomas.
I had the same issue, but it turned out that gtf2bed was only converting the file to a 10-column BED file. I used bedparse (https://bedparse.readthedocs.io/en/stable/Usage.html#convert-gtf-to-bed) to convert the GTF to a BED12 file, and it worked.
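A quick way to tell which case you are in is to count the columns of the converted file before handing it to `readTranscriptFeatures`. The following is a minimal base R sketch, not part of genomation; the helper name `check_bed12` is hypothetical, and it assumes a tab-delimited file without a header line.

```r
# Hypothetical helper: report whether a tab-delimited BED file looks like BED12.
check_bed12 <- function(path) {
  bed <- read.table(path, header = FALSE, sep = "\t", quote = "",
                    comment.char = "", stringsAsFactors = FALSE)
  n_cols <- ncol(bed)
  # In BED12, column 10 (blockCount) holds a whole number of blocks (exons).
  block_count_ok <- n_cols >= 12 && is.numeric(bed[[10]]) &&
    isTRUE(all(bed[[10]] == round(bed[[10]])))
  list(n_cols = n_cols, is_bed12 = n_cols == 12 && block_count_ok)
}

# Tiny example file: the first record of the BED12 table shown earlier in the thread.
tmp <- tempfile(fileext = ".bed")
writeLines(paste("NW_007282003.1", 72352, 91883, "rna32495", ".", "+",
                 72352, 91883, 0, 7,
                 "82,65,5,78,757,39,564,",
                 "0,9257,13363,16279,16834,18805,18967,", sep = "\t"), tmp)
res <- check_bed12(tmp)
print(res)
```

A file like the gtf2bed output quoted above, where the tenth column holds GTF attributes rather than a block count, would fail this check.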
Hi,
I'm trying to run my bed12 file as
readTranscriptFeatures(bed.file)
and I keep getting the error:
Error in rep(1:nrow(ref), ref[, 10]) : invalid 'times' argument
I converted my GTF file to BED12 using bedops. I looked online for a solution, but everything I found refers to the code, not to the input file. How can I solve this issue? My BED12 file is attached: CPI_annotation.bed.txt
Any help appreciated!
My session_info:
`R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] rtracklayer_1.44.2 genomation_1.16.0 methylKit_1.11.0 GenomicRanges_1.36.0 GenomeInfoDb_1.20.0
[6] IRanges_2.18.1 S4Vectors_0.22.0 BiocGenerics_0.30.0

loaded via a namespace (and not attached):
[1] Biobase_2.44.0 splines_3.6.0 R.utils_2.9.0 gtools_3.8.1
[5] assertthat_0.2.1 BiocManager_1.30.4 BSgenome_1.52.0 GenomeInfoDbData_1.2.1
[9] Rsamtools_2.0.0 impute_1.58.0 numDeriv_2016.8-1.1 pillar_1.4.2
[13] backports_1.1.4 lattice_0.20-38 glue_1.3.1 limma_3.40.6
[17] bbmle_1.0.20 XVector_0.24.0 qvalue_2.16.0 colorspace_1.4-1
[21] Matrix_1.2-17 R.oo_1.22.0 plyr_1.8.4 XML_3.98-1.20
[25] pkgconfig_2.0.2 emdbook_1.3.11 zlibbioc_1.30.0 purrr_0.3.2
[29] scales_1.0.0 BiocParallel_1.18.0 tibble_2.1.3 mgcv_1.8-28
[33] ggplot2_3.2.0 seqPattern_1.16.0 SummarizedExperiment_1.14.1 lazyeval_0.2.2
[37] magrittr_1.5 crayon_1.3.4 mclust_5.4.5 R.methodsS3_1.7.1
[41] nlme_3.1-141 MASS_7.3-51.4 tools_3.6.0 data.table_1.12.2
[45] hms_0.5.0 matrixStats_0.54.0 gridBase_0.4-7 stringr_1.4.0
[49] munsell_0.5.0 plotrix_3.7-6 DelayedArray_0.10.0 Biostrings_2.52.0
[53] compiler_3.6.0 fastseg_1.30.0 rlang_0.4.0 RCurl_1.95-4.12
[57] rstudioapi_0.10 bitops_1.0-6 gtable_0.3.0 reshape2_1.4.3
[61] R6_2.4.0 GenomicAlignments_1.20.1 dplyr_0.8.3 zeallot_0.1.0
[65] KernSmooth_2.23-15 readr_1.3.1 stringi_1.4.3 Rcpp_1.0.2 `