BIMSBbioinfo / genomation

R package for genomic feature analysis and visualization
http://bioinformatics.mdc-berlin.de/genomation/

readTranscriptFeatures - invalid times argument #188

Closed biakemi closed 5 years ago

biakemi commented 5 years ago

Hi,

I'm trying to run my bed12 file as

readTranscriptFeatures(bed.file)

and I keep getting the error:

Error in rep(1:nrow(ref), ref[, 10]) : invalid 'times' argument

I converted my gtf file to bed12 using bedops. I tried looking online for a solution, but everything I found refers to the code, not to the input file. How can I solve this issue? My bed12 file is attached: CPI_annotation.bed.txt

Any help appreciated!

My session_info:

`R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] rtracklayer_1.44.2 genomation_1.16.0 methylKit_1.11.0 GenomicRanges_1.36.0 GenomeInfoDb_1.20.0
[6] IRanges_2.18.1 S4Vectors_0.22.0 BiocGenerics_0.30.0

loaded via a namespace (and not attached):
[1] Biobase_2.44.0 splines_3.6.0 R.utils_2.9.0 gtools_3.8.1
[5] assertthat_0.2.1 BiocManager_1.30.4 BSgenome_1.52.0 GenomeInfoDbData_1.2.1
[9] Rsamtools_2.0.0 impute_1.58.0 numDeriv_2016.8-1.1 pillar_1.4.2
[13] backports_1.1.4 lattice_0.20-38 glue_1.3.1 limma_3.40.6
[17] bbmle_1.0.20 XVector_0.24.0 qvalue_2.16.0 colorspace_1.4-1
[21] Matrix_1.2-17 R.oo_1.22.0 plyr_1.8.4 XML_3.98-1.20
[25] pkgconfig_2.0.2 emdbook_1.3.11 zlibbioc_1.30.0 purrr_0.3.2
[29] scales_1.0.0 BiocParallel_1.18.0 tibble_2.1.3 mgcv_1.8-28
[33] ggplot2_3.2.0 seqPattern_1.16.0 SummarizedExperiment_1.14.1 lazyeval_0.2.2
[37] magrittr_1.5 crayon_1.3.4 mclust_5.4.5 R.methodsS3_1.7.1
[41] nlme_3.1-141 MASS_7.3-51.4 tools_3.6.0 data.table_1.12.2
[45] hms_0.5.0 matrixStats_0.54.0 gridBase_0.4-7 stringr_1.4.0
[49] munsell_0.5.0 plotrix_3.7-6 DelayedArray_0.10.0 Biostrings_2.52.0
[53] compiler_3.6.0 fastseg_1.30.0 rlang_0.4.0 RCurl_1.95-4.12
[57] rstudioapi_0.10 bitops_1.0-6 gtable_0.3.0 reshape2_1.4.3
[61] R6_2.4.0 GenomicAlignments_1.20.1 dplyr_0.8.3 zeallot_0.1.0
[65] KernSmooth_2.23-15 readr_1.3.1 stringi_1.4.3 Rcpp_1.0.2 `

katwre commented 5 years ago

Hi @biakemi, have you tried remove.unusual=FALSE? It looks like your chromosome names are not standard.

> library(genomation)
Loading required package: grid
> bed=readTableFast("CPI_annotation.bed.txt",header=FALSE,skip="auto")
> head(bed)
              V1    V2    V3       V4 V5 V6    V7    V8 V9 V10
1 NW_007282003.1 72352 91883 rna32495  .  + 72352 91883  0   7
2 NW_007282003.1 72347 91883 rna32494  .  + 72347 91883  0   6
3 NW_007282003.1 85319 91883 rna32497  .  + 85319 91883  0   5
4 NW_007282003.1 72353 91883 rna32496  .  + 72353 91883  0   8
5 NW_007282003.1 87211 91883 rna32498  .  + 87211 91883  0   5
6 NW_007282003.1 72346 91883 rna32493  .  + 72346 91883  0   6
                        V11                                        V12
1    82,65,5,78,757,39,564,      0,9257,13363,16279,16834,18805,18967,
2      87,75,78,757,39,564,            0,4248,16284,16839,18810,18972,
3        125,78,757,39,564,                     0,3312,3867,5838,6000,
4 81,75,65,5,78,757,39,564, 0,4242,9256,13362,16278,16833,18804,18966,
5         39,78,757,39,564,                     0,1420,1975,3946,4108,
6      88,65,78,757,39,564,            0,9263,16285,16840,18811,18973,
> tmp=readTranscriptFeatures("CPI_annotation.bed.txt",remove.unusual=FALSE)
Reading the table...
Calculating intron coordinates...
Calculating exon coordinates...
Calculating TSS coordinates...
Calculating promoter coordinates...
Outputting the final GRangesList...

> head(tmp)
GRangesList object of length 4:
$exons
GRanges object with 577702 ranges and 2 metadata columns:
                 seqnames          ranges strand |     score        name
                    <Rle>       <IRanges>  <Rle> | <numeric> <character>
       [1] NW_007282003.1     72353-72434      + |         1    rna32495
       [2] NW_007282003.1     81610-81674      + |         2    rna32495
       [3] NW_007282003.1     85716-85720      + |         3    rna32495
       [4] NW_007282003.1     88632-88709      + |         4    rna32495
       [5] NW_007282003.1     89187-89943      + |         5    rna32495
       ...            ...             ...    ... .       ...         ...
  [577698] NW_007359896.1 3166108-3166257      + |         2    rna43523
  [577699] NW_007359896.1 3168342-3168524      + |         3    rna43523
  [577700] NW_007359896.1 3173217-3173369      + |         4    rna43523
  [577701] NW_007359896.1 3175414-3175504      + |         5    rna43523
  [577702] NW_007359896.1 3193974-3197617      + |         6    rna43523

...
<3 more elements>
-------
seqinfo: 2628 sequences from an unspecified genome; no seqlengths

Cheers, Kasia

biakemi commented 5 years ago

That worked perfectly! I had tried using remove.unusual=FALSE, but I believe my data at the time was not correct. Thank you!

katwre commented 5 years ago

No problem! Ah, then maybe your chromosome names got replaced or removed during the conversion from a gtf file to a bed12 file. Since this doesn't look like a genomation issue, I'm closing it now. Kasia
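
One plausible way the original error arises can be sketched in base R, without genomation. This assumes (from a reading of the package documentation, not confirmed in this thread) that remove.unusual=TRUE drops sequences whose names contain an underscore. Under that assumption every NW_... row is filtered out, and a call shaped like the one in the error message then fails on the empty table. The data frame, the filter, and the column name blockCount are illustrative stand-ins, not genomation's actual code.

```r
# Toy table standing in for the parsed BED12 file (illustrative only).
ref <- data.frame(
  chrom      = c("NW_007282003.1", "NW_007282003.1"),
  blockCount = c(7L, 6L),
  stringsAsFactors = FALSE
)

# ASSUMPTION: remove.unusual=TRUE behaves like dropping names containing "_".
filtered <- ref[!grepl("_", ref$chrom, fixed = TRUE), , drop = FALSE]
nrow(filtered)  # no rows survive, because every name contains "_"

# Same shape as the call in the error message. With an empty table,
# 1:nrow(filtered) is c(1, 0), so 'times' (length 0) no longer matches it.
res <- tryCatch(
  rep(1:nrow(filtered), filtered$blockCount),
  error = function(e) e
)
conditionMessage(res)
```

If this reading is right, it would also explain why remove.unusual=FALSE fixed the first report: the rows are kept, so column 10 has one block count per row again.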

TSolDour commented 2 years ago

Hi,

I have the same issue with my gtf2bed-converted file. I don't know why my file is so different from biakemi's, but in any case using remove.unusual=FALSE doesn't solve the problem. Below are the first four lines of the file. Do you know what the problem might be?


CADCXH010000001.1 23532 23537 G32900 . - ROSLIN_INST three_prime_utr . gene_id "G32900"; transcript_id "G32900.1"; gene_source "ROSLIN_INST"; gene_biotype "protein_coding"; transcript_source "ROSLIN_INST"; transcript_biotype "protein_coding";
CADCXH010000001.1 23532 23723 G32900 . - ROSLIN_INST exon . gene_id "G32900"; transcript_id "G32900.1"; exon_number "10"; gene_source "ROSLIN_INST"; gene_biotype "protein_coding"; transcript_source "ROSLIN_INST"; transcript_biotype "protein_coding"; exon_id "G32900.1-E10";
CADCXH010000001.1 23532 35416 G32900 . - ROSLIN_INST gene . gene_id "G32900"; gene_source "ROSLIN_INST"; gene_biotype "protein_coding"; transcript_id "";
CADCXH010000001.1 23532 35416 G32900 . - ROSLIN_INST transcript . gene_id "G32900"; transcript_id "G32900.1"; gene_source "ROSLIN_INST"; gene_biotype "protein_coding"; transcript_source "ROSLIN_INST"; transcript_biotype "protein_coding";


Thanks for your help !

Regards, Thomas.

cmacphillamy commented 2 years ago

I had the same issue, but it turned out that gtf2bed was only producing a 10-column BED file. I used bedparse (https://bedparse.readthedocs.io/en/stable/Usage.html#convert-gtf-to-bed) to convert the gtf to a bed12 file, and it worked.
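
A quick way to catch this before calling readTranscriptFeatures is to inspect the file's column count and its tenth column, which in BED12 holds the block (exon) count. The helper below is a minimal base R sketch; check_bed12 is a hypothetical name, not part of genomation, and it assumes a tab-separated file with no track or browser header lines.

```r
# Hypothetical helper: list reasons a file may not be usable as BED12.
# Returns character(0) when no problem is found.
check_bed12 <- function(path) {
  bed <- read.table(path, header = FALSE, sep = "\t", quote = "",
                    comment.char = "", stringsAsFactors = FALSE, fill = TRUE)
  problems <- character(0)
  if (ncol(bed) < 12) {
    problems <- c(problems,
                  sprintf("found %d columns, BED12 needs 12", ncol(bed)))
  } else {
    # Column 10 must be a positive integer in every row.
    block_count <- suppressWarnings(as.integer(bed[[10]]))
    if (anyNA(block_count) || any(block_count < 1L)) {
      problems <- c(problems, "column 10 (blockCount) is not a positive integer")
    }
  }
  problems
}

# Usage on a one-line, 10-column file shaped like the gtf2bed output above.
ten_col <- tempfile(fileext = ".bed")
writeLines(paste(c("CADCXH010000001.1", "23532", "35416", "G32900", ".", "-",
                   "ROSLIN_INST", "transcript", ".", "gene_id \"G32900\";"),
                 collapse = "\t"), ten_col)
check_bed12(ten_col)
```

An empty result does not guarantee the file is valid BED12. It only rules out the two problems discussed in this thread.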