RJWANGbioinfo / APAlyzer

APAlyzer is a toolkit for bioinformatic analysis of alternative polyadenylation (APA) events using RNA sequencing data. Our main approach is the comparison of sequencing reads in regions demarcated by high-quality polyadenylation sites (PASs) annotated in the PolyA_DB database (https://exon.apps.wistar.org/PolyA_DB/v3/). The current version (v3.0) uses RNA-seq data to examine APA events in 3’ untranslated regions (3’UTRs) and in introns. The coding regions are used for gene expression calculation.
https://bioconductor.org/packages/release/bioc/html/APAlyzer.html
GNU Lesser General Public License v3.0

'relist': subscript contains out-of-bounds indices error #14

Closed by nemitheasura 2 years ago

nemitheasura commented 2 years ago

Hi,

I have a problem similar to the one described in issue #9.

Namely, I was trying to run the analysis on A. thaliana. None of my GTF files was parsed correctly. I have installed the latest version of your software, yet the error is still present. Could you help me debug this?

Here are the links to the GTF files I used:
https://drive.google.com/file/d/1XhuctVDKhhgVaGKZNKj_yae-OjVyKYMA/view?usp=sharing
https://drive.google.com/file/d/1AZ3PVtuPCUt3eBInipRrsF0DzRx1EoXg/view?usp=sharing

I receive 2 types of errors:
- 'h(simpleError(msg, call))' 'relist': subscript contains out-of-bounds indices (file version 47)
- UNIQUE constraint failed: exon.exon_id (file version 52)

I would be grateful for your help.

Best regards, Nemi

RJWANGbioinfo commented 2 years ago

@nemitheasura sorry for the late reply, I'm just back from vacation. Which version of APAlyzer are you using? I just tested your GTF file using APAlyzer v1.9.3 and v1.9.4, and it works fine on my end. Below is the testing log:

Testing PAS2GEF using the GTF downloaded from your first link:

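library(GenomicRanges)
library(APAlyzer)
# GTF downloaded from the first link (file version 47)
GTFfile <- "Arabidopsis_thaliana.TAIR10.47.gtf"
# Parse the GTF into the raw reference tables
PASREFraw <- PAS2GEF(GTFfile)
refUTRraw <- PASREFraw$refUTRraw
dfIPAraw <- PASREFraw$dfIPA
dfLEraw <- PASREFraw$dfLE
# Build the final PAS reference from the three raw tables
PASREF <- REF4PAS(refUTRraw, dfIPAraw, dfLEraw)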

And the reference is built without errors: [image]

Below is my sessionInfo:

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomicRanges_1.45.0 GenomeInfoDb_1.29.3  IRanges_2.27.0
[4] S4Vectors_0.31.0     BiocGenerics_0.39.1  APAlyzer_1.9.3

Could you confirm your APAlyzer version and give it another try? Thanks.
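
A quick way to confirm the installed version might look like the sketch below. It uses only base R (`requireNamespace`, `utils::packageVersion`, `package_version`); the helper name `get_pkg_version` is hypothetical and not part of APAlyzer, and 1.9.3 is simply the version from the sessionInfo above, taken here as the comparison point.

```r
# Hypothetical helper (not part of APAlyzer): return the installed version
# of a package as a string, or NA if the package is not installed.
get_pkg_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

installed <- get_pkg_version("APAlyzer")
print(installed)

# Compare against the version used in the test above (assumed reference: 1.9.3).
if (is.na(installed)) {
  message("APAlyzer is not installed in this R library.")
} else if (package_version(installed) < package_version("1.9.3")) {
  message("Installed APAlyzer is older than 1.9.3; consider updating, ",
          "e.g. with BiocManager::install(\"APAlyzer\").")
}
```

If the reported version is older than the one tested here, updating and re-running PAS2GEF on the same GTF would show whether the error is specific to the older release.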