reck999 commented 7 months ago

I was able to analyze 3' UTR APA in the dataset GSE230025, aligned to the UCSC genome, after creating a C. elegans reference from the Ensembl GTF with the PAS2GEF function. When I went on to analyze intronic APA, PASEXP_IPA returned the error 'dflength$end - dflength$start: non-numeric argument to binary operator'. I have included my code and session info below. Is there an error in my reference construction or code that could explain this? Is there anything I can correct to run the analysis? I am happy to provide more information or BAM files. Thank you so much for this great package!

Randall

setwd("E:/Celegans_TDP1_UCSC")
library(APAlyzer)
library(repmis)
library(GenomicRanges)
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq,
Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax,
pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:utils’:

findMatches

The following objects are masked from ‘package:base’:

expand.grid, I, unname

Loading required package: IRanges

Attaching package: ‘IRanges’

The following object is masked from ‘package:grDevices’:

windows

Loading required package: GenomeInfoDb

# Building a worm reference
download.file(url='https://ftp.ensembl.org/pub/release-111/gtf/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.111.gtf.gz',
              destfile='Caenorhabditis_elegans.WBcel235.111.gtf.gz')
trying URL 'https://ftp.ensembl.org/pub/release-111/gtf/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.111.gtf.gz'
Content type 'application/x-gzip' length 8730028 bytes (8.3 MB)
downloaded 8.3 MB

GTFfile="Caenorhabditis_elegans.WBcel235.111.gtf.gz"
PASREFraw=PAS2GEF(GTFfile)
[1] "PAS2GEF: Reading GTF file"
[1] "PAS2GEF: Extracting and annotating all PASs"
[1] "PAS2GEF: Extracting and filtering 3'UTR PASs"
[1] "PAS2GEF: Extracting IPAs"
[1] "PAS2GEF: Extracting 3' last exons"
[1] "PAS2GEF: Finalizing references"
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
refUTRraw=PASREFraw$refUTRraw
dfIPAraw=PASREFraw$dfIPA
dfLEraw=PASREFraw$dfLE
PASREF=REF4PAS(refUTRraw,dfIPAraw,dfLEraw)
dfIPA=PASREF$dfIPA
dfLE=PASREF$dfLE
UTRdbraw=REF3UTR(refUTRraw)

# RNA-seq BAM files
flsall <- dir(getwd(),".bam")
flsall<-paste0(getwd(),'/',flsall)
names(flsall)<-gsub('.bam','',dir(getwd(),".bam"))
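A side note on the file listing above, unrelated to the reported error: the `pattern` argument of `dir()` is a regular expression, so ".bam" also matches index files such as `sample.bam.bai`. The sketch below shows a stricter pattern on a throwaway directory. The file names and the `bam_demo` directory are made up for illustration and are not from the original post.

```r
# Hypothetical demo directory with made-up file names.
tmp <- file.path(tempdir(), "bam_demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(
  tmp, c("ctrl.bam", "ctrl.bam.bai", "tdp1.bam", "notes_bam.txt")
)))

# Unanchored regex: "." matches any character, so index files and
# unrelated names containing "bam" are picked up as well.
loose <- dir(tmp, ".bam")

# Anchored regex: a literal dot followed by "bam" at the end of the name.
strict <- dir(tmp, pattern = "\\.bam$")

# Build the named vector of full paths from the strict listing.
flsall_demo <- file.path(tmp, strict)
names(flsall_demo) <- sub("\\.bam$", "", strict)

print(loose)
print(flsall_demo)
```

If the working directory holds only BAM files, both patterns return the same listing, so this matters only when `.bai` files or similarly named files sit next to them.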

# Calculation of UTR and IPA
IPA_OUTraw=PASEXP_IPA(PASREF$dfIPA,dfLE, flsall, SeqType ='ThreeMostPairEnd')
Error in dflength$end - dflength$start : non-numeric argument to binary operator

sessionInfo()
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicRanges_1.54.1 GenomeInfoDb_1.38.8  IRanges_2.36.0       S4Vectors_0.40.2     BiocGenerics_0.48.1  repmis_0.5
[7] APAlyzer_1.16.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1            dplyr_1.1.4                 blob_1.2.4                  R.utils_2.12.3
 [5] filelock_1.0.3              Biostrings_2.70.3           bitops_1.0-7                fastmap_1.1.1
 [9] RCurl_1.98-1.14             BiocFileCache_2.10.1        VariantAnnotation_1.48.1    GenomicAlignments_1.38.2
[13] XML_3.99-0.16.1             digest_0.6.35               lifecycle_1.0.4             KEGGREST_1.42.0
[17] RSQLite_2.3.5               magrittr_2.0.3              compiler_4.3.3              rlang_1.1.3
[21] progress_1.2.3              tools_4.3.3                 utf8_1.2.4                  yaml_2.3.8
[25] data.table_1.15.2           rtracklayer_1.62.0          prettyunits_1.2.0           S4Arrays_1.2.1
[29] bit_4.0.5                   curl_5.2.1                  DelayedArray_0.28.0         plyr_1.8.9
[33] xml2_1.3.6                  abind_1.4-5                 BiocParallel_1.34.2         R.cache_0.16.0
[37] purrr_1.0.2                 R.oo_1.26.0                 grid_4.3.3                  fansi_1.0.6
[41] colorspace_2.1-0            ggplot2_3.5.0               scales_1.3.0                biomaRt_2.58.2
[45] Rsubread_2.16.1             SummarizedExperiment_1.32.0 cli_3.6.2                   crayon_1.5.2
[49] generics_0.1.3              HybridMTest_1.46.0          rstudioapi_0.16.0           httr_1.4.7
[53] rjson_0.2.21                DBI_1.2.2                   cachem_1.0.8                stringr_1.5.1
[57] zlibbioc_1.48.2             parallel_4.3.3              AnnotationDbi_1.64.1        XVector_0.42.0
[61] restfulr_0.0.15             matrixStats_1.2.0           vctrs_0.6.5                 Matrix_1.6-5
[65] hms_1.1.3                   bit64_4.0.5                 ggrepel_0.9.5               GenomicFeatures_1.54.4
[69] locfit_1.5-9.9              tidyr_1.3.1                 glue_1.7.0                  codetools_0.2-19
[73] stringi_1.8.3               gtable_0.3.4                BiocIO_1.12.0               munsell_0.5.0
[77] tibble_3.2.1                pillar_1.9.0                rappdirs_0.3.3              BSgenome_1.70.2
[81] GenomeInfoDbData_1.2.11     R6_2.5.1                    dbplyr_2.5.0                lattice_0.22-6
[85] Biobase_2.60.0              R.methodsS3_1.8.2           png_0.1-8                   Rsamtools_2.18.0
[89] memoise_2.0.1               Rcpp_1.0.12                 SparseArray_1.2.4           DESeq2_1.42.1
[93] MatrixGenerics_1.14.0       pkgconfig_2.0.3

Toutouflipi commented 7 months ago

I think somebody used this code for the same problem before:

# ensure that coordinates are numeric
dfIPA$Pos = as.numeric(as.character(dfIPA$Pos))
dfIPA$upstreamSS = as.numeric(as.character(dfIPA$upstreamSS))
dfIPA$downstreamSS = as.numeric(as.character(dfIPA$downstreamSS))
dfLE$LEstart = as.numeric(as.character(dfLE$LEstart))
dfLE$TES = as.numeric(as.character(dfLE$TES))

(It helped me when I had a similar issue.)

Good luck!!
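For anyone who wants to see why the coercion helps, here is a minimal sketch on a toy data frame. The column names `Pos`, `upstreamSS` and `downstreamSS` come from the snippet above; the values and the name `dfIPA_toy` are made up, and this is not a claim about what PASEXP_IPA does internally. It only shows that subtracting character columns raises the same kind of error, and that `as.numeric(as.character(x))` handles both character and factor columns.

```r
# Toy stand-in for dfIPA; the values are invented for illustration.
dfIPA_toy <- data.frame(
  Pos          = c("1500", "8200"),          # character column
  upstreamSS   = factor(c("1000", "8000")),  # factor column
  downstreamSS = c("2000", "9000"),          # character column
  stringsAsFactors = FALSE
)

# Check the column classes first; on a real reference this is the
# quickest way to spot coordinates that are not numeric.
print(sapply(dfIPA_toy, class))

# Arithmetic on character columns fails; capture the message.
err_msg <- tryCatch(
  dfIPA_toy$downstreamSS - dfIPA_toy$Pos,
  error = function(e) conditionMessage(e)
)
print(err_msg)

# Coerce every coordinate column. Going through as.character() first
# makes factors convert by label rather than by internal level code.
coord_cols <- c("Pos", "upstreamSS", "downstreamSS")
dfIPA_toy[coord_cols] <- lapply(
  dfIPA_toy[coord_cols],
  function(x) as.numeric(as.character(x))
)

# After coercion the subtraction works.
dist_toy <- dfIPA_toy$downstreamSS - dfIPA_toy$Pos
print(dist_toy)
```

Running `sapply(dfIPA, class)` and `sapply(dfLE, class)` on the real objects before calling PASEXP_IPA should show which columns need the coercion.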

reck999 commented 7 months ago

This worked! Thank you for the quick response! Closing this thread now.

RJWANGbioinfo / APAlyzer

PASEXP_IPA 'Error in dflength$end - dflength$start: non-numeric argument to binary operator' from GTF constructed reference #19
