ge11232002 / CNEr

Conserved Noncoding Elements (CNEs) Identification and Visualisation

Test run human hg38 v danio danRer10 chr1 failed #26

Open seburgess opened 2 years ago

seburgess commented 2 years ago

I ran an abbreviated version of the example given online comparing zebrafish chr1 to human chr1:

Relevant R script:

library(CNEr)  # provides lastz()

assemblyDir <- "/data/burgess/CNE"
axtDir <- "/data/burgess/CNE/hgdr_lav"
# Target: zebrafish (danRer10); query: human (hg38)
assemblyTarget <- file.path(system.file("extdata", package="BSgenome.Drerio.UCSC.danRer10"),
                            "single_sequences.2bit")
assemblyQuery <- file.path(system.file("extdata", package="BSgenome.Hsapiens.UCSC.hg38"),
                           "single_sequences.2bit")
lavs <- lastz(assemblyTarget, assemblyQuery, outputDir=axtDir,
              chrsTarget=c("chr1"), chrsQuery=c("chr1"),
              distance="far", mc.cores=1)

LASTZ starts and runs normally, but aborts once its segment table grows past the allocation limit:

FAILURE: in add_segment() table size (4,869,542,152 for 101,448,794 segments) exceeds allocation limit of 4,294,967,279; consider raising scoring threshold (--hspthresh or --exact) or breaking your target sequence into smaller pieces
Error in my.system(cmd) : res == 0 is not TRUE
Calls: lastz ... mcmapply -> .mapply -> <Anonymous> -> my.system -> stopifnot
Execution halted
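The figures in the error message can be unpacked with a little arithmetic. A short base-R sketch using only the numbers LASTZ printed (the variable names are illustrative, and reading the cap as a 32-bit size limit is an interpretation, not something the message states):

```r
# Figures copied verbatim from the LASTZ error message.
table_bytes <- 4869542152   # requested segment-table size (bytes)
segments    <- 101448794    # segments LASTZ was trying to store
limit       <- 4294967279   # allocation limit (bytes)

bytes_per_segment <- table_bytes / segments        # cost of one segment
limit_gap         <- 2^32 - limit                  # distance below 2^32
overshoot         <- table_bytes / limit           # requested / allowed
max_segments      <- floor(limit / bytes_per_segment)

cat(sprintf("bytes per segment : %.4f\n", bytes_per_segment))
cat(sprintf("2^32 - limit      : %.0f\n", limit_gap))
cat(sprintf("overshoot         : %.4f\n", overshoot))
cat(sprintf("max segments      : %.0f\n", max_segments))
```

On these figures each segment costs almost exactly 48 bytes and the cap sits 17 bytes below 2^32, so roughly 89.5 million segments would fit: about 12 million fewer than this chr1 vs chr1 run produced. The table would have to shrink by roughly 12%, either through a higher scoring threshold or by aligning smaller pieces of the target.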

Whole log:

[+] Loading LASTZ 1.04.03 on cn0854
[+] Loading gcc 9.2.0 ...
[+] Loading GSL 2.6 for GCC 9.2.0 ...
[-] Unloading gcc 9.2.0 ...
[+] Loading gcc 9.2.0 ...
[+] Loading openmpi 4.0.5 for GCC 9.2.0
[+] Loading ImageMagick 7.0.8 on cn0854
[+] Loading HDF5 1.10.4
[-] Unloading gcc 9.2.0 ...
[+] Loading gcc 9.2.0 ...
[+] Loading NetCDF 4.7.4_gcc9.2.0
[+] Loading pandoc 2.16.2 on cn0854
[+] Loading pcre2 10.21 ...
[+] Loading R 4.1.0
Bioconductor version 3.14 (BiocManager 1.30.16), R 4.1.0 (2021-05-18)
Installing package(s) 'BSgenome.Drerio.UCSC.danRer10'
trying URL 'https://bioconductor.org/packages/3.14/data/annotation/src/contrib/BSgenome.Drerio.UCSC.danRer10_1.4.2.tar.gz'
Content type 'application/x-gzip' length 320075403 bytes (305.2 MB)

downloaded 305.2 MB

installing source package ‘BSgenome.Drerio.UCSC.danRer10’ ...
using staged installation
R
inst
byte-compile and prepare package for lazy loading
help
installing help indices
finding HTML links ... done
building package indices
testing if installed package can be loaded from temporary location
testing if installed package can be loaded from final location
testing if installed package keeps a record of temporary installation path
DONE (BSgenome.Drerio.UCSC.danRer10)

The downloaded source packages are in
  ‘/lscratch/32703980/Rtmp1m40Ky/downloaded_packages’
Installation paths not writeable, unable to update packages
  path: /usr/local/apps/R/4.1/4.1.0/lib64/R/library
  packages:
    class, foreign, lattice, MASS, Matrix, mgcv, nnet, rpart, spatial
  path: /usr/local/apps/R/4.1/site-library_4.1.0
  packages:
    GenomeInfoDb, nlme, pbdMPI, proj4, ragg, rgdal, rgeos, rJava, Rmpi, RMySQL, ROpenCVLite, Seurat, sf, systemfonts, terra, udunits2, units, V8
Loading required package: BSgenome
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, basename, cbind, colnames,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

expand.grid, I, unname

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector

Attaching package: ‘Biostrings’

The following object is masked from ‘package:base’:

strsplit

Loading required package: rtracklayer

Attaching package: ‘CNEr’

The following object is masked from ‘package:Biostrings’:

N50

lastz /spin1/home/linux/burgess/R/4.1/library/BSgenome.Drerio.UCSC.danRer10/extdata/single_sequences.2bit/chr1 /usr/local/apps/R/4.1/site-library_4.1.0/BSgenome.Hsapiens.UCSC.hg38/extdata/single_sequences.2bit/chr1 C=0 E=30 H=2000 K=2200 L=6000 M=50 O=400 T=2 Y=3400 Q=/lscratch/32703980/Rtmp1m40Ky/file38824139aee7.lastzMatrix --format=lav --output=/data/burgess/CNE/hgdr_lav/chr1.single_sequences-chr1.single_sequences.lav --markend
FAILURE: in add_segment() table size (4,869,542,152 for 101,448,794 segments) exceeds allocation limit of 4,294,967,279; consider raising scoring threshold (--hspthresh or --exact) or breaking your target sequence into smaller pieces
Error in my.system(cmd) : res == 0 is not TRUE
Calls: lastz ... mcmapply -> .mapply -> <Anonymous> -> my.system -> stopifnot
Execution halted
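The logged command uses LASTZ's BLASTZ-style shorthand, in which K= is the same setting as --hspthresh (K=2200 here, from the distance="far" preset). Acting on the error's first suggestion would mean rerunning this command with a larger K, since the lastz() call in the script above passes only a distance preset. The sketch below rebuilds the argument vector in base R with a configurable threshold. The helper name build_lastz_args, the shortened paths and the value 3000 are hypothetical placeholders, and Q= must point at a scoring matrix equivalent to the temporary file CNEr wrote. It only prints the command:

```r
# Hypothetical helper: rebuild the logged lastz call with an adjustable
# K= (LASTZ's BLASTZ-style alias for --hspthresh). All other settings are
# copied verbatim from the command CNEr logged.
build_lastz_args <- function(target, query, out_file, scores_file,
                             hspthresh = 2200) {
  c(target, query,
    "C=0", "E=30", "H=2000",
    paste0("K=", hspthresh),          # the threshold to raise
    "L=6000", "M=50", "O=400", "T=2", "Y=3400",
    paste0("Q=", scores_file),        # scoring matrix file
    "--format=lav",
    paste0("--output=", out_file),
    "--markend")
}

# Placeholder paths; substitute the real 2bit files and matrix.
args <- build_lastz_args(
  target      = "danRer10/single_sequences.2bit/chr1",
  query       = "hg38/single_sequences.2bit/chr1",
  out_file    = "chr1-chr1.lav",
  scores_file = "scores.lastzMatrix",
  hspthresh   = 3000                  # illustrative value only
)

# Print the command instead of running it; to execute, use
# system2("lastz", args) and check that the returned exit status is 0.
cat(paste(c("lastz", args), collapse = " "), "\n")
```

A higher K keeps fewer HSPs, which is what shrinks the segment table, but it also drops weaker alignments, so the resulting CNE set may be less sensitive than the "far" preset intends.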