Genentech / Isosceles

The Isoforms from Single-Cell; Long-read Expression Suite
GNU General Public License v3.0
19 stars 0 forks source link

prepare_transcripts range width limit error #7

Open sbresnahan opened 1 month ago

sbresnahan commented 1 month ago

Version: isosceles v0.2.1

Running prepare_transcripts with bulk ONT reads aligned to chr1 of GRCh38 with minimap2:

BAM <- "chr1.bam"
GTF <- "gencode.v45.primary_assembly.annotation.gtf"
GENOME <- "GCA_000001405.15_GRCh38_no_alt_analysis_set.fna"

bam_files <- c(Sample = BAM)
bam_parsed <- bam_to_read_structures(bam_files = bam_files,ncpu=4)

transcript_data <- prepare_transcripts(
  gtf_file = GTF,
  genome_fasta_file = GENOME,
  bam_parsed = bam_parsed,
  min_bam_splice_read_count = 2,
  min_bam_splice_fraction = 0.01
)

Results in this error:

Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'i' in selecting a method for function 'extractROWS': each range must have a width that is < 2^31
sbresnahan commented 1 month ago

This issue is resolved upstream by filtering reads longer than 2^31 with samtools view -e 'length(seq)<2147483648' -O BAM -o out.bam in.bam, however it would nice if isosceles could do this length filtering on import.

timbitz commented 1 month ago

Thanks for pointing this out! We'll look into this-- I agree it would be nice not to have to hand filter for this.