alexdobin / STAR

RNA-seq aligner
MIT License
1.77k stars 495 forks source link

remove extraenous OOB write #2163

Open eternal-flame-AD opened 3 days ago

eternal-flame-AD commented 3 days ago

fixes #2158

Not sure why this was added in the first place, with the line removed I could not reproduce a complaining valgrind with the nf-core/rnaseq test dataset or the dataset referenced in #2158 or my own dataset. However this line seem to produce a guaranteed OOB write:

yume@tsubasa ~/O/A/2/l/i/tmp (main) [102] > set NUM_BASES (gawk '{sum = sum + $2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (l
og(sum)/log(2))/2 - 1}}' genome.fa.fai)
yume@tsubasa ~/O/A/2/l/i/tmp (main) > valgrind --track-origins=yes --keep-stacktraces=alloc-and-free ~/build/STAR/source/STAR \              (rnaseq-smk) 23:31:04
                                              --runMode genomeGenerate \
                                              --genomeDir star/ \
                                              --genomeFastaFiles genome.fa \
                                              --sjdbGTFfile genome.filtered.gtf \
                                              --runThreadN 2 \
                                              --genomeSAindexNbases $NUM_BASES \
                                              --limitGenomeGenerateRAM 53587091200
==500278== Memcheck, a memory error detector
==500278== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==500278== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==500278== Command: /home/yume/build/STAR/source/STAR --runMode genomeGenerate --genomeDir star/ --genomeFastaFiles genome.fa --sjdbGTFfile genome.filtered.gtf --runThreadN 2 --genomeSAindexNbases 7 --limitGenomeGenerateRAM 53587091200
==500278== 
        /home/yume/build/STAR/source/STAR --runMode genomeGenerate --genomeDir star/ --genomeFastaFiles genome.fa --sjdbGTFfile genome.filtered.gtf --runThreadN 2 --genomeSAindexNbases 7 --limitGenomeGenerateRAM 53587091200
        STAR version: 2.7.11b   compiled: 2024-06-27T23:30:18-05:00 :/home/yume/build/STAR/source
Jun 27 23:31:06 ..... started STAR run
Jun 27 23:31:06 ... starting to generate Genome files
Jun 27 23:31:06 ..... processing annotations GTF
Jun 27 23:31:06 ... starting to sort Suffix Array. This may take a long time...
Jun 27 23:31:06 ... sorting Suffix Array chunks and saving them to disk...
Jun 27 23:31:11 ... loading chunks from disk, packing SA...
Jun 27 23:31:12 ... finished generating suffix array
Jun 27 23:31:12 ... generating Suffix Array index
Jun 27 23:31:12 ... completed Suffix Array index
==500278== Invalid read of size 8
==500278==    at 0x21530E: PackedArray::writePacked(unsigned long long, unsigned long long) (PackedArray.cpp:24)
==500278==    by 0x2894EA: sjdbInsertJunctions(Parameters&, Genome&, Genome&, SjdbClass&) (sjdbInsertJunctions.cpp:68)
==500278==    by 0x274E72: Genome::genomeGenerate() (Genome_genomeGenerate.cpp:365)
==500278==    by 0x212722: main (STAR.cpp:91)
==500278==  Address 0x5e6fc44 is 1,062,228 bytes inside a block of size 1,062,232 alloc'd
==500278==    at 0x48445F3: operator new[](unsigned long) (vg_replace_malloc.c:729)
==500278==    by 0x21536F: PackedArray::allocateArray() (PackedArray.cpp:32)
==500278==    by 0x27457C: Genome::genomeGenerate() (Genome_genomeGenerate.cpp:297)
==500278==    by 0x212722: main (STAR.cpp:91)
==500278== 
==500278== Invalid write of size 8
==500278==    at 0x215332: PackedArray::writePacked(unsigned long long, unsigned long long) (PackedArray.cpp:24)
==500278==    by 0x2894EA: sjdbInsertJunctions(Parameters&, Genome&, Genome&, SjdbClass&) (sjdbInsertJunctions.cpp:68)
==500278==    by 0x274E72: Genome::genomeGenerate() (Genome_genomeGenerate.cpp:365)
==500278==    by 0x212722: main (STAR.cpp:91)
==500278==  Address 0x5e6fc44 is 1,062,228 bytes inside a block of size 1,062,232 alloc'd
==500278==    at 0x48445F3: operator new[](unsigned long) (vg_replace_malloc.c:729)
==500278==    by 0x21536F: PackedArray::allocateArray() (PackedArray.cpp:32)
==500278==    by 0x27457C: Genome::genomeGenerate() (Genome_genomeGenerate.cpp:297)
==500278==    by 0x212722: main (STAR.cpp:91)
==500278== 
Jun 27 23:31:12 ... writing Genome to disk ...
Jun 27 23:31:12 ... writing Suffix Array to disk ...
Jun 27 23:31:12 ... writing SAindex to disk
Jun 27 23:31:12 ..... finished successfully
==500278== 
==500278== HEAP SUMMARY:
==500278==     in use at exit: 775,568 bytes in 436 blocks
==500278==   total heap usage: 7,795 allocs, 7,359 frees, 100,485,363 bytes allocated
==500278== 
==500278== LEAK SUMMARY:
==500278==    definitely lost: 112,243 bytes in 22 blocks
==500278==    indirectly lost: 0 bytes in 0 blocks
==500278==      possibly lost: 320 bytes in 1 blocks
==500278==    still reachable: 663,005 bytes in 413 blocks
==500278==         suppressed: 0 bytes in 0 blocks
==500278== Rerun with --leak-check=full to see details of leaked memory
==500278== 
==500278== For lists of detected and suppressed errors, rerun with: -s
==500278== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

https://github.com/alexdobin/STAR/blob/b1edc1208d91a53bf40ebae8669f71d50b994851/source/Genome_genomeGenerate.cpp#L298

From this line it seems the buffer for SA is only is nSA long thus accessing the nSA-th element will be a guaranteed OOB write