RAHenriksen / NGSNGS

NGSNGS: Next generation simulator for next generation sequencing data

"Segmentation fault (core dumped)" happens for large input sequences. #27

Open nuzla opened 1 year ago

nuzla commented 1 year ago
ngsngs -i CHR20.fa -r 5 -l 1400 -seq SE -f fa -o read1400

    ngsngs version: cde9229 (htslib: 1.16-45-g463830b) build(Feb  4 2023 06:12:04)
    Mycommmand: ngsngs -i CHR20.fa -r 5 -l 1400 -seq SE -f fa -o read1400 
    The is provided read cycle length is: 0 or the inferred read cycle length is 0
    Number of contigs/scaffolds/chromosomes in file: 'CHR20.fa': 1
    Seed used: 1675726267
    Number of sampling threads used (-t): 1 and number of compression threads (-t2): 1
    Number of simulated reads: 5 or coverage: 0.000000
    Default PCR duplicate value 1
    Number of nref 1 in file: 'CHR20.fa'
    Allocated memory for 1 chromosomes/contigs/scaffolds from input reference genome
    Chromosome name first 20 and length 64444167 and full length 64444167
    File output name is read1400.fa
Segmentation fault (core dumped)

RAHenriksen commented 1 year ago

Thanks for your comment. This is an issue I hadn't considered for potential long-read sequencing simulations, so fixing it will improve usability.

The crash is caused by a hardcoded upper limit on the memory allocated for some of the sequences. I have just made a quick update that raises this limit to 10000. A more detailed update will follow, in which the allocation adapts dynamically to longer sequences without any hardcoded upper limit.
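
To illustrate the kind of change described above, here is a minimal sketch in C. It is not taken from the NGSNGS source: the function name `extract_read` and its parameters are hypothetical, and the fixed-size buffer mentioned in the comments is only an assumption about what a hardcoded limit might look like. The idea is to size the read buffer from the requested read length rather than from a compile-time constant, so that a long read cannot write past the end of the buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not the NGSNGS implementation): copy a read of up to
 * read_len bases, starting at position start, out of a reference sequence.
 *
 * A fixed buffer such as `char read[1024];` would overflow once read_len
 * exceeds its size, which is one common way to get a segmentation fault.
 * Here the buffer is allocated from the requested length instead.
 *
 * Returns a NUL-terminated, heap-allocated string the caller must free(),
 * or NULL if start is out of range or the allocation fails.
 */
char *extract_read(const char *ref, size_t ref_len, size_t start, size_t read_len)
{
    if (ref == NULL || start >= ref_len) {
        return NULL;
    }

    /* Clip the read so it never runs past the end of the reference. */
    if (read_len > ref_len - start) {
        read_len = ref_len - start;
    }

    /* Allocate exactly what this read needs, plus the terminating NUL. */
    char *read = (char *)malloc(read_len + 1);
    if (read == NULL) {
        return NULL;
    }

    memcpy(read, ref + start, read_len);
    read[read_len] = '\0';
    return read;
}
```

With this approach a call such as `extract_read(ref, ref_len, 100, 1400)` yields a 1400-base string regardless of any preset maximum, and a request that reaches past the end of the reference is shortened rather than read out of bounds.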

./ngsngs -i Test_Examples/Mycobacterium_leprae.fa.gz -r 10 -t 1 -s 1 -l 5000 -seq SE -q1 Test_Examples/AccFreqL150R1.txt -f fa -o read5k

If you find other issues related to long-read sequencing, or anything else that could improve the functionality, please let me know.

nuzla commented 1 year ago

Thank you for the quick response and the update. It now works for reads longer than 1000.