HadrienG / InSilicoSeq

:rocket: A sequencing simulator
https://insilicoseq.readthedocs.io
MIT License
181 stars 32 forks source link

Inability to Generate Perfect Reads #272

Open Shruteek opened 2 days ago

Shruteek commented 2 days ago

I am trying to generate perfect reads from a single genome .FASTA file, and whenever I do, the package attempts to generate the reads but kicks me back to the help menu. I am able to generate reads with '--mode basic' and with error models like '--model novaseq', but not with '--mode perfect'. I am using InSilicoSeq version 2.0.1. Is anybody else able to reproduce this?

Minimally reproducible example: download the attached .fasta file chromosome.zip, and run the following 2 commands:

conda create -n test insilicoseq -c bioconda
iss generate --genomes chromosome.fasta --mode perfect --n_reads 100 --compress --output chromosome_perfect_reads

Expected output: 2 files, one called chromosome_perfect_reads_R1.fastq, another called chromosome_perfect_reads_R2.fastq

Actual output:

(test) workflow$ iss generate --genomes chromosome.fasta --mode perfect --compress --n_reads 100 --output chromosome_perfect_reads
INFO:iss.app:Starting iss generate
INFO:iss.generator:Using perfect ErrorModel
INFO:iss.util:Stitching input files together
INFO:iss.generator:Using lognormal abundance distribution
INFO:iss.app:Using 2 cpus for read generation
INFO:iss.app:Generating 100 reads
usage: iss [subcommand] [options]

InSilicoSeq: A sequencing simulator

options:
  -h, --help     show this help message and exit
  -v, --version  print software version and exit

available subcommands:

    model        generate an error model from a bam file
    generate     simulate reads from an error model
Shruteek commented 2 days ago

Accidentally closed because I thought I resolved the issue. It turns out my placement of the '--compress' option was partially responsible; moving that flag to the end of the command, while testing, I have been able to generate reads several times, but I am still unable to reliably generate reads, and in most of the cases the program produces the same erroneous output.

Shruteek commented 2 days ago

The core issue appears to be that there is no 'store_mutations' attribute in the PerfectErrorModel class (InSilicoSeq/iss/error_models/perfect.py), even though that attribute is expected in the main initialization function for Error Models (InSilicoSeq/iss/error_models/basic.py). As a result, there is a nonzero chance that the generator will attempt to mutate a read, in which case it will check 'store_mutations' and throw an error - if you only generate 5 or 10 reads, this may never happen, but at 100 or more, this will almost always cause an error, though there is a slight chance it will not.

Workaround: in InSilicoSeq/iss/error_models/perfect.py, add this line on line 20: self.store_mutations = False

Alternatively, change line 14 to: def __init__(self, fragment_length=None, fragment_sd=None, store_mutations=False): and add this line on line 20: self.store_mutations = store_mutations