HadrienG / InSilicoSeq

:rocket: A sequencing simulator
https://insilicoseq.readthedocs.io
MIT License
177 stars 32 forks source link

failed with own error model #158

Closed lmanchon closed 4 years ago

lmanchon commented 4 years ago

--Hi, i have tried to create own error model with:

bowtie2-build --threads 16 -f chr8.fasta bowtie2_index/chr8.genome
bowtie2 -p 16 -x bowtie2_index/chr8.genome -1 chr8_paired1.fq.gz -2 chr8_paired2.fq.gz | samtools view -@16 -bS | samtools sort -@16 -o genomes.bam
samtools index genomes.bam
bin/iss model -b genomes.bam -o genomes

(chr8_paired1.fq.gz and chr8_paired2.fq.gz are paired-end reads 50bp length)

then program failed with: bin/iss generate --cpus 8 -n 180M --genomes chr8.fasta --model genomes.npz -z --output reads

INFO:iss.app:Starting iss generate
INFO:iss.app:Using kde ErrorModel
INFO:iss.util:Stitching input files together
INFO:iss.app:Using lognormal abundance distribution
INFO:iss.app:Using 8 cpus for read generation
INFO:iss.app:Generating 180000000 reads
INFO:iss.app:Generating reads for record: chr8
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File ".local/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
    r = call_item()
  File ".local/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File ".local/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 608, in __call__
    return self.func(*args, **kwargs)
  File ".local/lib/python3.7/site-packages/joblib/parallel.py", line 256, in __call__
    for func, args, kwargs in self.items]
  File ".local/lib/python3.7/site-packages/joblib/parallel.py", line 256, in <listcomp>
    for func, args, kwargs in self.items]
  File ".local/lib/python3.7/site-packages/iss/generator.py", line 65, in reads
    forward, reverse = simulate_read(record, ErrorModel, i, cpu_number)
  File ".local/lib/python3.7/site-packages/iss/generator.py", line 140, in simulate_read
    forward = ErrorModel.introduce_error_scores(forward, 'forward')
  File ".local/lib/python3.7/site-packages/iss/error_models/__init__.py", line 66, in introduce_error_scores
    self.quality_forward, 'forward')
  File ".local/lib/python3.7/site-packages/Bio/SeqRecord.py", line 88, in __setitem__
    "strings) of length {0}.".format(self._length)
TypeError: We only allow python sequences (lists, tuples or strings) of length 50.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "bin/iss", line 11, in <module>
    load_entry_point('InSilicoSeq==1.4.5', 'console_scripts', 'iss')()
  File ".local/lib/python3.7/site-packages/iss/app.py", line 547, in main
    args.func(args)
  File ".local/lib/python3.7/site-packages/iss/app.py", line 260, in generate_reads
    args.gc_bias, mode) for i in range(cpus))
  File ".local/lib/python3.7/site-packages/joblib/parallel.py", line 1017, in __call__
    self.retrieve()
  File ".local/lib/python3.7/site-packages/joblib/parallel.py", line 909, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File ".local/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 562, in wrap_future_result
    return future.result(timeout=timeout)
  File "/tools/python/3.7.4/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/tools/python/3.7.4/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
TypeError: We only allow python sequences (lists, tuples or strings) of length 50.

what's wrong ?

HadrienG commented 4 years ago

Hi!

Thanks for reporting this, can you attach your genomes.npz so I can take a look at the model?

PS: I noticed that you had filed #155 before. Is your dataset chr8_paired available somewhere or is it internal data if you don't mind me asking?

lmanchon commented 4 years ago

--Hi, you can download the files from this link: https://filesender.renater.fr/?s=download&token=f639dc6e-6e79-4b18-a91b-e363d6e252bd thanks.

HadrienG commented 4 years ago

Hi,

There seems to be a bug in InSilicoSeq when you try to build a model with no sequences with quality over 20. I'm working on a fix.

I have to say though, those are pretty bad quality sequences, with only about 97.5% correct basecalls, which is unusually low for Illumina. Are you sure you want to generate an error model with those?

lmanchon commented 4 years ago

-- yes i need it. i have generated those data using ART-MountRainier tool on chr8 and with this command: art_illumina -ss HS25 -i chr8.fasta -qs -20 -qs2 -20 -p -l 50 -f 20 -m 200 -s 10 -o chr8_paired

HadrienG commented 4 years ago

understood. I assumed you wanted a NovaSeq 2x50bp model, and was worried that it did not look like a typical run. Will come back to you when I have a fix.

HadrienG commented 4 years ago

Should be fixed in 1.4.6