Hi!
I am trying to create in silico metagenome data with my own model. I built the model from human stool shotgun-sequenced samples (Illumina NextSeq), following the "Error models" part of the documentation:
megahit -1 reads_R1.fastq -2 reads_R2.fastq -o asm
bowtie2-build asm/final.contigs.fa asm/final.contigs
bowtie2 -x asm/final.contigs -1 reads_R1.fastq -2 reads_R2.fastq \
    | samtools view -bS | samtools sort -o mapping.bam
samtools index mapping.bam
iss model -b mapping.bam -o name_of_model
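As a sanity check on the input, something like the following could be used to see the read-length distribution of the FASTQ files feeding the model (a minimal sketch, not part of the documented workflow; `reads_R1.fastq` refers to the file in the commands above):

```python
# Minimal sketch: count read lengths in a plain-text FASTQ file, to check
# whether the reads going into the model were trimmed (e.g. down to 36 bp).
from collections import Counter

def read_length_counts(fastq_path):
    """Return {read_length: count} for an uncompressed FASTQ file."""
    counts = Counter()
    with open(fastq_path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # the second line of every 4-line record is the sequence
                counts[len(line.rstrip("\n"))] += 1
    return dict(counts)
```

For example, `read_length_counts("reads_R1.fastq")` returning something like `{36: 1523400}` would point at trimming upstream of the model.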
After this, I tried to use my model to generate 5000000 reads from 200 bacterial genomes randomly downloaded from NCBI.
iss generate --ncbi bacteria -U 200 --model ./name_of_model --n_reads 5000000 -o ./output
The download of the genomes went without an issue, but at the read-generation step I ran into the following error message:
INFO:iss.download:Downloading GCF_014961985.1
INFO:iss.download:Downloading GCF_003955865.1
INFO:iss.download:Downloading GCF_005221645.1
INFO:iss.util:Stitching input files together
INFO:iss.app:Using lognormal abundance distribution
INFO:iss.app:Using 2 cpus for read generation
INFO:iss.app:Generating 5000000 reads
INFO:iss.app:Generating reads for record: NZ_CP077680.1
INFO:iss.app:Generating reads for record: NZ_CP016813.1
INFO:iss.app:Generating reads for record: NZ_CP022611.1
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 436, in _process_worker
r = call_item()
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 288, in __call__
return self.fn(*self.args, **self.kwargs)
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 595, in __call__
return self.func(*args, **kwargs)
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/site-packages/iss/generator.py", line 62, in reads
forward, reverse = simulate_read(record, ErrorModel, i, cpu_number)
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/site-packages/iss/generator.py", line 163, in simulate_read
reverse = ErrorModel.introduce_error_scores(reverse, 'reverse')
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/site-packages/iss/error_models/__init__.py", line 65, in introduce_error_scores
record.letter_annotations["phred_quality"] = self.gen_phred_scores(
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/site-packages/Bio/SeqRecord.py", line 89, in __setitem__
raise TypeError(
TypeError: We only allow python sequences (lists, tuples or strings) of length 36.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/deltagene/anaconda3/envs/insilicoseq/bin/iss", line 10, in <module>
sys.exit(main())
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/site-packages/iss/app.py", line 608, in main
args.func(args)
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/site-packages/iss/app.py", line 306, in generate_reads
record_file_name_list = Parallel(
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/site-packages/joblib/parallel.py", line 1056, in __call__
self.retrieve()
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/site-packages/joblib/parallel.py", line 935, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
return future.result(timeout=timeout)
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/concurrent/futures/_base.py", line 438, in result
return self.__get_result()
File "/home/deltagene/anaconda3/envs/insilicoseq/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
raise self._exception
TypeError: We only allow python sequences (lists, tuples or strings) of length 36.
I got the same error with multiple different custom models. For some of them, read generation seemed to complete properly, but when I inspected the FASTQ files, every read was only 36 bases long.
Has anybody run into the same issue? What could cause it?
Thanks,
Bertalan Takács
EDIT:
Okay, I inspected my model, and it seems that its read length is set to 36 for some reason. That explains at least one part of my problem: why the generated reads are short. Still, I don't know how this value got set, since the model was built from a sample with pretty good coverage.
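For anyone wanting to check their own model: an iss model file is a compressed NumPy archive (.npz). Assuming it stores a `read_length` entry (which InSilicoSeq's KDE error model appears to save; this is an assumption, not documented API), it can be read back like this:

```python
# Sketch under the assumption that the iss model .npz archive
# contains a "read_length" entry.
import numpy as np

def model_read_length(model_path):
    """Return the read length stored in an iss model (.npz) file."""
    with np.load(model_path, allow_pickle=True) as npz:
        return int(npz["read_length"])
```

For example, `model_read_length("name_of_model.npz")` would return 36 for a model affected by this issue.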