HadrienG / InSilicoSeq

:rocket: A sequencing simulator
https://insilicoseq.readthedocs.io
MIT License

Typo in documentation and possible bug #209

Closed JHarrisonEcoEvo closed 1 year ago

JHarrisonEcoEvo commented 3 years ago

Hi there,

When I run the command from the documentation:

iss generate -k bacteria viruses -u 10 4 --model miseq --output miseq_ncbi

I receive the following error:

usage: iss [subcommand] [options]
InSilicoSeq: error: unrecognized arguments: 4

The preceding command in the documentation also gives an error:

iss generate --ncbi bacteria -u 10 --model miseq --output miseq_ncbi

INFO:iss.app:Starting iss generate
INFO:iss.app:Using kde ErrorModel
ERROR:iss.app:--ncbi/-k requires --n_genomes_ncbi/-U. Aborting.

After reading further, I found a syntax that works:

iss generate -k bacteria viruses --n_genomes_ncbi 10 4 --model miseq --output miseq_ncbi
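The two error messages above are consistent with the documented example using lowercase `-u` where the program expects `-U` (the short form of `--n_genomes_ncbi`, as named in the error message). The following is a minimal sketch with a toy `argparse` parser, not the real `iss` parser. The assumption that `-u` accepts a single value while `-U` accepts one value per kingdom is made for illustration only.

```python
import argparse

# Hypothetical parser for illustration only; this is NOT the real iss parser.
parser = argparse.ArgumentParser(prog="toy")
parser.add_argument("-k", "--ncbi", nargs="+")
# Assumption: lowercase -u takes exactly one value.
parser.add_argument("-u", "--n_genomes", type=int)
# Assumption: uppercase -U takes one value per kingdom given to -k.
parser.add_argument("-U", "--n_genomes_ncbi", type=int, nargs="+")

# With -u, only "10" is consumed and "4" is left over. parse_args() would
# report such leftovers as "unrecognized arguments".
_, leftover = parser.parse_known_args(
    ["-k", "bacteria", "viruses", "-u", "10", "4"]
)
print(leftover)

# With -U, both counts are accepted.
args = parser.parse_args(["-k", "bacteria", "viruses", "-U", "10", "4"])
print(args.n_genomes_ncbi)
```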

This seems to work for a bit (see the pasted stdout below), but then runs into a problem. Here is what I get when running the above command on macOS within a conda environment. I can provide the environment YAML file if needed.

INFO:iss.app:Starting iss generate
INFO:iss.app:Using kde ErrorModel
INFO:iss.download:Searching for bacteria to download
INFO:iss.download:Downloading GCF_001298485.1
INFO:iss.download:Downloading GCF_000261045.2
INFO:iss.download:Downloading GCF_009730175.1
INFO:iss.download:Downloading GCF_000829355.1
INFO:iss.download:Downloading GCF_012932985.2
INFO:iss.download:Downloading GCF_015337085.1
INFO:iss.download:Downloading GCF_011331065.1
INFO:iss.download:Downloading GCF_013162485.1
INFO:iss.download:Downloading GCF_900638555.1
INFO:iss.download:Downloading GCF_001975365.1
INFO:iss.download:Searching for viruses to download
INFO:iss.download:Downloading GCF_000854345.1
INFO:iss.download:Downloading GCF_003033365.1
INFO:iss.download:Downloading GCF_001755205.1
INFO:iss.download:Downloading GCF_000851345.1
INFO:iss.util:Stitching input files together
INFO:iss.app:Using lognormal abundance distribution
INFO:iss.app:Using 2 cpus for read generation
INFO:iss.app:Generating 1000000 reads
INFO:iss.app:Generating reads for record: NZ_CP012522.1
/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/Bio/Seq.py:1754: BiopythonDeprecationWarning: myseq.tomutable() is deprecated; please use MutableSeq(myseq) instead.
  warnings.warn(
/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/Bio/Seq.py:1754: BiopythonDeprecationWarning: myseq.tomutable() is deprecated; please use MutableSeq(myseq) instead.
  warnings.warn(
/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/Bio/Seq.py:2749: BiopythonDeprecationWarning: myseq.toseq() is deprecated; please use Seq(myseq) instead.
  warnings.warn(
/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/Bio/Seq.py:2749: BiopythonDeprecationWarning: myseq.toseq() is deprecated; please use Seq(myseq) instead.
  warnings.warn(
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 431, in _process_worker
    r = call_item()
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/iss/generator.py", line 62, in reads
    forward, reverse = simulate_read(record, ErrorModel, i, cpu_number)
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/iss/generator.py", line 161, in simulate_read
    reverse.seq = ErrorModel.introduce_indels(
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/iss/error_models/__init__.py", line 192, in introduce_indels
    seq = self.adjust_seq_length(
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/iss/error_models/__init__.py", line 143, in adjust_seq_length
    full_sequence[read_end + i])
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/Bio/Seq.py", line 433, in __getitem__
    return self.__class__(self._data[index])
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/Bio/Seq.py", line 1727, in __init__
    raise TypeError(
TypeError: data should be a string, bytes, bytearray, Seq, or MutableSeq object
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/jharrison/opt/anaconda3/bin/iss", line 10, in <module>
    sys.exit(main())
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/iss/app.py", line 608, in main
    args.func(args)
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/iss/app.py", line 306, in generate_reads
    record_file_name_list = Parallel(
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 1054, in __call__
    self.retrieve()
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 933, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/Users/jharrison/opt/anaconda3/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
TypeError: data should be a string, bytes, bytearray, Seq, or MutableSeq object

Michal-Babins commented 3 years ago

Hey Josh,

Just wanted to confirm that I am getting this same issue. The job appears to initiate inputs, but then fails with the same error.

HadrienG commented 3 years ago

Hi!

Thanks for reporting this. There seem to be a lot of problems coming from the newest version of Biopython, which deprecated many things in a minor release, and this might be related. I'll have a look and fix it for the next version.

Meanwhile, this problem may disappear if you downgrade to BioPython 1.78.

Best, Hadrien
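Before downgrading, it can help to confirm which Biopython version the current environment actually has. The following is a minimal sketch using only the standard library. The helper names `version_tuple` and `newer_than` are hypothetical, and the 1.78 reference is simply the version suggested above, not a guaranteed fix.

```python
from importlib import metadata


def version_tuple(version):
    """Turn a string such as '1.78' into (1, 78), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def newer_than(installed, reference="1.78"):
    """Return True if the installed version is strictly newer than the reference."""
    return version_tuple(installed) > version_tuple(reference)


try:
    installed = metadata.version("biopython")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("Biopython is not installed in this environment")
elif newer_than(installed):
    print(f"Biopython {installed} is newer than 1.78; a downgrade may help")
else:
    print(f"Biopython {installed} is 1.78 or older")
```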

afvrbanac commented 3 years ago

Hello, I also encountered the error "TypeError: data should be a string, bytes, bytearray, Seq, or MutableSeq object", and downgrading to BioPython 1.78 fixed it for me.

bIomBen commented 3 years ago

Hello :) I have the same error mentioned above; however, BioPython version 1.78 does not fix the problem for me (I even tried 1.77). Does anybody have an idea what else I might be doing wrong?

command: iss generate --genomes multi.fna --abundance_file abundance.txt --model miseq --output miseq_test --n_reads 100k

error:

/home/philipp/.local/lib/python3.8/site-packages/Bio/Seq.py:1754: BiopythonDeprecationWarning: myseq.tomutable() is deprecated; please use MutableSeq(myseq) instead.
  warnings.warn(
/home/philipp/.local/lib/python3.8/site-packages/Bio/Seq.py:2749: BiopythonDeprecationWarning: myseq.toseq() is deprecated; please use Seq(myseq) instead.
  warnings.warn(
/home/philipp/.local/lib/python3.8/site-packages/Bio/Seq.py:1754: BiopythonDeprecationWarning: myseq.tomutable() is deprecated; please use MutableSeq(myseq) instead.
  warnings.warn(
/home/philipp/.local/lib/python3.8/site-packages/Bio/Seq.py:2749: BiopythonDeprecationWarning: myseq.toseq() is deprecated; please use Seq(myseq) instead.
  warnings.warn(
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/philipp/.local/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 431, in _process_worker
    r = call_item()
  File "/home/philipp/.local/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/philipp/.local/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/home/philipp/.local/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "/home/philipp/.local/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/philipp/.local/lib/python3.8/site-packages/iss/generator.py", line 62, in reads
    forward, reverse = simulate_read(record, ErrorModel, i, cpu_number)
  File "/home/philipp/.local/lib/python3.8/site-packages/iss/generator.py", line 161, in simulate_read
    reverse.seq = ErrorModel.introduce_indels(
  File "/home/philipp/.local/lib/python3.8/site-packages/iss/error_models/__init__.py", line 192, in introduce_indels
    seq = self.adjust_seq_length(
  File "/home/philipp/.local/lib/python3.8/site-packages/iss/error_models/__init__.py", line 143, in adjust_seq_length
    full_sequence[read_end + i])
  File "/home/philipp/.local/lib/python3.8/site-packages/Bio/Seq.py", line 433, in __getitem__
    return self.__class__(self._data[index])
  File "/home/philipp/.local/lib/python3.8/site-packages/Bio/Seq.py", line 1727, in __init__
    raise TypeError(
TypeError: data should be a string, bytes, bytearray, Seq, or MutableSeq object
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/philipp/anaconda3/bin/iss", line 10, in <module>
    sys.exit(main())
  File "/home/philipp/.local/lib/python3.8/site-packages/iss/app.py", line 608, in main
    args.func(args)
  File "/home/philipp/.local/lib/python3.8/site-packages/iss/app.py", line 306, in generate_reads
    record_file_name_list = Parallel(
  File "/home/philipp/.local/lib/python3.8/site-packages/joblib/parallel.py", line 1054, in __call__
    self.retrieve()
  File "/home/philipp/.local/lib/python3.8/site-packages/joblib/parallel.py", line 933, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/philipp/.local/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/home/philipp/anaconda3/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/home/philipp/anaconda3/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
TypeError: data should be a string, bytes, bytearray, Seq, or MutableSeq object

Thanks in advance for any answer! :)

Best regards, Philipp
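In the traceback above, the `iss` launcher lives under `anaconda3/bin` while `Bio` and `iss` are imported from `~/.local/lib/python3.8/site-packages`. One possibility (an assumption, not something this thread confirms) is that the downgrade was applied to a different site-packages directory than the one Python actually imports from. A minimal sketch to check where each module would be loaded from, with `module_origin` as a hypothetical helper name:

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Run this with the same interpreter that runs iss.
for name in ("Bio", "iss"):
    print(name, "->", module_origin(name))
```

If the printed path for `Bio` is not in the environment where the downgrade was installed, the older version is not the one being used.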

HadrienG commented 1 year ago

Will be fixed in 1.6.0