fhcrc / seqmagick

An imagemagick-like frontend to Biopython SeqIO
http://seqmagick.readthedocs.org
GNU General Public License v3.0

Translation error with new BioPython #79

Closed fungs closed 4 years ago

fungs commented 6 years ago

I'm translating nucleotide multiple sequence alignments to amino acid. Codons can contain gap symbols which should be ignored. I'm using seqmagick 0.7.0 in both cases. The command I'm using is

 seqmagick convert --translate dna2protein --line-wrap 0 msa.fna msa.faa

This is working with BioPython 1.70 but not working with BioPython 1.72. This is the error message:

Traceback (most recent call last):
  File "/env/bin/seqmagick", line 11, in <module>
    sys.exit(main())
  File "/env/lib/python3.5/site-packages/seqmagick/scripts/cli.py", line 29, in main
    return action(arguments)
  File "/env/lib/python3.5/site-packages/seqmagick/subcommands/convert.py", line 354, in action
    transform_file(src, dest, arguments)
  File "/env/lib/python3.5/site-packages/seqmagick/subcommands/convert.py", line 308, in transform_file
    writer.write_file(records)
  File "/env/lib/python3.5/site-packages/Bio/SeqIO/Interfaces.py", line 237, in write_file
    count = self.write_records(records)
  File "/env/lib/python3.5/site-packages/Bio/SeqIO/Interfaces.py", line 221, in write_records
    for record in records:
  File "/env/lib/python3.5/site-packages/seqmagick/transform.py", line 703, in translate
    protein = seq.translate(table, to_stop=to_stop)
  File "/env/lib/python3.5/site-packages/Bio/Seq.py", line 1163, in translate
    cds, gap=gap)
  File "/env/lib/python3.5/site-packages/Bio/Seq.py", line 2543, in _translate_str
    dual_coding = [c for c in stop_codons if c in forward_table]
  File "/env/lib/python3.5/site-packages/Bio/Seq.py", line 2543, in <listcomp>
    dual_coding = [c for c in stop_codons if c in forward_table]
  File "/env/lib/python3.5/site-packages/seqmagick/transform.py", line 663, in __getitem__
    elif '-' in codon:
TypeError: argument of type 'int' is not iterable

It seems Biopython changed the way it processes the translation lookup.

fungs commented 6 years ago

If I replace gaps with Ns, it gives the same error message.

metasoarous commented 6 years ago

@fungs Thanks for reporting. Will take a look at this soon...

ressy commented 5 years ago

Same problem here, also with seqmagick 0.7.0. It appears to affect any translation under Biopython newer than version 1.70. Here's a minimal working example:

$ echo -e '>seq\nATG' | seqmagick convert --translate dna2protein - -
>seq
M

And then with newer Biopython I get the same TypeError @fungs reported.

Trying an idea from near the end of that traceback:

>>> import Bio.Data.CodonTable
>>> import seqmagick.transform
>>> tbl = Bio.Data.CodonTable.ambiguous_dna_by_name["Standard"]
>>> tbl_wrn = seqmagick.transform.CodonWarningTable(tbl.forward_table)
>>> tbl.forward_table["ATG"]
'M'
>>> tbl_wrn["ATG"]
'M'
>>> "ATG" in tbl.forward_table
True
>>> "ATG" in tbl_wrn
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jesse/miniconda3/lib/python3.6/site-packages/seqmagick/transform.py", line 663, in __getitem__
    elif '-' in codon:
TypeError: argument of type 'int' is not iterable
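
A likely explanation for the traceback above: when a class defines `__getitem__` but neither `__contains__` nor `__iter__`, Python evaluates `x in obj` by calling `obj[0]`, `obj[1]`, and so on, so the codon lookup receives an integer. The sketch below reproduces that behaviour without seqmagick or Biopython. `WarningTable`, `ContainsTable`, and the one-entry `forward_table` are hypothetical stand-ins for illustration, not the library's actual code.

```python
class WarningTable:
    """Hypothetical dict wrapper that defines only __getitem__."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __getitem__(self, codon):
        if "-" in codon:  # raises TypeError if codon is an int
            return "X"
        return self.wrapped[codon]


class ContainsTable(WarningTable):
    """Same wrapper, with membership tests delegated to the wrapped dict."""

    def __contains__(self, codon):
        return codon in self.wrapped


forward_table = {"ATG": "M"}  # toy one-entry table, not a real codon table

try:
    "ATG" in WarningTable(forward_table)
    fallback_error = None
except TypeError as exc:
    # Without __contains__ or __iter__, `in` falls back to __getitem__(0)
    fallback_error = str(exc)

print(fallback_error)
print("ATG" in ContainsTable(forward_table))
```

If this is what happens inside seqmagick, delegating `__contains__` to the wrapped table should be enough for membership tests to work again.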

Just a shot in the dark, but does CodonWarningTable need to implement __contains__ for this to work again? Something like this, maybe?

>>> class CodonWarningTable2(seqmagick.transform.CodonWarningTable):
...     def __contains__(self, codon):
...         return codon in self.wrapped
...
>>> tbl_wrn2 = CodonWarningTable2(tbl.forward_table)
>>> "ATG" in tbl_wrn2
True

fungs commented 5 years ago

@ressy In case you need a good tool for translation, https://github.com/shenwei356/seqkit is working flawlessly.

ressy commented 5 years ago

Thanks! I hadn't seen that one before.

eharkins commented 4 years ago

Fixed in #76 and released in #82.