snayfach closed this issue 5 years ago
I will answer for Kate... PHANOTATE returns the coordinates of protein-coding genes. To automatically generate the amino-acid sequences, you can use multiPhATE, a phage annotation pipeline that incorporates PHANOTATE. Setting the parameter translate_only='true' runs gene calling and translation.
Repository: https://github.com/carolzhou/multiPhATE.git (see the README).
Kate: Feel free to correct.
-Carol Zhou
On May 14, 2019, at 1:58 PM, Stephen Nayfach wrote:
What's the best way to determine if an ORF is protein coding and the protein sequence that it encodes?
Can you briefly describe how the translation works? Does it apply the standard genetic code to the PHANOTATE ORFs?
multiPhATE’s default genetic code for translation is “bacterial”. It uses transeq within EMBOSS. Yes, using the PHANOTATE ORFs, or you can select another gene finder.
Thanks for your responses!
You can output the protein-coding genes as nucleic acids using the -f fasta flag. Then you can use any tool to translate them into amino acids.
In the future I will add the ability to output them as amino acids. In fact, a bit of code from a later PHANOTATE version already made it into this release: https://github.com/deprekate/PHANOTATE/blob/7ebb90a5f324a62e885471b171454d87af22c50e/lib/orfs.py#L102 (an easy way to translate into amino acids in under 10 lines of Python).
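For anyone who wants to do the translation step by hand in the meantime, here is a minimal sketch in plain Python. It is not the code from orfs.py and not what multiPhATE/transeq does internally. It assumes each FASTA record is already in the coding orientation and starts at the start codon, and it uses the codon assignments of the standard code (the bacterial table 11 assigns the same amino acids and differs in which start codons it allows). The names read_fasta and translate, and the start_as_met option that turns a leading GTG/TTG into M, are my own for illustration.

```python
from itertools import product

BASES = "TCAG"
# Amino acids in NCBI table order: codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumption: treat these as valid starts and write them as methionine.
START_CODONS = ("ATG", "GTG", "TTG")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def translate(dna, start_as_met=True):
    """Translate a coding-strand sequence, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    if start_as_met and protein and dna[:3] in START_CODONS:
        protein[0] = "M"
    return "".join(protein)
```

To use it, open the FASTA file written with the -f fasta flag (say a hypothetical orfs.fasta), loop over read_fasta(handle), and print ">" + header followed by translate(sequence) for each record.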
Thanks, I think that will be a useful addition.
Unfortunately, the tool is too slow for my needs, so I will be sticking with Prodigal for the time being. If you find a way to increase the speed, please let me know!
Yep, since it is written in Python it is quite slow. I may need to rewrite certain parts of the code in C (like I did with fastpathz).
To speed things up considerably you can use pypy instead of python to run it, which is what I do:
pypy phanotate.py INFILE.FNA
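If you call PHANOTATE from a script, something like the following sketch picks pypy when it is installed and falls back to the current Python otherwise. The helper names (pick_interpreter, build_command, run) and the extra pypy3 candidate are my own assumptions, not part of PHANOTATE; adjust the executable names to whatever your system provides.

```python
import shutil
import subprocess
import sys


def pick_interpreter(candidates=("pypy", "pypy3"), which=shutil.which):
    """Return the path of the first candidate found on PATH, else this Python."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return sys.executable


def build_command(script, infile, **kwargs):
    """Build the argument list, e.g. [<interpreter>, 'phanotate.py', 'INFILE.FNA']."""
    return [pick_interpreter(**kwargs), script, infile]


def run(script, infile):
    """Run the script on infile and raise if it exits with an error."""
    subprocess.run(build_command(script, infile), check=True)
```

Calling run("phanotate.py", "INFILE.FNA") is then equivalent to the command above whenever pypy is on the PATH.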
Interesting. How much of a speedup do you get using pypy?
It went from an hour to a minute. I just ran it on the T4 sequence file that is in the tests folder, which is 170 kb, so a longer genome, and it took 59 seconds on my old HP laptop.
Can you add the pypy information to the README (that it makes the tool run faster)? That information will be useful to others.
Sure thing. I actually meant to, but guess I didn't push the hint. It's quite amazing the speedup that comes with JIT compiling python code : )