TransDecoder / TransDecoder

Missed ORF #100

Closed shamsbhuiyan closed 1 year ago

shamsbhuiyan commented 4 years ago

When I run the following sequence through TransDecoder, it misses a complete ORF of 204 amino acids (judging by the longest_orfs.pep file):

ENSMUST00000000305_ENSMUSG00000000296 AAAATGAATTATTTTAATGACTTAAATAAATGCACATAAAAATCACAAGCAATTTGGTGAAGTTTGTAGGATTTGCGCCCTGCTGGAGACCGTCATAAATAGTGTCACCTGTCGATCTACAGACTGGCAGGTGCACTCTGAGACATGCATATTGGTTTGGAGACCAAGTCAAAACCAAAGCTAATAATCAGTATCTGCATGATTGGGAAATACATTGAGTTCCGCAGACAAGAATTCAGAACAATTATGACCCTGCTCCAGGTTGTCTATTTTTAAAGATGACATTTGTAAATACTCCAGAGTAGTGGTCATCAACACAAATGCAAAAAGAAATCATTTTATTAAGCTTTTTGGAGACCCCATTTAAATATCAATGTCAAAACTTGGGGTTCCCAGTGGGATTTTGGTCTTCATATCAGAGGATGAGCAGGCAGATTGACCAGCAGCAGCCCAGGCGGGACCTAGCACTGCAGCTCCTCGTCCTTGGTCTGCCGGGAGCCTGCTGAAGCATTCTGTGAGCTGGCGTGTGCTGTGGAGTTCAGGACCTCCTCAAAACTGCCACCACCGTGGTTTGTCCCACCTACTTTCGTCTTGAGGCTTGCAACAGTTGTCTCAACCCTCTCCTCAAATGATTTGAAAGTAGAAGAATTCCTCATGGCAGGCATACTTATGGAATGGCGGATGGAGTACCTCATATCTCCAAACTTCTTGCTGATGGCAGTTCCCACGTTATTGAAAGCTGCTGTTGCCTTCTGCCCTGCGTGGCTCAGGGTTTCGTGCGTTTTCTTGTACGCAGTCGTGGTCTGCATGTCGTGCCAGCTCCTGCTGAAGTTCTGCTTTAACTCATTCATCAGATTCATGCCGAGTTTCTGTTTGATCTCAACCAGATGTCTTTCTTTTGCTGACAAAACTTGTCGTAATGTTGTGATTTCGTCTTCTAGCTGAATTAACTCTGCCTTTAGCTCTTCCTTCTCCTCCTCAGAGAGCATGCTAGAGAAGTCAGCACTGCCTACTGCATCCCCATCTCTTCCTTGTAGCGGTTCCGTCTCCAACAAGCCTTGTGCCTGCGCCTCCATGGTGGCCGGTTTGGAGCTGAGCACCGCGGAGAACACGCGGACTCCTCGCAGAGCAGCAGATGGTAGCGATTCCCGAGCTCGGAAGACAGCTAGCCCCGCCGCCCGTGCCCTCTCCCGGCGCGCGTGTCGCGAGCATTCGTGGAGGGGGCGGGGTACTGGCGGCGGCCACCACTGGCCGCTGCTGGTTACCT

However, if I run the exact same sequence through NCBI's ORFfinder, it finds the 204-amino-acid sequence: https://www.ncbi.nlm.nih.gov/orffinder/

The 204-amino-acid sequence is the "correct" one, at least according to Ensembl. Why does TransDecoder miss it? Thank you

brianjohnhaas commented 4 years ago

Hi,

TransDecoder.LongOrfs does find it, but as a longer 5' partial ORF whose start runs off the end of the contig.

Screenshot attached.

best,

~b

shamsbhuiyan commented 4 years ago

Hey, no screenshot was attached (at least that I can see). Would you be able to reupload it?

brianjohnhaas commented 4 years ago

[Screenshot: Screen Shot 2020-02-19 at 9 22 02 PM]

shamsbhuiyan commented 4 years ago

Ah, thanks. I'm a little confused about what 5' partial means. I assumed it meant an in-frame stretch of amino acids that ends in a stop codon but has no start codon. Is my interpretation correct?

brianjohnhaas commented 4 years ago

yes, that's correct

shamsbhuiyan commented 4 years ago

But there is an in-frame start codon within that ORF. So shouldn't it be a complete (and shorter) ORF? The TransDecoder ORF is 268 aa, but the ORFfinder ORF is 204 aa.

brianjohnhaas commented 4 years ago

If the ORF runs off the end of the contig, we start it at the contig end.
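
To make the difference between the two tools concrete, here is a minimal Python sketch of the two policies on a toy sequence. It is not TransDecoder's or ORFfinder's actual code: the function names, the `allow_5prime_partial` switch, and the toy sequence are all made up for illustration, and only a single forward frame is examined.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def codons(seq, frame):
    """Return (position, codon) pairs for one forward reading frame."""
    return [(i, seq[i:i + 3]) for i in range(frame, len(seq) - 2, 3)]


def leading_orf(seq, frame=0, allow_5prime_partial=True):
    """Describe the ORF in the first stop-free stretch of a frame.

    Returns (start, end, orf_type) with `end` exclusive and including the
    stop codon, or None if no ORF is found. Hypothetical helper, for
    illustration only.
    """
    cds = codons(seq, frame)
    # Index of the first in-frame stop codon, if there is one.
    stop_idx = next(
        (k for k, (_, codon) in enumerate(cds) if codon in STOP_CODONS), None
    )
    if stop_idx is None or stop_idx == 0:
        return None  # this sketch requires a stop and at least one codon before it
    stretch = cds[:stop_idx]
    end = cds[stop_idx][0] + 3

    if allow_5prime_partial:
        # No stop codon lies between the contig edge and this ORF, so the
        # real start may be upstream of the contig: begin at the edge.
        return (frame, end, "5prime_partial")

    # Alternative policy: always begin at the first in-frame ATG.
    for pos, codon in stretch:
        if codon == START_CODON:
            return (pos, end, "complete")
    return None


toy = "GCTGCTATGAAACCCTAA"  # GCT GCT ATG AAA CCC TAA
print(leading_orf(toy, allow_5prime_partial=True))   # (0, 18, '5prime_partial')
print(leading_orf(toy, allow_5prime_partial=False))  # (6, 18, 'complete')
```

Both calls end at the same stop codon. They differ only in where the ORF begins, which matches the pattern in this issue: the same ORF is reported as 268 aa with the partial policy and as 204 aa when it starts at the first ATG.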

shamsbhuiyan commented 4 years ago

But why?

brianjohnhaas commented 4 years ago

Occam's razor, I guess. A long upstream region is unlikely to be free of an intervening stop codon unless the GC content is high, so the simplest explanation is that the ORF is a 5' partial. If you had a longer sequence, you'd find the proper start, or a better one.
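
As a rough back-of-the-envelope check of that argument, the Python sketch below estimates how likely a stop-free stretch is by chance. It assumes independent bases with P(A) = P(T) and P(G) = P(C), which real transcripts do not strictly follow, and the function names are made up for illustration. The 64 codons come from the difference between the two reported ORF lengths (268 - 204).

```python
def stop_codon_probability(gc):
    """Chance that a random codon is TAA, TAG or TGA.

    Assumes independent bases with GC fraction `gc`, P(A) = P(T) and
    P(G) = P(C). A simplification for illustration only.
    """
    p_at = (1.0 - gc) / 2.0  # probability of A, and of T
    p_gc = gc / 2.0          # probability of G, and of C
    # TAA contributes p_at**3; TAG and TGA each contribute p_at**2 * p_gc.
    return p_at ** 3 + 2.0 * p_at ** 2 * p_gc


def prob_stop_free(n_codons, gc=0.5):
    """Chance that `n_codons` consecutive random codons contain no stop."""
    return (1.0 - stop_codon_probability(gc)) ** n_codons


upstream_codons = 268 - 204  # extra codons in the 5' partial ORF
for gc in (0.4, 0.5, 0.6, 0.7):
    print(f"GC={gc:.1f}  P(no stop in {upstream_codons} codons) = "
          f"{prob_stop_free(upstream_codons, gc):.3f}")
```

At 50% GC a random codon is a stop with probability 3/64, so 64 stop-free codons in a row come out at a little under 5% under this model. The chance rises as GC content goes up, because all three stop codons are AT-rich.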
