EI-CoreBioinformatics / mikado

Mikado is a lightweight Python3 pipeline whose purpose is to facilitate the identification of expressed loci from RNA-Seq data and to select the best models in each locus.
https://mikado.readthedocs.io/en/stable/
GNU Lesser General Public License v3.0

Invalid frame #107

olekto closed this issue 6 years ago

olekto commented 7 years ago

Hi, when running mikado serialise, I get this error:

ValueError: Invalid frame specified for ID=Gene.60462::STRG.10290.2::g.60462::m.60462;Gene.60462::STRG.10290.2::g.60462;ORF_type:complete_len:328_(+),score=179.92: 360. Must be None or 0, 1, 2

Is it this line that is the issue? https://github.com/lucventurini/mikado/blob/1365804e263fe47bca857e5b891592c5af1e813b/Mikado/parsers/bed12.py#L281

Used TransDecoder 5.0.1 and mikado installed yesterday.

Thank you.

Please find the bed12 and fasta entries below.

Sincerely, Ole

BED12 entry:

STRG.10290.2 0 2039 ID=Gene.60462::STRG.10290.2::g.60462::m.60462;Gene.60462::STRG.10290.2::g.60462;ORF_type:complete_len:328_(+),score=179.92 0 + 360 1344 0 1 2039 0
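For readers unfamiliar with the format, here is a minimal sketch of how the BED12 entry above breaks down into named columns, and of the arithmetic linking the coding interval to the ORF name. The column names follow the standard UCSC BED12 layout. The helper `parse_bed12` is hypothetical and is not Mikado's parser, and the sketch assumes that columns 10 and 11 are a block count of 1 and a block size of 2039.

```python
# Column names follow the UCSC BED12 layout.
BED12_FIELDS = (
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes", "blockStarts",
)
INT_FIELDS = {"chromStart", "chromEnd", "thickStart", "thickEnd", "blockCount"}


def parse_bed12(line):
    """Split one whitespace-separated BED12 line into a dict of named fields.

    Hypothetical helper for illustration only; not Mikado's parser.
    """
    values = line.split()
    if len(values) != len(BED12_FIELDS):
        raise ValueError(f"expected 12 columns, found {len(values)}")
    record = dict(zip(BED12_FIELDS, values))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    return record


# Assumption: columns 10 and 11 are blockCount=1 and blockSizes=2039.
LINE = (
    "STRG.10290.2 0 2039 "
    "ID=Gene.60462::STRG.10290.2::g.60462::m.60462;"
    "Gene.60462::STRG.10290.2::g.60462;"
    "ORF_type:complete_len:328_(+),score=179.92 "
    "0 + 360 1344 0 1 2039 0"
)

record = parse_bed12(LINE)
# thickStart/thickEnd delimit the coding region (0-based start, exclusive end).
cds_length = record["thickEnd"] - record["thickStart"]
codons, remainder = divmod(cds_length, 3)
print(record["thickStart"], record["thickEnd"], cds_length, codons, remainder)
# prints: 360 1344 984 328 0
```

The coding interval spans 984 bases, i.e. 328 whole codons, which appears consistent with the `len:328` in the name column. The value 360 quoted in the error message is the `thickStart` column.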

Fasta entry:

>STRG.10290.2
CACACACACACACACACACACACATGGCCGGTTCTAGAGTGAGACACAGCCACTCATACA
CACACATGGCCTGTTCCAGAACATTCATCGCTCTGAGTTGAGCTCCAATATCTTCTCCGC
CCCCAGGGGTACAATGGACATTATTATTTTATTTTATTTGTTCTGTGAAAGGTGTTTTGT
TTGACGGAGTGCAGCATGCCATTGTTCTTTCAATAATTTAGCTGGGATAAATAAATACTC
ATTCATATTTAAAGCCGTTCTGTTGATTTTATTAGCATAAAGGAAAGCTAGAACTAGAGT
ATGAGCCGTAGCAGGGTCGCTACGTATATAAATAAATGTATTTACAGCATCTGAATGGCA
ATAGTGGAGTTCTAGTCGTGGATATCGTATACAGTTTTATATTTGGTTTTGTATGATCTA
TTATGAATCTATTTGTATACAGCCCAAAATAATAGCCTATTTACAATCATCTCCCTTTCA
TTTCGCACATAAATAAACTCTGAACCAAGCTTTCCTCCCCTAAAAAACATCTCTAGTCAT
TATTCATATATTTATTCATAAACACACTTATATCCGATCATCGCCGTGGAGACGCACAGC
GGTAAAGCGTCTCTTCACATGAGTTCAGCCCACGCGCCTTCTCACACGATCCCATCCCCG
GTGACCCCTCACACACACACACACACACACGGGCCCTACTCCGGCTGCCCCATGGGGGTC
ACCACGGGCCAGGGGAGATCCGGTAGCAGCGAGTGGTGGTGGGGCTCGACGGGGACGTGC
TGGGCCAGCAGCAGCTGCTGCTGATGCTGCTGCTGCGCCCAGTTGAAGTCCACGCCGTTC
TTCCTCAGTGCCGCCAGGTGTTTGATGTCCAGCGTGGTGAAGGTGGTGAACTGCACCTGG
TTCCTCTTGCTGGTGGGCGAGTGCAGGGGCTCGTCGCGCTGGGGCCGGGCCAGCAGGGTG
GCGGAGCGCTGCGCCGCGGGCTCCGACGGCGGCTGGCACTGGATCTGCTGCTGGGAGCTG
GAGCGGCCGCGGCCCAGGGTGGCCGTCCTCTGGGGGAAGGCGGAGGTCAGGGCGCCCGTC
TGGGCGTCCATGGAGCGGCGGGAGTCCAGCGAAGGCCGGTGGTGGGCCTCTCGTTGGAGC
GTGCACACTTTGGGCGGCGGGCCCGGGACCTCGCCTCCCATGGTGGTGGTCCCCAGCCAC
ACCCAGTCGTGCTTGTGGTCCTTGGGGTCTCCGGTGGGCACCGGGGCGGCCTGGATGGGC
GTCTTGTGGCCGCGGAAGCACAGGTTGTACGAGGCGCAGTTCACCAGGAAGGCCAGTATG
GCGACGCACGAGACCCCCACCAGGGCAAACATCCCGATCTCCATGTCCGACATGGCCTTG
AACGTCCGGATGATGTCGTTCTCGAACACCTTGGGGGGGGACTTGGACTTGGGGGTCTCC
TTCTCGGGCTCGGCGTCAGCGTTCGGCCCGTCTTTCTTGGGCGGGGCGACGGGGTTGTCC
AGCATGTTGCCGTAGCTCCTCCTCTGTCCGTCCCCCGCCCCCACCGCGCCCATGGGGGCC
TCCAGCCCGGGCCGTGCGGCGGTGCTGCGGGGGGGTGTCGAGGCCGGTCGGCCCGTGGTG
GGGTCGGTGGCGGGGGGTGGTGGTGGTGTGGGGTCGGGGTGCGGTGGAGGTGGTCTCATC
AGGAACAACCGTAGGGAACCTGGCGGGCGGCGTGCCGACCCAAACCTCCTGGGGTCGAAG
CGTCTCCGTGTCGGGGGGGGTGGTGGTGGTGGTGGAGGCCTGGCTCGGCATCGGCCATCT
ACGCCCGTCCCGGTCCTTGGGCGTGTCCCGCGACGTCGACCACACCTCAGGCTCCTCCTC
CTCCTCCTCCTCCTCCTCCTCCTCCTCGTCCGGCCCCTCCCCGAAGCGGTCGTCGCCGCG
GCCGGCCCCCCGCAGGTCCGTCCTCAGCGCCCCGGTGCCCACCGCCAGCTTGCTCTTCCT
CTTGGACTTCTGGCACTCCTCGCAGATCCGCAGCTCCACCCGCACCAGGGGCCCCGGGT

lucventurini commented 6 years ago

Dear Ole,

Sorry for the late reply, but I was on holiday. The problem stems from the fact that there is no valid ATG at the position indicated by TransDecoder (360): the codon there is “ATA”. Are you working with a species with a non-standard genetic code? Unfortunately, I have not yet got around to making Mikado support such cases, although it is on my TODO list.

For the moment, a quick fix would be to set the “max_regression” value in the “serialise” section of the configuration file to a large fraction (a value between 0 and 1, e.g. 0.3 or 0.5). This will instruct Mikado serialise to look for the first ATG codon within the coding region reported by TransDecoder. In your specific example, this will move the translation start from position 360 to position 618 (the first available in-frame ATG start codon).
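The check described in this reply can be reproduced by hand. The following is a minimal sketch, not Mikado's actual implementation: it inspects the codon at the reported start and scans forward in frame for the first ATG. The function names `codon_at` and `first_inframe_atg` are hypothetical, positions are treated as 0-based as in BED, and the toy sequence is invented for illustration.

```python
def codon_at(seq, pos):
    """Return the three bases starting at 0-based position pos, upper-cased."""
    return seq[pos:pos + 3].upper()


def first_inframe_atg(seq, cds_start, cds_end):
    """Return the 0-based position of the first ATG in frame with cds_start.

    Only codons lying fully inside [cds_start, cds_end) are examined.
    Returns None when no in-frame ATG exists in that interval.
    """
    for pos in range(cds_start, cds_end - 2, 3):
        if codon_at(seq, pos) == "ATG":
            return pos
    return None


# Toy transcript (hypothetical, not the sequence from the issue):
# the annotated start at position 2 is ATA; an in-frame ATG follows at 8.
toy = "CC" + "ATA" + "GGG" + "ATG" + "AAA" + "TAA"
print(codon_at(toy, 2))               # ATA
print(first_inframe_atg(toy, 2, 17))  # 8
```

To check the positions quoted in the reply, pass the full STRG.10290.2 sequence (with line breaks removed) together with the interval 360 to 1344 from the BED12 entry.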

Kind regards

Luca Venturini, PhD Computational Biologist

olekto commented 6 years ago

Dear Luca, this is a fish, so shouldn't be anything special, besides some general fishiness.

I'll modify the configuration file as you said, and try again.

Thank you.

Ole

lucventurini commented 6 years ago

Dear Ole, good to know. This looks likely to be a bug in TransDecoder then. The "max_regression" value should take care of this case (and others which would crop up in the BED file). Let me know how it goes!

Cheers

Luca

olekto commented 6 years ago

Dear Luca.

I used TransDecoder 5.0.1. There seem to have been some recent changes to how it scores ORFs (https://github.com/TransDecoder/TransDecoder/releases), but I can't see anything explicit about ATA start sites.

Ole

olekto commented 6 years ago

Dear Luca.

After going through what I did, I see I might have messed up. I ran TransDecoder on the input fasta file (after converting using cufflinks_gtf_genome_to_cdna_fasta.pl). I'm restarting on the mikado_prepared.fasta file now, and hopefully that will work fine.

lucventurini commented 6 years ago

Dear Ole, That would definitely create problems. Please let me know how it goes after running TransDecoder on the mikado_prepared.fasta file!

Cheers

Luca