Status: Closed (olekto closed 6 years ago)
Dear Ole, Sorry for the late reply, I was on holiday. The problem is that there is no valid ATG at the position indicated by TransDecoder (360); the codon there is “ATA”. Are you working with a species with a non-standard genetic code? Unfortunately I have not yet got around to making Mikado support such cases, although it is on my TODO list. For the moment, a quick fix would be to set the “max_regression” value in the “serialise” section of the configuration file to a large fraction (a value between 0 and 1, e.g. 0.3 or 0.5). This will instruct Mikado serialise to look for the first in-frame ATG codon within the coding region signalled by TransDecoder. In your specific example, this will move the translation start from position 360 to position 618 (the first available in-frame ATG start codon).
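The start-codon check and the forward scan described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Mikado's actual implementation: the function name `first_inframe_atg` and the toy sequence are hypothetical, coordinates are assumed to be 0-based as in the BED12 entry, and the way “max_regression” limits the search is not modelled.

```python
def first_inframe_atg(seq, orf_start, orf_end):
    """Return the 0-based index of the first ATG that is in frame with
    orf_start inside seq[orf_start:orf_end], or None if there is none."""
    seq = seq.upper()
    # Step codon by codon so that only in-frame positions are tested.
    for pos in range(orf_start, orf_end - 2, 3):
        if seq[pos:pos + 3] == "ATG":
            return pos
    return None


# Toy transcript (hypothetical, NOT the STRG.10290.2 sequence):
# the ORF is reported to start at index 3, where the codon is ATA.
toy = "CCC" + "ATA" + "GGG" + "ATG" + "AAA" + "TAA"
orf_start, orf_end = 3, len(toy)

start_codon = toy[orf_start:orf_start + 3]               # codon at the reported start
new_start = first_inframe_atg(toy, orf_start, orf_end)   # where a scan would land

print(f"codon at reported start: {start_codon}")
print(f"first in-frame ATG at: {new_start}")
```

Run on the FASTA entry posted in this thread with `orf_start=360` and `orf_end=1344`, the same scan would be expected to report ATA at the original start and to stop at 618, as described above.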
Kind regards
Luca Venturini, PhD Computational Biologist
Earlham Institute, Norwich Research Park, Norwich, Norfolk NR4 7UZ
+44 1603 450 190
Luca.Venturini@earlham.ac.uk
www.earlham.ac.uk
From: Ole Kristian Tørresen [mailto:notifications@github.com]
Sent: 20 September 2017 12:42
To: lucventurini/mikado <mikado@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Subject: [lucventurini/mikado] Invalid frame (#107)
Hi, when running mikado serialise, I get this error: ValueError: Invalid frame specified for ID=Gene.60462::STRG.10290.2::g.60462::m.60462;Gene.60462::STRG.10290.2::g.60462;ORF_type:complete_len:328_(+),score=179.92: 360. Must be None or 0, 1, 2
Is it this line that is the issue? https://github.com/lucventurini/mikado/blob/1365804e263fe47bca857e5b891592c5af1e813b/Mikado/parsers/bed12.py#L281
Thank you.
Please find the bed12 and fasta entries below.
Sincerely, Ole
BED12 entry:
STRG.10290.2 0 2039 ID=Gene.60462::STRG.10290.2::g.60462::m.60462;Gene.60462::STRG.10290.2::g.60462;ORF_type:complete_len:328_(+),score=179.92 0 + 360 1344 0 12039 0
Fasta entry:
>STRG.10290.2
CACACACACACACACACACACACATGGCCGGTTCTAGAGTGAGACACAGCCACTCATACA
CACACATGGCCTGTTCCAGAACATTCATCGCTCTGAGTTGAGCTCCAATATCTTCTCCGC
CCCCAGGGGTACAATGGACATTATTATTTTATTTTATTTGTTCTGTGAAAGGTGTTTTGT
TTGACGGAGTGCAGCATGCCATTGTTCTTTCAATAATTTAGCTGGGATAAATAAATACTC
ATTCATATTTAAAGCCGTTCTGTTGATTTTATTAGCATAAAGGAAAGCTAGAACTAGAGT
ATGAGCCGTAGCAGGGTCGCTACGTATATAAATAAATGTATTTACAGCATCTGAATGGCA
ATAGTGGAGTTCTAGTCGTGGATATCGTATACAGTTTTATATTTGGTTTTGTATGATCTA
TTATGAATCTATTTGTATACAGCCCAAAATAATAGCCTATTTACAATCATCTCCCTTTCA
TTTCGCACATAAATAAACTCTGAACCAAGCTTTCCTCCCCTAAAAAACATCTCTAGTCAT
TATTCATATATTTATTCATAAACACACTTATATCCGATCATCGCCGTGGAGACGCACAGC
GGTAAAGCGTCTCTTCACATGAGTTCAGCCCACGCGCCTTCTCACACGATCCCATCCCCG
GTGACCCCTCACACACACACACACACACACGGGCCCTACTCCGGCTGCCCCATGGGGGTC
ACCACGGGCCAGGGGAGATCCGGTAGCAGCGAGTGGTGGTGGGGCTCGACGGGGACGTGC
TGGGCCAGCAGCAGCTGCTGCTGATGCTGCTGCTGCGCCCAGTTGAAGTCCACGCCGTTC
TTCCTCAGTGCCGCCAGGTGTTTGATGTCCAGCGTGGTGAAGGTGGTGAACTGCACCTGG
TTCCTCTTGCTGGTGGGCGAGTGCAGGGGCTCGTCGCGCTGGGGCCGGGCCAGCAGGGTG
GCGGAGCGCTGCGCCGCGGGCTCCGACGGCGGCTGGCACTGGATCTGCTGCTGGGAGCTG
GAGCGGCCGCGGCCCAGGGTGGCCGTCCTCTGGGGGAAGGCGGAGGTCAGGGCGCCCGTC
TGGGCGTCCATGGAGCGGCGGGAGTCCAGCGAAGGCCGGTGGTGGGCCTCTCGTTGGAGC
GTGCACACTTTGGGCGGCGGGCCCGGGACCTCGCCTCCCATGGTGGTGGTCCCCAGCCAC
ACCCAGTCGTGCTTGTGGTCCTTGGGGTCTCCGGTGGGCACCGGGGCGGCCTGGATGGGC
GTCTTGTGGCCGCGGAAGCACAGGTTGTACGAGGCGCAGTTCACCAGGAAGGCCAGTATG
GCGACGCACGAGACCCCCACCAGGGCAAACATCCCGATCTCCATGTCCGACATGGCCTTG
AACGTCCGGATGATGTCGTTCTCGAACACCTTGGGGGGGGACTTGGACTTGGGGGTCTCC
TTCTCGGGCTCGGCGTCAGCGTTCGGCCCGTCTTTCTTGGGCGGGGCGACGGGGTTGTCC
AGCATGTTGCCGTAGCTCCTCCTCTGTCCGTCCCCCGCCCCCACCGCGCCCATGGGGGCC
TCCAGCCCGGGCCGTGCGGCGGTGCTGCGGGGGGGTGTCGAGGCCGGTCGGCCCGTGGTG
GGGTCGGTGGCGGGGGGTGGTGGTGGTGTGGGGTCGGGGTGCGGTGGAGGTGGTCTCATC
AGGAACAACCGTAGGGAACCTGGCGGGCGGCGTGCCGACCCAAACCTCCTGGGGTCGAAG
CGTCTCCGTGTCGGGGGGGGTGGTGGTGGTGGTGGAGGCCTGGCTCGGCATCGGCCATCT
ACGCCCGTCCCGGTCCTTGGGCGTGTCCCGCGACGTCGACCACACCTCAGGCTCCTCCTC
CTCCTCCTCCTCCTCCTCCTCCTCCTCGTCCGGCCCCTCCCCGAAGCGGTCGTCGCCGCG
GCCGGCCCCCCGCAGGTCCGTCCTCAGCGCCCCGGTGCCCACCGCCAGCTTGCTCTTCCT
CTTGGACTTCTGGCACTCCTCGCAGATCCGCAGCTCCACCCGCACCAGGGGCCCCGGGT
Dear Luca, this is a fish, so shouldn't be anything special, besides some general fishiness.
I'll modify the configuration file as you said, and try again.
Thank you.
Ole
Dear Ole, good to know. This looks likely to be a bug in TransDecoder then. The "max_regression" value should take care of this case (and others which would crop up in the BED file). Let me know how it goes!
Cheers
Luca
Dear Luca.
I used TransDecoder 5.0.1. There seem to have been some changes to how it scores ORFs lately (https://github.com/TransDecoder/TransDecoder/releases), but I can't see anything explicit about ATA start sites.
Ole
Dear Luca.
After going through what I did, I see I might have messed up. I ran TransDecoder on the input fasta file (after converting using cufflinks_gtf_genome_to_cdna_fasta.pl). I'm restarting on the mikado_prepared.fasta file now, and hopefully that will work fine.
Dear Ole, That would definitely create problems. Please let me know how it goes after running TransDecoder on the mikado_prepared.fasta file!
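A mismatch between the FASTA file given to TransDecoder and the one Mikado uses can sometimes be spotted before serialising. The sketch below is a hypothetical helper, not part of Mikado or TransDecoder. It assumes that each BED12 row names a transcript in its first column and that the third column equals the transcript length, as in the entry posted in this thread (0 to 2039). It catches only missing transcripts and length differences, so a sequence with the same length but a different orientation would pass unnoticed.

```python
def fasta_lengths(lines):
    """Map each FASTA record name (first word after '>') to its sequence length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def mismatched_orfs(bed_lines, lengths):
    """Return (transcript, reason) pairs for BED rows that disagree with the FASTA."""
    problems = []
    for line in bed_lines:
        if line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        name, end = fields[0], int(fields[2])
        if name not in lengths:
            problems.append((name, "missing from FASTA"))
        elif lengths[name] != end:
            problems.append((name, f"BED end {end} != FASTA length {lengths[name]}"))
    return problems


# Toy data (hypothetical): tx1 agrees, tx2 has a different length, tx3 is absent.
fasta = [">tx1", "ACGTACGT", "ACGT", ">tx2", "ACGT"]
bed = [
    "tx1 0 12 orf1 0 + 2 11",
    "tx2 0 9 orf2 0 + 0 9",
    "tx3 0 5 orf3 0 + 0 5",
]

lengths = fasta_lengths(fasta)
problems = mismatched_orfs(bed, lengths)
for name, reason in problems:
    print(name, reason)
```

With real files, the two functions could be fed open file handles for mikado_prepared.fasta and the TransDecoder BED output in place of the toy lists.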
Cheers
Luca
Hi, when running mikado serialise, I get this error: ValueError: Invalid frame specified for ID=Gene.60462::STRG.10290.2::g.60462::m.60462;Gene.60462::STRG.10290.2::g.60462;ORF_type:complete_len:328_(+),score=179.92: 360. Must be None or 0, 1, 2
Is it this line that is the issue? https://github.com/lucventurini/mikado/blob/1365804e263fe47bca857e5b891592c5af1e813b/Mikado/parsers/bed12.py#L281
Used TransDecoder 5.0.1 and mikado installed yesterday.
Thank you.
Please find the bed12 and fasta entries below.
Sincerely, Ole
BED12 entry:
STRG.10290.2 0 2039 ID=Gene.60462::STRG.10290.2::g.60462::m.60462;Gene.60462::STRG.10290.2::g.60462;ORF_type:complete_len:328_(+),score=179.92 0 + 360 1344 0 12039 0
Fasta entry: