Hi @lijing28101
Some possibilities:
- Use `--mode nosplit` when launching `mikado pick` (and/or change the value in the configuration file).
- Instruct `mikado pick` to use a stringent algorithm for splitting transcripts (`--mode stringent`). In this mode, Mikado will split transcripts if and only if the two (or more) sides of the transcript match different proteins. The other modes (`permissive`, `lenient`) are more aggressive.

I hope this helps. It is a bit puzzling. I would go for the last solution (mode `stringent`) as a first port of call.
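To make the difference between the modes concrete, here is a minimal Python sketch of the splitting decision as described above. It is an illustration, not Mikado's implementation: the function name `should_split` and the representation of each side of the transcript as a set of protein IDs it hits are assumptions. The more aggressive `lenient`/`permissive` modes are deliberately not modelled.

```python
from typing import Sequence, Set


def should_split(mode: str, side_hits: Sequence[Set[str]]) -> bool:
    """Decide whether a transcript with several ORF "sides" should be split.

    Simplified illustration (hypothetical, not Mikado's code):
    - "nosplit": never split.
    - "stringent": split only if every side hits at least one protein
      and no protein is hit by more than one side.
    """
    if mode == "nosplit":
        return False
    if mode == "stringent":
        # Need at least two sides, each with at least one protein hit.
        if len(side_hits) < 2 or any(not hits for hits in side_hits):
            return False
        seen: Set[str] = set()
        for hits in side_hits:
            # A protein shared between sides means they are the same gene.
            if seen & hits:
                return False
            seen |= hits
        return True
    raise NotImplementedError(f"mode {mode!r} is not modelled in this sketch")
```

For example, `should_split("stringent", [{"P1"}, {"P2"}])` is `True`. If the second side also hits `P1`, or has no hit at all, the transcript is left intact.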
Hi @lucventurini,
Thanks for your suggestion. Since I want to identify orphan genes, which may be very short, can I change the cutoff for loading ORFs (from 250 nt to 150 nt)? If two ORFs overlap, how does Mikado determine which ORF will be loaded? By length or something else? I've tried nosplit mode, and most of the CDSs are complete. But if a transcript has several non-overlapping ORFs, how does Mikado determine the CDS?
Thanks, Jing
Hi @lijing28101
> Since I want to identify orphan genes, which may be very short, can I change the cutoff for loading ORFs (from 250 nt to 150 nt)?
Yes, it is possible. In the configuration file, under `pick.orf_loading`, you can find and modify the following:

```toml
minimal_orf_length = 50
minimal_secondary_orf_length = 200 # Apologies, this (200) is the real default value, not 250
```
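A hypothetical Python sketch of how the two cutoffs might interact. It assumes lengths in nucleotides, that the longest ORF is the primary one and is checked against `minimal_orf_length`, and that every other ORF is checked against `minimal_secondary_orf_length`. `orfs_passing_length` is an illustrative name, not part of Mikado.

```python
from typing import List

# Defaults mirroring the configuration values quoted above.
MINIMAL_ORF_LENGTH = 50
MINIMAL_SECONDARY_ORF_LENGTH = 200


def orfs_passing_length(
    orf_lengths: List[int],
    minimal_orf_length: int = MINIMAL_ORF_LENGTH,
    minimal_secondary_orf_length: int = MINIMAL_SECONDARY_ORF_LENGTH,
) -> List[int]:
    """Return the ORF lengths (longest first) that survive the cutoffs.

    Assumption: the longest ORF is the primary one; if it fails its
    cutoff, no ORF is loaded at all.
    """
    ordered = sorted(orf_lengths, reverse=True)
    if not ordered or ordered[0] < minimal_orf_length:
        return []
    primary, secondary = ordered[0], ordered[1:]
    return [primary] + [
        length for length in secondary if length >= minimal_secondary_orf_length
    ]
```

For example, with the defaults `orfs_passing_length([450, 180, 120])` keeps only the 450 nt ORF. Lowering `minimal_secondary_orf_length` to 150 also keeps the 180 nt one, which is the kind of change needed for short orphan-gene ORFs.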
> If two ORFs overlap, how does Mikado determine which ORF will be loaded? By length or something else?
The default is by length: the longest ORF will be kept. Ties are broken by looking at whether the ORF is complete or not. Admittedly, this is not the most refined method. We do not look at completeness first because, during the original development, we found many cases where ORF finders would locate, for incomplete transcripts, a spurious short internal ORF in one of the possible incorrect frames.
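The rule above (longest first, completeness as a tie-breaker) can be sketched as a greedy selection in Python. This is a simplified illustration, not Mikado's code. It assumes ORFs are given as 1-based, inclusive transcript coordinates, and the `Orf`/`select_orfs` names are hypothetical. ORFs that do not overlap an already chosen one are also retained.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Orf:
    start: int      # 1-based, inclusive transcript coordinate
    end: int        # 1-based, inclusive transcript coordinate
    complete: bool  # True if the ORF has both a start and a stop codon

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: Orf, b: Orf) -> bool:
    """True if the two ORFs share at least one transcript position."""
    return a.start <= b.end and b.start <= a.end


def select_orfs(orfs: List[Orf]) -> List[Orf]:
    """Greedy selection: longest first, complete ORFs win length ties,
    and any ORF overlapping an already kept one is discarded."""
    # Sorting on (length, complete) in reverse puts complete ORFs first
    # among ORFs of equal length, because True > False.
    ranked = sorted(orfs, key=lambda o: (o.length, o.complete), reverse=True)
    kept: List[Orf] = []
    for orf in ranked:
        if not any(overlaps(orf, k) for k in kept):
            kept.append(orf)
    return kept
```

Take `a = Orf(1, 300, False)`, `b = Orf(100, 399, True)` and `c = Orf(500, 700, True)`. Then `select_orfs([a, b, c])` returns `[b, c]`: `a` and `b` tie on length, `b` wins because it is complete, and `c` does not overlap `b`.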
> I've tried nosplit mode, and most of the CDSs are complete. But if a transcript has several non-overlapping ORFs, how does Mikado determine the CDS?
Mikado will keep all non-overlapping CDSs. However, it will only report the longest in the final output. This behaviour can be changed in `pick.output_format`:

```toml
report_all_orfs = false # Switch this to true
```
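A minimal sketch of the reporting step described above. It assumes the ORFs have already been reduced to a non-overlapping set, for instance by a longest-first selection. `orfs_to_report` is a hypothetical helper, and its `report_all_orfs` argument simply mirrors the configuration key.

```python
from typing import List, Tuple


def orfs_to_report(
    kept: List[Tuple[int, int]], report_all_orfs: bool = False
) -> List[Tuple[int, int]]:
    """kept: non-overlapping ORFs as (start, end), 1-based inclusive.

    Returns only the longest ORF unless report_all_orfs is True,
    in which case all of them are returned, longest first.
    """
    if not kept:
        return []
    by_length = sorted(kept, key=lambda se: se[1] - se[0] + 1, reverse=True)
    return by_length if report_all_orfs else by_length[:1]
```

With two kept ORFs, `(100, 399)` and `(500, 700)`, the default returns only `(100, 399)`. With `report_all_orfs=True`, both are returned.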
I have to say that this is indeed a weakness of Mikado: the tool largely relies on the ORFs provided being correct. We do not do anything clever internally to validate them and choose amongst the different options.
So a better way of going about this might be to aid TransDecoder by giving it BLASTP data for the ORFs it finds in its `LongOrfs` step. If you are not aware of how to do it, the TransDecoder wiki has detailed instructions.
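For orientation, here is a Python sketch that only assembles the three commands of that workflow as argument lists. It is based on the TransDecoder wiki example as recalled here, so treat it as an assumption and check the wiki. The location of `longest_orfs.pep` depends on the TransDecoder version (e.g. `<transcripts>.transdecoder_dir/longest_orfs.pep`). The protein database must already be formatted with `makeblastdb`. The helper name is hypothetical.

```python
from typing import List


def transdecoder_with_blastp(
    transcripts: str,
    longest_orfs_pep: str,
    protein_db: str,
    threads: int = 4,
    blast_out: str = "blastp.outfmt6",
) -> List[List[str]]:
    """Return the commands (as argv lists) for TransDecoder with BLASTP support.

    Nothing is executed here; each list can be passed to
    subprocess.run(cmd, check=True) in order.
    """
    return [
        # 1. Extract the long ORFs from the transcripts.
        ["TransDecoder.LongOrfs", "-t", transcripts],
        # 2. Search the candidate ORF peptides against a protein database.
        ["blastp", "-query", longest_orfs_pep, "-db", protein_db,
         "-max_target_seqs", "1", "-outfmt", "6", "-evalue", "1e-5",
         "-num_threads", str(threads), "-out", blast_out],
        # 3. Predict final ORFs, retaining those with BLASTP support.
        ["TransDecoder.Predict", "-t", transcripts,
         "--retain_blastp_hits", blast_out],
    ]
```

Keeping the commands as lists rather than shell strings avoids quoting problems when file names contain spaces.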
I hope this helps.
Hi @lucventurini, I didn't see `pick.orf_loading` and `pick.orf_format` in either the configuration file or the scoring file. Do I need to add them myself?
> Hi @lucventurini, I didn't see `pick.orf_loading` and `pick.orf_format` in either the configuration file or the scoring file. Do I need to add them myself?
Apologies, I was not very clear on my part. First off: all the fields I mentioned are in the configuration file, not the scoring file. The fields you are looking for are:

- `pick`
  - `orf_loading`
    - `minimal_orf_length`
    - `minimal_secondary_orf_length`
  - `output_format`
    - `report_all_orfs`

The `.` above (e.g. `pick.orf_loading`) was meant to indicate the hierarchical location.
In case the fields are not present, please insert them in the correct location in the configuration file.
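A parsed configuration is just nested dictionaries, so the dotted names map directly onto nested keys. As a hedged example, `[pick.orf_loading]` in a TOML file becomes `config["pick"]["orf_loading"]`. The Python sketch below checks which of the fields above are missing. The helper names are hypothetical and not part of Mikado.

```python
from typing import Any, Dict, List, Sequence

REQUIRED_FIELDS = (
    "pick.orf_loading.minimal_orf_length",
    "pick.orf_loading.minimal_secondary_orf_length",
    "pick.output_format.report_all_orfs",
)


def get_dotted(config: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'pick.orf_loading' through nested dicts.

    Raises KeyError if any level is missing.
    """
    node: Any = config
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(path)
        node = node[key]
    return node


def missing_fields(
    config: Dict[str, Any], paths: Sequence[str] = REQUIRED_FIELDS
) -> List[str]:
    """Return the dotted paths that are absent from the configuration."""
    missing = []
    for path in paths:
        try:
            get_dotted(config, path)
        except KeyError:
            missing.append(path)
    return missing
```

On Python 3.11+, the dictionary for a TOML configuration could be obtained with `with open("configuration.toml", "rb") as fh: config = tomllib.load(fh)`, where the file name is only an example.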
Again, apologies; I understand this is not as user-friendly as a command-line switch. I will consider adding them to the interface of `pick` and/or `configure`.
Closing for now.
Hi, I'm running Mikado version 2 (branch 270) and I found a problem with my output. Since I got many (about 30%) incomplete CDSs from Mikado, I checked the TransDecoder output and kept only transcripts with complete CDSs for `mikado serialise` and `pick`. But I still got 25% of transcripts with incomplete CDSs, and over 50% for mono-exonic transcripts. I compared the structures from Mikado with my original structures from StringTie and TransDecoder, and found that Mikado trims some exons, which then causes the incomplete CDSs.
For example, for 3prime_partial:
I'm not sure whether this is caused by transcript splitting during pick. Do you have any idea how to avoid that?
Thanks, Jing
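For reference, the pre-filtering step described above (keeping only complete TransDecoder ORFs) can be sketched in Python. It assumes TransDecoder `.pep` headers carry a whitespace-separated token such as `type:complete` or `type:3prime_partial`, as in recent versions. The exact header layout varies between versions, and the example header in the test is illustrative, not real output. `complete_orf_ids` is a hypothetical helper.

```python
from typing import Iterable, List


def complete_orf_ids(pep_headers: Iterable[str]) -> List[str]:
    """Return the IDs of .pep records whose header contains 'type:complete'.

    Non-header lines (sequence lines) are ignored.
    """
    ids = []
    for header in pep_headers:
        if not header.startswith(">"):
            continue
        fields = header[1:].split()
        # The first field is the record ID; the ORF type is a separate token.
        if fields and "type:complete" in fields:
            ids.append(fields[0])
    return ids
```

The returned IDs could then be used to subset the ORF file passed to `mikado serialise`.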