EI-CoreBioinformatics / mikado

Mikado is a lightweight Python3 pipeline whose purpose is to facilitate the identification of expressed loci from RNA-Seq data and to select the best models in each locus.
https://mikado.readthedocs.io/en/stable/
GNU Lesser General Public License v3.0
question about mikado pick #293

Closed lijing28101 closed 4 years ago

lijing28101 commented 4 years ago

Hi, I have a question about the mikado pick algorithm. When I run TransDecoder on mikado_prepared.fasta, I noticed that TransDecoder keeps the longest CDS and discards any shorter CDS that lies within it. I'm curious whether Mikado also keeps only the long CDS if I manually supply both CDSs as ORF input. I'm interested in orphan genes, and I have found some that lie within a known gene but are much shorter. If the shorter CDS has a completely different protein homolog from the longer one, I still want to keep it.

lucventurini commented 4 years ago

Dear @lijing28101

I'm curious whether Mikado also keeps only the long CDS if I manually supply both CDSs as ORF input. I'm interested in orphan genes, and I have found some that lie within a known gene but are much shorter. If the shorter CDS has a completely different protein homolog from the longer one, I still want to keep it.

Mikado will consider all ORFs found for a transcript as potentially valid, as long as they do not overlap each other. For example, suppose you had a long, complete ORF in the middle, a shorter ORF upstream or downstream that does not overlap the first, and a third ORF that overlaps either of them. Mikado will in general load the first and the second into the transcript, but not the third. The only exception is when the second ORF is very short (by default, shorter than 250 bp).
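
To make the rule above concrete, here is a minimal Python sketch of non-overlapping ORF selection. It is not Mikado's actual code: the `Orf` record, the `select_orfs` function, and the choice to rank ORFs by length are all assumptions made for illustration. Only the no-overlap rule and the 250 bp default for secondary ORFs come from the description above.

```python
from typing import List, NamedTuple


class Orf(NamedTuple):
    """Hypothetical ORF record; 0-based, half-open transcript coordinates."""
    name: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def overlaps(a: Orf, b: Orf) -> bool:
    """True if the two intervals share at least one base."""
    return a.start < b.end and b.start < a.end


def select_orfs(orfs: List[Orf], min_secondary_length: int = 250) -> List[Orf]:
    """Greedy toy selection (an assumption, not Mikado's implementation).

    ORFs are visited from longest to shortest. An ORF is kept if it does
    not overlap any ORF already kept and, unless it is the first one kept,
    if it is at least `min_secondary_length` bases long.
    """
    kept: List[Orf] = []
    for orf in sorted(orfs, key=lambda o: o.length, reverse=True):
        if any(overlaps(orf, k) for k in kept):
            continue  # overlapping ORFs are not loaded
        if kept and orf.length < min_secondary_length:
            continue  # very short secondary ORFs are not loaded
        kept.append(orf)
    return sorted(kept, key=lambda o: o.start)


example = [
    Orf("long_complete", 500, 2000),
    Orf("upstream", 100, 400),            # 300 bp, no overlap
    Orf("overlapping", 1800, 2300),       # overlaps long_complete
    Orf("tiny_downstream", 2400, 2500),   # 100 bp, below the threshold
]
print([o.name for o in select_orfs(example)])
# → ['upstream', 'long_complete']
```

Under this sketch, a short CDS nested inside a longer one counts as overlapping, so it would not be loaded alongside the longer ORF.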

Further details: https://mikado.readthedocs.io/en/latest/Algorithms.html

What Mikado does with the ORFs is determined by the mikado pick mode. My understanding is that you probably want to run Mikado in either split or permissive mode, so that each ORF is considered as a separate transcript. See here for further details: https://mikado.readthedocs.io/en/latest/Usage/Configure.html#chimera-splitting

I hope this helps.

lucventurini commented 4 years ago

Closing for now due to lack of activity.