Closed Juke34 closed 4 years ago
CDS. SNAP doesn't really model the non-coding parts.
On Mar 17, 2020, at 3:17 AM, Jacques Dainat notifications@github.com wrote:
One common protocol for training SNAP goes through the MAKER annotation pipeline. They provide a script called maker2zff. Looking at their script, I realise that instead of using the exon coordinates they use the CDS coordinates. What would be your recommendation to better train SNAP: using CDS or exons?
Does the field separator matter in the zff file? Should it be a space or a tab?
I don't think it matters, but tab always looks better.
A last remark: I think the readme doesn't mention that genome.dna and genome.ann must be sorted by sequence identifier in the same order. On my first try the files were not sorted in the same order and I got plenty of error messages. Now that I have sorted them the same way, everything works fine.
It processes them one at a time. Back when SNAP was first developed, there was no way you could read all the chromosomes and annotation in at once.
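Since the two files are read in step, a quick pre-flight check can catch an ordering problem before training. The following is only a sketch, not part of SNAP: the function names `header_ids` and `first_order_mismatch` are hypothetical, and it assumes that both genome.dna and genome.ann mark each sequence with a line starting with `>` followed by the identifier.

```python
from itertools import zip_longest


def header_ids(lines):
    """Return the identifiers from '>' header lines, in file order."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    ]


def first_order_mismatch(dna_lines, ann_lines):
    """Return (index, dna_id, ann_id) for the first position where the
    two files disagree, or None if the identifiers match in order.
    A missing identifier on one side shows up as None in the tuple."""
    pairs = zip_longest(header_ids(dna_lines), header_ids(ann_lines))
    for index, (dna_id, ann_id) in enumerate(pairs):
        if dna_id != ann_id:
            return (index, dna_id, ann_id)
    return None


# Small in-memory example (hypothetical data).
dna = [">chr1\n", "ACGT\n", ">chr2\n", "GGCC\n"]
ann_ok = [">chr1\n", "Esngl\t1\t4\tg1\n", ">chr2\n", "Esngl\t1\t4\tg2\n"]
ann_bad = [">chr2\n", "Esngl\t1\t4\tg2\n", ">chr1\n", "Esngl\t1\t4\tg1\n"]
```

With real files, passing the two open file handles (for example `open("genome.dna")` and `open("genome.ann")`) in place of the lists should work the same way, since the functions only iterate over lines.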
One common protocol for training SNAP goes through the MAKER annotation pipeline, which provides a script called maker2zff. Looking at their script, I realise that they use only the CDS coordinates to create the Esngl, Einit, Eterm and Exon zff features. What would be your recommendation to better train SNAP? Is using CDS only enough? Can we use exons only? I checked zoeFeature.h: what about the other features? Would I get better training if I provided a zff file with Intron, UTR5, UTR3, Acceptor, Donor, Start, Stop, etc. features? Maybe most of them are computed automatically during training (i.e. Start, Stop, Acceptor and Donor can be deduced from the exon coordinates...).
maker2zff defines the Esngl, Einit, Eterm and Exon zff features based on CDS gff features. Would I get better training if I defined Esngl, Einit, Eterm and Exon based on the exon gff features and added a Coding zff feature to specify which part of each exon is coding?
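To make the CDS-based labelling concrete, here is a minimal sketch of how CDS segments could be turned into Esngl, Einit, Eterm and Exon lines, following the answer in this thread (use CDS) and the preference for tabs. It is not a reimplementation of maker2zff: the function `cds_to_zff` and its input layout are hypothetical, and the four-column line (label, begin, end, gene name) with begin greater than end for minus-strand genes reflects my understanding of the zff format and should be checked against the SNAP documentation.

```python
def cds_to_zff(seq_id, genes):
    """Build zff lines for one sequence.

    genes maps a gene name to (strand, segments), where segments is a
    list of (start, end) CDS coordinates, 1-based and inclusive.
    """
    lines = [f">{seq_id}"]
    for name, (strand, segments) in genes.items():
        # Put segments in transcription order: ascending on '+',
        # descending on '-'.
        ordered = sorted(segments, reverse=(strand == "-"))
        last = len(ordered) - 1
        for i, (start, end) in enumerate(ordered):
            if last == 0:
                label = "Esngl"   # single-segment gene
            elif i == 0:
                label = "Einit"   # first coding segment
            elif i == last:
                label = "Eterm"   # last coding segment
            else:
                label = "Exon"    # internal coding segment
            # Assumption: minus-strand features are written end-first.
            begin, stop = (start, end) if strand == "+" else (end, start)
            lines.append("\t".join([label, str(begin), str(stop), name]))
    return lines


# Hypothetical example data.
example = cds_to_zff(
    "chr1",
    {
        "g1": ("+", [(201, 300), (401, 500), (601, 700)]),
        "g2": ("-", [(1000, 1100), (1200, 1300)]),
        "g3": ("+", [(2000, 2300)]),
    },
)
```

Joining `example` with newlines gives one header line followed by one tab-separated feature line per CDS segment.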