galaxy-genome-annotation / gff-survey


Planning #1

Open hexylena opened 1 year ago

hexylena commented 1 year ago

Given previous conversations and statements like:

btw CDS vs exon came up in my discussion with Ian Korf: genestats assumes mRNA -> exon, but it would be nice to be able to do stats of mRNA -> CDS... putting exon entries in that correspond with the CDSes seems wrong to me because CDS can be part of an exon. I'm a bit undecided but this might be a TODO item for genestats (which could do with a version update anyway)

so what I have learned thus far: mRNA -> CDS messes up the jcvi annotation stats tool. Arguably when aligning protein -> DNA all you have is CDS - i.e. mRNA -> exon isn't strictly correct. But annotation stats expects exons.

I'm currently running a test sample through Maker2 to try and figure out what might be causing the Train SNAP tool to come out with a bad HMM... the thing here being that Maker2 annotation works as input (at least in the Eukaryotic genome annotation tutorial) but my current annotation doesn't.

what I've got at the moment is gene -> mRNA -> cds

Maker produces gene -> mRNA -> exon | CDS | five_prime_UTR | three_prime_UTR

yeah I've always seen CDS in uppercase, I guess it's the "standard" (if there's any in the gff world)

It would be interesting for us, the GGA group, to collectively do a survey of what real-world GFF3 files look like.
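One concrete question such a survey could answer per file is which side of the mRNA -> exon versus mRNA -> CDS split it falls on. The sketch below classifies every mRNA by the child feature types that point to it through Parent=. The function name `mrna_child_profile` and the sample lines are hypothetical; it assumes tab-separated, 9-column GFF3 with `ID`/`Parent` in column 9, and it is not how genestats or the jcvi stats tool read files.

```python
from collections import defaultdict


def mrna_child_profile(lines):
    """Label each mRNA by whether its children include exon, CDS, both or neither.

    Hypothetical helper. Assumes tab-separated, 9-column GFF3 with ID= and
    Parent= in column 9; comment lines are skipped and parsing stops at ##FASTA.
    Feature types are compared case-sensitively ("CDS", not "cds").
    """
    types = {}                      # feature ID -> feature type (column 3)
    child_types = defaultdict(set)  # parent ID -> types of its children
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(kv.strip().split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "ID" in attrs:
            types[attrs["ID"]] = cols[2]
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                child_types[parent].add(cols[2])

    profile = {}
    for feature_id, feature_type in types.items():
        if feature_type != "mRNA":
            continue
        kids = child_types.get(feature_id, set())
        has_exon, has_cds = "exon" in kids, "CDS" in kids
        if has_exon and has_cds:
            profile[feature_id] = "exon+CDS"
        elif has_cds:
            profile[feature_id] = "CDS only"
        elif has_exon:
            profile[feature_id] = "exon only"
        else:
            profile[feature_id] = "neither"
    return profile


# Made-up example: m1 is laid out like Maker output, m2 like a protein-to-DNA alignment.
example = """##gff-version 3
chr1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1
chr1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=m1;Parent=g1
chr1\tmaker\texon\t1\t900\t.\t+\t.\tID=e1;Parent=m1
chr1\tmaker\tCDS\t10\t800\t.\t+\t0\tID=c1;Parent=m1
chr1\tprot\tmRNA\t1\t300\t.\t-\t.\tID=m2
chr1\tprot\tCDS\t1\t300\t.\t-\t0\tID=c2;Parent=m2
""".splitlines()

print(mrna_child_profile(example))  # {'m1': 'exon+CDS', 'm2': 'CDS only'}
```

A file where most mRNAs come out as "CDS only" has the mRNA -> CDS layout that, per the quoted comments, trips up tools expecting exons.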

Plan

abretaud commented 1 year ago

I like this idea! Maybe there's some overlap with https://github.com/NAL-i5K/AgBioData_GFF3_recommendation? I've never really had the time to explore what they did.

hexylena commented 1 year ago

Yeah, great one! I'll start adding URLs to GFF3 files into this repo, and whenever we get around to it we'll just pull that entire list into Galaxy for analysis.
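Once the files are pulled in, one way to compare their layouts is to reduce each file to counts of (parent type, child type) pairs, which can then be summed across the whole list. The sketch below is plain Python rather than a Galaxy tool; `hierarchy_edges` is a hypothetical name, and it assumes 9-column GFF3 with `ID`/`Parent` attributes. Parents that never appear with an ID are reported as `?`.

```python
from collections import Counter


def hierarchy_edges(lines):
    """Count (parent type, child type) pairs in one GFF3 file.

    Hypothetical helper. Parents are resolved after the whole file has been
    read, so children listed before their parent still count; a Parent= that
    matches no ID is recorded with parent type "?".
    """
    id_type = {}  # feature ID -> feature type (column 3)
    links = []    # (parent ID, child type) in file order
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(kv.strip().split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "ID" in attrs:
            id_type[attrs["ID"]] = cols[2]
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                links.append((parent, cols[2]))
    return Counter((id_type.get(parent, "?"), child) for parent, child in links)


# Made-up file using the lowercase "cds" layout quoted in the issue.
sample = [
    "##gff-version 3",
    "ctg1\tsrc\tgene\t1\t500\t.\t+\t.\tID=gene1",
    "ctg1\tsrc\tmRNA\t1\t500\t.\t+\t.\tID=tx1;Parent=gene1",
    "ctg1\tsrc\tcds\t1\t200\t.\t+\t0\tParent=tx1",
    "ctg1\tsrc\tcds\t301\t500\t.\t+\t1\tParent=tx1",
]
print(dict(hierarchy_edges(sample)))  # {('gene', 'mRNA'): 1, ('mRNA', 'cds'): 2}
```

Per-file counters can be combined with `sum(per_file_counters, Counter())`, which makes case variants such as `cds` versus `CDS` show up as separate rows.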

hexylena commented 1 year ago

Added every GFF3 file I could find on my laptop (sanitized to remove the owner= email addresses from Apollo).
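A minimal sketch of that kind of sanitizing, assuming the email address sits in a column 9 attribute whose key is literally `owner` (the only detail the comment gives). `strip_owner` is a hypothetical name and this is not necessarily how the files in the repo were cleaned; emails stored under any other key would survive it.

```python
def strip_owner(line):
    """Return a GFF3 line with any owner=... attribute removed from column 9.

    Hypothetical helper. Comment/directive lines and lines without nine
    tab-separated columns (e.g. FASTA sequence) are returned unchanged.
    """
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line
    kept = [kv for kv in cols[8].split(";") if kv.strip() and not kv.strip().startswith("owner=")]
    cols[8] = ";".join(kept) if kept else "."  # "." marks an empty attribute column
    return "\t".join(cols) + newline


line = "chr1\tapollo\tgene\t1\t10\t.\t+\t.\tID=g1;owner=someone@example.org;Name=foo\n"
print(strip_owner(line), end="")
```

To clean a whole file, map it over the lines, e.g. `with open(path) as fh: cleaned = "".join(strip_owner(l) for l in fh)`.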

hexylena commented 1 year ago

https://github.com/galaxy-genome-annotation/gff-survey/blob/main/stats.md

hexylena commented 1 year ago

https://github.com/galaxy-genome-annotation/gff-survey/blob/main/stats.md#scores 😭
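Column 6 (score) is meant to hold either `.` or a floating-point number. A hypothetical per-file tally of what actually appears there could look like the sketch below; it is not the code behind stats.md. Note that Python's `float()` also accepts strings such as `nan` and `inf`, which this sketch counts as numeric.

```python
from collections import Counter


def score_kinds(lines):
    """Tally GFF3 column 6 values as ".", numeric, or the literal odd value."""
    kinds = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        score = cols[5]
        if score == ".":
            kinds["."] += 1
            continue
        try:
            float(score)
            kinds["numeric"] += 1
        except ValueError:
            kinds["other: " + score] += 1  # keep the offending value for inspection
    return kinds


rows = [
    "chr1\tsrc\tgene\t1\t9\t.\t+\t.\tID=a",
    "chr1\tsrc\tmatch\t1\t9\t0.98\t+\t.\tID=b",
    "chr1\tsrc\tmatch\t1\t9\thigh\t+\t.\tID=c",
]
print(dict(score_kinds(rows)))  # {'.': 1, 'numeric': 1, 'other: high': 1}
```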