TobyBaril / EarlGrey

Earl Grey: A fully automated TE curation and annotation pipeline

GFF problems (with Geneious) + feature suggestion #16

Closed: JWDebler closed this issue 2 years ago

JWDebler commented 2 years ago

Hi, I like this tool. I am currently using panTE, which a former colleague created a few years ago. It is very exhaustive but suffers from some of the problems mentioned in your preprint, mainly overlapping features. It is also no longer being developed.

I just ran into a problem with the GFF file produced by EarlGrey: it does not work when imported into Geneious. First, Geneious complains about the 'NA' in column 8 (the phase of the feature). This should be 0, 1 or 2 for CDS features, or '.' for everything else. After fixing that, Geneious imports the file fine, but it seems to join all features that share the same (non-unique) "ID" tag into a single feature, for example:

AlKewell_ctg_01 RepeatMasker    Unknown 859 1137    1636    -   .   Tstart=903;Tend=1159;ID=RND-1_FAMILY-96;shortTE=F
AlKewell_ctg_01 RepeatMasker    Unknown 6213    6877    1979    -   .   Tstart=771;Tend=1440;ID=RND-4_FAMILY-443;shortTE=F
AlKewell_ctg_01 RepeatMasker    Unknown 7789    9093    8533    -   .   Tstart=868;Tend=2242;ID=RND-1_FAMILY-32;shortTE=F
AlKewell_ctg_01 RepeatMasker    Unknown 9095    9331    1007    -   .   Tstart=718;Tend=953;ID=RND-1_FAMILY-103;shortTE=F
AlKewell_ctg_01 RepeatMasker    Unknown 9536    10383   1843    -   .   Tstart=91;Tend=729;ID=RND-1_FAMILY-103;shortTE=F
AlKewell_ctg_01 RepeatMasker    LTR/Gypsy   10481   12295   10599   -   .   Tstart=6031;Tend=7846;ID=RND-4_FAMILY-173;shortTE=F
AlKewell_ctg_01 RepeatMasker    LTR/Gypsy   12513   16028   21611   -   .   Tstart=2497;Tend=6035;ID=RND-4_FAMILY-173;shortTE=F
AlKewell_ctg_01 RepeatMasker    DNA/hAT-Ac  16234   16507   1361    -   .   Tstart=6895;Tend=7199;ID=RND-1_FAMILY-1;shortTE=F
AlKewell_ctg_01 RepeatMasker    LTR/Gypsy   17463   19109   10105   -   .   Tstart=860;Tend=2505;ID=RND-4_FAMILY-173;shortTE=F
AlKewell_ctg_01 RepeatMasker    LTR/Gypsy   19937   20871   3210    -   .   Tstart=755;Tend=1649;ID=RND-1_FAMILY-9;shortTE=F
AlKewell_ctg_01 RepeatMasker    LTR/Gypsy   21089   21755   1529    +   .   Tstart=7174;Tend=7847;ID=RND-4_FAMILY-173;shortTE=F
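In GFF3, an ID is meant to be unique within a file, and features that share one are read as parts of a single feature, which matches what Geneious does here (e.g. the four RND-4_FAMILY-173 hits above). One possible workaround, sketched below under stated assumptions, is to keep the family name in a `Name` attribute and make each `ID` unique by appending a per-family counter. The helper `make_ids_unique` is hypothetical, not part of Earl Grey, and assumes valid tab-separated GFF3 rows with nine columns.

```python
from collections import defaultdict


def make_ids_unique(gff_lines):
    """Hypothetical helper: rewrite duplicate GFF3 ID attributes.

    The original family name is kept in Name=, and a per-family counter is
    appended to ID= so every feature row gets its own ID.
    """
    counts = defaultdict(int)
    out = []
    for line in gff_lines:
        line = line.rstrip("\n")
        # Pass comments, directives and blank lines through unchanged
        if not line.strip() or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        attrs = []
        for pair in cols[8].split(";"):
            key, _, value = pair.partition("=")
            if key == "ID":
                counts[value] += 1
                attrs.append(f"ID={value}_{counts[value]}")
                attrs.append(f"Name={value}")
            else:
                attrs.append(pair)
        cols[8] = ";".join(attrs)
        out.append("\t".join(cols))
    return out
```

Applied to the two RND-1_FAMILY-103 rows above, this yields IDs `RND-1_FAMILY-103_1` and `RND-1_FAMILY-103_2`, both with `Name=RND-1_FAMILY-103`, so the family label survives while the features stay separate.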

As an improvement, I'd love to see a GFF file that is ready to run through NCBI's table2asn converter to produce annotations that can be submitted to GenBank.

Thanks for this work! Cheers, Johannes

TobyBaril commented 2 years ago

Thanks for this. I have added a check to the final GFF parsing step that catches NAs before the file is written.
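A check along these lines might look like the sketch below. It is an illustration under assumptions, not Earl Grey's actual code: the function name `fix_phase` is hypothetical, and input is assumed to be tab-separated GFF3 rows. Following the rule in the original report, non-CDS features get '.' in column 8, while a CDS with an invalid phase raises an error, because the correct phase cannot be inferred from that column alone.

```python
VALID_PHASES = {"0", "1", "2"}


def fix_phase(gff_lines):
    """Hypothetical helper: sanitise GFF3 column 8 (phase) before writing.

    Non-CDS features get '.', and a CDS with a phase other than 0/1/2
    raises ValueError instead of being silently guessed.
    """
    fixed = []
    for line in gff_lines:
        line = line.rstrip("\n")
        # Keep comments, directives and blank lines as they are
        if not line.strip() or line.startswith("#"):
            fixed.append(line)
            continue
        cols = line.split("\t")
        ftype, phase = cols[2], cols[7]
        if ftype == "CDS":
            if phase not in VALID_PHASES:
                raise ValueError(f"CDS with invalid phase {phase!r}: {line}")
        elif phase != ".":
            cols[7] = "."  # e.g. 'NA' from RepeatMasker-derived rows
        fixed.append("\t".join(cols))
    return fixed
```

Raising on a bad CDS phase rather than writing '.' is a design choice: a CDS without a valid phase is itself invalid GFF3, so failing loudly seems safer than producing a file that downstream tools may reject.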

I will look into the table2asn requirements and add support as a future module to Earl Grey.

Thanks for checking out Earl Grey!

TobyBaril commented 2 years ago

Added to the future features list.