NAL-i5K / AgBioData_GFF3_recommendation

License: Creative Commons Zero v1.0 Universal

AgBioData GFF3 working group recommendations

Members

The GFF3 format is a common, flexible tab-delimited format representing the structure and function of genes or other mapped features. However, with increasing re-use of annotation data, this flexibility has become an obstacle for standardized downstream processing. Common software packages that export annotations in GFF3 format model the same data and metadata in different notations, which puts the burden on end-users to interpret the data model.
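In the Sequence Ontology specification, a GFF3 feature line has nine tab-separated columns: seqid, source, type, start, end, score, strand, phase and attributes. Column 9 holds `tag=value` pairs separated by semicolons, and commas separate multiple values for one tag. The Python sketch below splits one line along those rules. `parse_gff3_line` is a hypothetical helper written for illustration, not part of any AgBioData tool. It assumes well-formed input with special characters percent-encoded. The example line is adapted from the specification's EDEN gene example, with an illustrative `Note` added.

```python
from urllib.parse import unquote

# Column names, in order, as defined by the Sequence Ontology GFF3 specification.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict keyed by column name.

    Hypothetical helper: assumes a non-comment line with exactly nine
    tab-separated columns.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    feature = dict(zip(GFF3_COLUMNS, fields))
    # Coordinates are 1-based and inclusive at both ends.
    feature["start"] = int(feature["start"])
    feature["end"] = int(feature["end"])
    attributes = {}
    if feature["attributes"] != ".":  # "." marks an empty column
        for pair in feature["attributes"].strip(";").split(";"):
            if not pair:
                continue  # tolerate stray empty pairs such as ";;"
            tag, _, value = pair.partition("=")
            # Split multiple values on literal commas first, then percent-decode,
            # so an encoded comma (%2C) stays inside its own value.
            attributes[tag] = [unquote(v) for v in value.split(",")]
    feature["attributes"] = attributes
    return feature


example = "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN;Note=a%2Cb"
print(parse_gff3_line(example)["attributes"])
# {'ID': ['gene00001'], 'Name': ['EDEN'], 'Note': ['a,b']}
```

Decoding after splitting is the important design choice here. If the code decoded first, an encoded comma inside a value would be mistaken for a value separator.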

The AgBioData consortium is a group of genomics, genetics and breeding databases and partners working towards shared practices and standards. Concrete guidelines for generating GFF3, together with a standard representation of the most common biological data types, would greatly increase efficiency for AgBioData databases and for the genomics research community that uses the GFF3 format in daily operations.

The AgBioData GFF3 working group has developed recommendations to solve common problems in the GFF3 format. We suggest improvements for each of the GFF3 fields, as well as for two special cases: modeling functional annotations and modeling standard protein-coding genes. We welcome discussion of these recommendations from the larger community.
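GFF3 expresses the structure of a protein-coding gene through `ID` and `Parent` attributes. Each mRNA names its gene as `Parent`, and each exon and CDS names its mRNA. The following sketch is an illustration of that general mechanism, not the working group's specific recommendation. It groups children under their parents and reports any `Parent` value that does not resolve to an `ID` in the file. The gene model is adapted from the specification's EDEN example. The helper names `attributes_of` and `build_hierarchy` are hypothetical.

```python
from collections import defaultdict

# Illustrative gene -> mRNA -> exon/CDS model (adapted from the Sequence
# Ontology GFF3 specification example; hypothetical, not AgBioData data).
GFF3_TEXT = """\
##gff-version 3
ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN
ctg123\t.\tmRNA\t1050\t9000\t.\t+\t.\tID=mRNA00001;Parent=gene00001
ctg123\t.\texon\t1050\t1500\t.\t+\t.\tID=exon00001;Parent=mRNA00001
ctg123\t.\tCDS\t1201\t1500\t.\t+\t0\tID=cds00001;Parent=mRNA00001
"""


def attributes_of(column9):
    """Parse column 9 into {tag: [values]} (no percent-decoding, for brevity)."""
    attrs = {}
    for pair in column9.strip(";").split(";"):
        if pair:
            tag, _, value = pair.partition("=")
            attrs[tag] = value.split(",")
    return attrs


def build_hierarchy(text):
    """Return (type by ID, child IDs by parent ID, unresolved (child, parent) pairs)."""
    type_by_id = {}
    links = []  # (parent ID, child ID or type) in file order
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip directives (##) and comments (#)
        cols = line.split("\t")
        attrs = attributes_of(cols[8])
        feature_id = attrs.get("ID", [None])[0]
        if feature_id is not None:
            type_by_id[feature_id] = cols[2]
        for parent in attrs.get("Parent", []):
            links.append((parent, feature_id or cols[2]))
    # Resolve links only after the whole file is read, because GFF3 does not
    # require a parent to appear before its children.
    children = defaultdict(list)
    unresolved = []
    for parent, child in links:
        if parent in type_by_id:
            children[parent].append(child)
        else:
            unresolved.append((child, parent))
    return type_by_id, dict(children), unresolved


type_by_id, children, unresolved = build_hierarchy(GFF3_TEXT)
print(children)    # {'gene00001': ['mRNA00001'], 'mRNA00001': ['exon00001', 'cds00001']}
print(unresolved)  # []
```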

References

  1. Sequence Ontology GFF3 specification
  2. AgBioData consortium
  3. GFF3 recommendation Google Doc (comment access)
  4. NCBI GenBank genomes GFF3 specification
  5. NCBI GenBank GFF3 documentation

GFF3 working group goals

Ultimate goal: to use a GFF3 file from any software or database in downstream processing tools and applications (e.g. VEP, Tripal, Apollo, ...) WITHOUT having to modify it.

1. Databases and software export their GFF3 files in one or more standard ways.

2. Databases and software know how to import standard information from a GFF3 file.
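Importing a GFF3 file unchanged depends on the exporter following at least the basic rules of the Sequence Ontology specification. The following check covers only a few of those rules: a `##gff-version 3` first line, nine columns, start not greater than end, a strand of `+`, `-`, `.` or `?`, and a phase of 0, 1 or 2 on every CDS. `check_gff3` is a hypothetical name. This sketch does not replace a full validator.

```python
VALID_STRANDS = {"+", "-", ".", "?"}


def check_gff3(text):
    """Return human-readable problems found in GFF3 text (empty list if none).

    Sketch only: checks a handful of rules from the Sequence Ontology GFF3
    specification and stops at a ##FASTA directive.
    """
    problems = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith("##gff-version 3"):
        problems.append("line 1: missing '##gff-version 3' header")
    for number, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break  # everything after ##FASTA is sequence data, not features
        if not line or line.startswith("#"):
            continue  # blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(f"line {number}: expected 9 columns, found {len(cols)}")
            continue
        _, _, ftype, start, end, _, strand, phase, _ = cols
        if not (start.isdecimal() and end.isdecimal()):
            problems.append(f"line {number}: start/end are not integers")
        elif int(start) > int(end):
            problems.append(f"line {number}: start {start} is greater than end {end}")
        if strand not in VALID_STRANDS:
            problems.append(f"line {number}: invalid strand {strand!r}")
        if ftype == "CDS" and phase not in {"0", "1", "2"}:
            problems.append(f"line {number}: CDS requires phase 0, 1 or 2, found {phase!r}")
    return problems


sample = "##gff-version 3\nctg123\t.\tCDS\t1500\t1201\t.\t*\t.\tID=cds1\n"
for problem in check_gff3(sample):
    print(problem)
```

For the deliberately broken sample line, the check reports the reversed coordinates, the invalid strand `*` and the missing CDS phase. Each problem appears on its own line.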

Goals


Summary of Recommendations

For each column and reserved attribute, we provide the following results from our discussions:

Types of changes that we recommend:

We primarily have recommendations on how to

Detailed Recommendations

Feedback: Please file issues and tag the relevant contact person.


Acknowledgements

We thank Margaret Woodhouse for the inspiration for this working group, and the many other contributors from the larger research community who provided feedback on earlier versions.