Open monicacecilia opened 9 years ago
@nathandunn you or @deepakunni3 ? I assigned this one to you to bring it back to the spotlight. cheers,
Should this be the default for exporting GFF3, or should this be another option?
For now, I think it should be another option called "GFF3 for NCBI" or something similar. We may want to incorporate this permanently later on, but I don't know how many people are using the output in their pipelines, so we should make an announcement before changing it for good.
This is interesting. Maybe I can look into this.
@deepakunni3 Sure. @monicacecilia Okay, an option makes the most sense for now.
Si @deepakunni3! cheers,
:+1:
These are the GFF3 formatting requirements provided by Terence Murphy from NCBI. Before submitting the official gene set (OGS), that is, the integrated GFF of predicted and manually curated models, some attributes need to be added:

- `locus_tag` attribute to top-level features such as gene or pseudogene (e.g., `locus_tag=W904_OFAS000001`, where W904 is the species accession number used in the NCBI submission system). The `locus_tag` prefix is generated when a BioProject is created, as shown here: http://www.ncbi.nlm.nih.gov/bioproject/230921
- `transcript_id` and `protein_id` attributes to both mRNA and CDS features (e.g., `transcript_id=OFAS000001-RA;protein_id=OFAS000001-PA`). Note: add only `transcript_id` to transcripts that are not from coding genes (e.g., `pseudogenic_transcript`, rRNA).
- `product` attribute to CDS features (e.g., `product=prophenoloxidase`); this is usually the mRNA name when the name is different from ID.

Adapted from email sent by Mei-Ju Chen at USDA/NAL. Mei-Ju's request: "It will be great if WA team could help to batch processing some of the attributes. Let me know if you have questions."
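To make the batch-processing request concrete, here is a rough Python sketch of how the three attributes could be added to an existing GFF3 file. It is only an illustration under several assumptions that are not part of the requirements above: the function and helper names (`add_ncbi_attributes`, `protein_id_for`, and so on) are hypothetical, parent features are assumed to appear before their children, each CDS is assumed to have a single `Parent`, `transcript_id` is assumed to equal the transcript `ID`, and `protein_id` is derived by swapping the `-R` suffix for `-P`, following the `OFAS000001-RA` / `OFAS000001-PA` example.

```python
GENE_TYPES = {"gene", "pseudogene"}
CODING_TRANSCRIPTS = {"mRNA"}
NONCODING_TRANSCRIPTS = {"pseudogenic_transcript", "rRNA"}


def parse_attrs(field):
    """Parse a GFF3 column-9 string into a dict (insertion order is kept)."""
    attrs = {}
    for pair in field.strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = value
    return attrs


def format_attrs(attrs):
    """Serialize an attribute dict back into a column-9 string."""
    return ";".join(f"{key}={value}" for key, value in attrs.items())


def protein_id_for(transcript_id):
    """Assumed naming convention: OFAS000001-RA becomes OFAS000001-PA."""
    base, sep, suffix = transcript_id.rpartition("-R")
    if not sep:
        raise ValueError(f"cannot derive a protein_id from {transcript_id!r}")
    return f"{base}-P{suffix}"


def add_ncbi_attributes(lines, prefix):
    """Return GFF3 lines with locus_tag, transcript_id, protein_id, product added.

    `prefix` is the locus_tag prefix assigned with the BioProject (e.g. "W904").
    """
    transcripts = {}  # mRNA ID -> its (already updated) attributes
    out = []
    for line in lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        # Pass through directives, comments, and anything that is not a feature row.
        if line.startswith("#") or len(cols) != 9:
            out.append(line)
            continue
        ftype = cols[2]
        attrs = parse_attrs(cols[8])
        if ftype in GENE_TYPES:
            attrs["locus_tag"] = f"{prefix}_{attrs['ID']}"
        elif ftype in CODING_TRANSCRIPTS:
            attrs["transcript_id"] = attrs["ID"]
            attrs["protein_id"] = protein_id_for(attrs["ID"])
            transcripts[attrs["ID"]] = attrs
        elif ftype in NONCODING_TRANSCRIPTS:
            # Non-coding transcripts get transcript_id only.
            attrs["transcript_id"] = attrs["ID"]
        elif ftype == "CDS":
            parent = transcripts.get(attrs.get("Parent", ""))
            if parent is not None:
                attrs["transcript_id"] = parent["transcript_id"]
                attrs["protein_id"] = parent["protein_id"]
                # Use the mRNA name as product only when it differs from the ID.
                name = parent.get("Name")
                if name and name != parent["ID"]:
                    attrs["product"] = name
        cols[8] = format_attrs(attrs)
        out.append("\t".join(cols))
    return out
```

Called as `add_ncbi_attributes(open("ogs.gff3"), "W904")` (the file name is a placeholder), this returns the rewritten lines and leaves existing attributes such as `ID`, `Parent`, and `Name` untouched. Whether an exporter option should derive the IDs this way is an open question for the thread.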