Extra tools needed for structuring genome information

The specs are here.

Representing strains/subspecies: Write a GFF3 preprocessor script to populate the organism table. Load the organism table with the default Chado list of organisms first, then probably get a dump for Amoebozoa and load the entire list.
Representing multiple assemblies: A GFF3 post-processor script that would extract information from the GenBank assembly page and load it into Chado. The script will also create links between downstream features. It might take either the assembly ID or the taxon ID as input; settling on the one that works will need a little trial and error.
Implement versioning: A GFF3 post-processor script that will reorganize the DDB IDs, create new sequence IDs, add versioning, and create history links. The script has to figure out the canonical feature before applying the version number.
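
As a rough illustration of the first item, a preprocessor might begin by finding out which organisms a GFF3 file refers to before touching the organism table. The sketch below is only an assumption about how that step could look: it reads the standard GFF3 `##species` directive, whose value is an NCBI Taxonomy URL, and returns the taxon IDs it finds. The function name `species_taxon_ids` and the sample input are hypothetical, and mapping those IDs to organism rows in Chado is left out.

```python
from urllib.parse import urlparse, parse_qs


def species_taxon_ids(gff3_lines):
    """Collect NCBI taxon IDs from ##species directives.

    Hypothetical helper, not part of Modware-Loader. IDs are returned
    as integers, in order of first appearance, without duplicates.
    """
    seen = []
    for line in gff3_lines:
        line = line.strip()
        if not line.startswith("##species"):
            continue
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue  # directive present but no URL given
        # The directive value is a taxonomy URL ending in "?id=<taxon id>"
        query = parse_qs(urlparse(parts[1]).query)
        for taxon_id in query.get("id", []):
            if taxon_id.isdigit() and int(taxon_id) not in seen:
                seen.append(int(taxon_id))
    return seen


# Example input, made up for illustration
sample = [
    "##gff-version 3",
    "##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=44689",
    "chr1\t.\tgene\t1\t100\t.\t+\t.\tID=gene0001",
]
print(species_taxon_ids(sample))
```

Files that carry no `##species` directive yield an empty list, so a real preprocessor would need some other source for the organism, such as a command-line option.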

dictyBase / Modware-Loader