Deal with multiple CDS IDs for the same transcript

in the gstf_preparation tool.

Biologically, a single mRNA can give rise to different CDSs (and therefore different protein translations) through alternative translational start sites. The GFF3 standard explicitly allows this: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md (look for "alternative translational start sites"). If a CDS is discontinuous, its fragments must share the same ID, so the ID can be used to group the fragments that make up each alternative CDS.
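To make the grouping idea concrete, here is a minimal sketch (not the current gstf_preparation code) that collects CDS fragments per transcript and per CDS ID. The function names `parse_attributes` and `group_cds_by_transcript` and the example records are hypothetical, and the parsing ignores URL-escaping in attribute values.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Parse the GFF3 attributes column (key=value pairs separated by ';')."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def group_cds_by_transcript(gff3_lines):
    """Return {transcript_id: {cds_id: [(start, end), ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        cds_id = attrs.get("ID")
        if cds_id is None:
            continue  # without an ID the fragments cannot be grouped
        # A feature may list several parents, separated by commas
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                grouped[parent][cds_id].append((int(cols[3]), int(cols[4])))
    return grouped


# Made-up example: one mRNA with two alternative CDSs, each in two fragments
example = [
    "##gff-version 3",
    "chr1\t.\tmRNA\t100\t900\t.\t+\t.\tID=mRNA1",
    "chr1\t.\tCDS\t150\t300\t.\t+\t0\tID=cds1;Parent=mRNA1",
    "chr1\t.\tCDS\t500\t800\t.\t+\t2\tID=cds1;Parent=mRNA1",
    "chr1\t.\tCDS\t210\t300\t.\t+\t0\tID=cds2;Parent=mRNA1",
    "chr1\t.\tCDS\t500\t800\t.\t+\t2\tID=cds2;Parent=mRNA1",
]
grouped = group_cds_by_transcript(example)
for cds_id, fragments in grouped["mRNA1"].items():
    print(cds_id, fragments)
```

With this structure, a transcript with more than one key in its inner dict has multiple CDSs, and each one could be translated separately.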

Ensembl seems to enforce a "one CDS per transcript" rule in its databases, but we don't have to.

Additional problem: some GFF3 files (e.g. the one in the gstf_preparation tool help!) use different IDs for fragments of the same CDS, which I think is non-standard.
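One possible way to tell the two cases apart, offered only as a heuristic sketch: alternative CDSs that differ in their start site would normally overlap each other, so a transcript whose CDS IDs cover pairwise disjoint regions is more likely to contain fragments of one CDS with non-standard IDs. The function name `transcripts_with_disjoint_cds_ids` and the input tuple layout are assumptions made for illustration, and the overlap rule itself is my assumption, not something the GFF3 specification states.

```python
from collections import defaultdict


def transcripts_with_disjoint_cds_ids(cds_records):
    """Flag transcripts whose CDS IDs cover non-overlapping regions.

    cds_records: iterable of (parent_id, cds_id, start, end) tuples.
    Heuristic (assumption): disjoint CDS IDs under one parent are probably
    fragments of a single CDS that were given different IDs.
    """
    # Overall (min start, max end) span of each CDS ID within each transcript
    spans = defaultdict(dict)
    for parent, cds_id, start, end in cds_records:
        lo, hi = spans[parent].get(cds_id, (start, end))
        spans[parent][cds_id] = (min(lo, start), max(hi, end))

    flagged = []
    for parent, by_id in spans.items():
        ordered = sorted(by_id.values())
        # Sorted by start: if each span ends before the next begins,
        # no two spans overlap
        if len(ordered) > 1 and all(
            prev[1] < cur[0] for prev, cur in zip(ordered, ordered[1:])
        ):
            flagged.append(parent)
    return flagged


records = [
    ("t1", "cdsA", 100, 200),  # different IDs, disjoint: flagged
    ("t1", "cdsB", 300, 400),
    ("t2", "cds1", 150, 300),  # two overlapping alternative CDSs: not flagged
    ("t2", "cds1", 500, 800),
    ("t2", "cds2", 210, 300),
    ("t2", "cds2", 500, 800),
]
flagged = transcripts_with_disjoint_cds_ids(records)
print(flagged)
```

Flagged transcripts could then be handled by merging all their CDS fragments into a single CDS, while the others keep one CDS per ID.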

TGAC / earlham-galaxytools

Deal with multiple CDS IDs for the same transcript #120