NCBI-Hackathons / Master_gff3_parser

Convert sequence IDs between ucsc/refseq/genbank
MIT License

Guessing the ID source in Target fields #4

Open childers opened 7 years ago

childers commented 7 years ago

@guilhemfaure, from our discussion:

Alignment GFFs can have a Target field containing an ID. If we support updating that ID, we need to rerun the guessing workflow on it to try to determine what the ID type is and which reformatting options are available.
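To make the idea concrete, here is a rough sketch of pulling the ID out of a Target field and guessing its type. It assumes the GFF3 layout where column 9 holds `key=value` pairs separated by `;` and the Target value looks like `target_id start end [strand]`. The function names (`target_id`, `guess_id_type`) and the regex patterns are hypothetical and deliberately simplified; they are not the guessing workflow this tool actually uses.

```python
import re
from urllib.parse import unquote

# Simplified, illustrative accession patterns (assumptions, not the tool's rules).
ID_PATTERNS = [
    # RefSeq: two-letter prefix, underscore, digits, optional version (NC_000001.11)
    ("refseq", re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")),
    # UCSC-style names: chr1, chrX, chrUn_gl000220
    ("ucsc", re.compile(r"^chr[0-9A-Za-z_]+$")),
    # GenBank nucleotide accessions: U49845, CM000663.2, WGS-style AAAA01000001
    ("genbank", re.compile(
        r"^([A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8}|[A-Z]{4}\d{8,10}|[A-Z]{6}\d{9,})(\.\d+)?$"
    )),
]


def target_id(attributes):
    """Return the ID from a GFF3 Target attribute, or None if there is none."""
    for field in attributes.strip().split(";"):
        key, sep, value = field.partition("=")
        if sep and key.strip() == "Target" and value.strip():
            # The first whitespace-separated token is the ID; spaces inside
            # an ID are percent-encoded in GFF3, so decode after splitting.
            return unquote(value.split()[0])
    return None


def guess_id_type(seq_id):
    """Return the name of the first pattern that matches, else 'unknown'."""
    for name, pattern in ID_PATTERNS:
        if pattern.match(seq_id):
            return name
    return "unknown"


# Illustrative alignment record (made up for this example).
line = "chr1\tblat\tcDNA_match\t1050\t1500\t.\t+\t.\tID=match1;Target=NM_000014.6 1 451 +"
columns = line.split("\t")
source_guess = guess_id_type(columns[0])
target_guess = guess_id_type(target_id(columns[8]))
print(source_guess, target_guess)
```

The source ID (column 1) and the Target ID are guessed separately here, so the two can come back as different types.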

There is no expectation that the Target IDs are going to be from the same assembly as the main IDs we are updating. That would only be the case if the sequence was aligned to itself, which is a valid use case, but less common than mapping the assembly to some other assembly.

Should this be a separate option specified when the program is run (update Target IDs vs. update source IDs)? Or should it be done in parallel, as the source IDs are updated?

childers commented 7 years ago

From what I can remember, most or all of the alignment GFFs I have worked with in the past were mapped to EST or gene sets, and the Target=ID pointed to the sequence ID for a gene or transcript. I think I might have used a genome-assembly-to-genome-assembly mapped GFF3 once, as a special case.