Is your feature request related to a problem? Please describe.
Currently, our FASTA headers carry the complete set of fields:
accession|gene_name|domain_name|domain_number|IPR_ID|start|end
Although this is highly informative, it produces long headers that can make things difficult in downstream processes (like Jalview) or lead to failures for some operations that shorten headers automatically (like promals3d).
Describe the solution you'd like
I think we should create a flexible way to take CoDIAC FASTA output and shorten the headers in a way that is determined by the user, keeping a dataframe-like translation table for going back to the long headers (or at least for documenting the full information about what each FASTA sequence refers to).
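To make the proposal concrete, here is a minimal sketch of what user-controlled shortening could look like. It is not existing CoDIAC code: the function name `shorten_headers`, the `keep` argument, and the example headers are all hypothetical, and a plain dict stands in for the dataframe-like translation table.

```python
# Field order of the current long header, as described in this issue.
FIELDS = ["accession", "gene_name", "domain_name", "domain_number",
          "IPR_ID", "start", "end"]


def shorten_headers(long_headers, keep=("accession", "domain_number"), sep="|"):
    """Build short headers from the fields named in `keep`, in that order.

    Returns a dict mapping short header -> long header (hypothetical helper).
    Raises ValueError if a header is malformed or two different long
    headers would collapse to the same short header.
    """
    short_to_long = {}
    for header in long_headers:
        parts = header.split(sep)
        if len(parts) != len(FIELDS):
            raise ValueError(f"unexpected header format: {header!r}")
        record = dict(zip(FIELDS, parts))
        short = sep.join(record[name] for name in keep)
        # Uniqueness test: refuse to map two long headers to one short name.
        if short in short_to_long and short_to_long[short] != header:
            raise ValueError(f"short header {short!r} is not unique")
        short_to_long[short] = header
    return short_to_long


# Made-up headers for illustration only.
example_headers = [
    "P00001|GENEA|SH2|1|IPR000001|10|100",
    "P00001|GENEA|SH2|2|IPR000001|150|240",
]
translation = shorten_headers(example_headers, keep=("accession", "domain_number"))
```

With these two example headers, keeping only `accession` would raise a `ValueError`, because both domains come from the same protein. The returned dict could be loaded into a dataframe and saved next to the FASTA file as documentation.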
Describe alternatives you've considered
We had already reordered the header fields to account for truncation by promals3d, with a translation back to the original headers so that the results stay compatible with all other features produced from them. This worked until we moved to a domain with a longer name; the domain number now falls outside the truncated header.
Tasks
Include specific tasks in the order they need to be done in. Include links to specific lines of code where the task should happen at.
[x] Create code to shorten headers, taking arguments for field ordering and testing whether the resulting headers are unique.
[x] Write the code to translate back and forth between long and short names in a FASTA file (e.g., to replace short names after alignment)
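The second task, translating names in a FASTA file, can be sketched as follows. This is an illustration under assumptions, not the code that was merged: `rename_fasta_headers` is a hypothetical helper, the translation table is a plain dict, and the sequences and headers are made up.

```python
def rename_fasta_headers(fasta_text, mapping):
    """Replace each FASTA header found in `mapping` with its translation.

    Hypothetical helper: headers that are missing from `mapping` are left
    unchanged, and sequence lines are copied through untouched.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            name = line[1:].strip()
            line = ">" + mapping.get(name, name)
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up translation table (short -> long) and aligned FASTA content.
short_to_long = {
    "P00001|1": "P00001|GENEA|SH2|1|IPR000001|10|100",
    "P00001|2": "P00001|GENEA|SH2|2|IPR000001|150|240",
}
long_to_short = {long: short for short, long in short_to_long.items()}

aligned_short = ">P00001|1\nMK-TL\n>P00001|2\nMKQTL\n"

# Short -> long after alignment, then long -> short again.
aligned_long = rename_fasta_headers(aligned_short, short_to_long)
round_trip = rename_fasta_headers(aligned_long, long_to_short)
```

Inverting the dict is only safe when the short headers are unique, which is why the shortening step should check for collisions before any translation table is written.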