NaegleLab / CoDIAC

GNU General Public License v3.0

Create a way to shorten/lengthen headers for fasta files #19

Closed knaegle closed 11 months ago

knaegle commented 11 months ago

Is your feature request related to a problem? Please describe.

Currently, our fasta headers carry the complete set of fields: accession|gene_name|domain_name|domain_number|IPR_ID|start|end. Although this is highly informative, it produces long headers that can make downstream processes (like Jalview) difficult, or cause failures in operations that shorten headers automatically (like promals3d).

Describe the solution you'd like

I think we should create a flexible way to take CoDIAC fasta output and shorten the header in a way that is determined by the user, keeping a dataframe-like translation for going back to long headers (or at least for keeping as documentation of the full information that a fasta sequence refers to).
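A minimal sketch of what this could look like, assuming the seven-field, pipe-delimited header described above. The function names `shorten_fasta_headers` and `restore_fasta_headers`, the `keep_fields` parameter, and the example header values are hypothetical and are not part of the existing CoDIAC code. The translation is held as a plain dict here; it could just as easily be written out as a two-column dataframe.

```python
# Field order taken from the header format described in this issue.
HEADER_FIELDS = ["accession", "gene_name", "domain_name",
                 "domain_number", "IPR_ID", "start", "end"]


def shorten_fasta_headers(fasta_text, keep_fields, sep="|"):
    """Hypothetical helper: keep only `keep_fields` in each header.

    Returns (shortened_fasta_text, translation), where translation maps
    each short header back to its original long header.
    """
    translation = {}
    out_lines = []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            out_lines.append(line)  # sequence lines pass through unchanged
            continue
        long_header = line[1:].strip()
        parts = long_header.split(sep)
        if len(parts) != len(HEADER_FIELDS):
            raise ValueError(f"unexpected header format: {long_header!r}")
        values = dict(zip(HEADER_FIELDS, parts))
        short_header = sep.join(values[field] for field in keep_fields)
        # Refuse to silently map two different sequences to one short header.
        if translation.get(short_header, long_header) != long_header:
            raise ValueError(f"short header is not unique: {short_header!r}")
        translation[short_header] = long_header
        out_lines.append(">" + short_header)
    return "\n".join(out_lines) + "\n", translation


def restore_fasta_headers(fasta_text, translation):
    """Hypothetical helper: swap short headers back to the long form."""
    out_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            line = ">" + translation[line[1:].strip()]
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"


# Illustrative values only; not taken from a real CoDIAC output file.
original = ">P12345|GENE1|SH2|1|IPR000980|10|100\nMKV\n"
short, table = shorten_fasta_headers(
    original, ["accession", "domain_name", "domain_number"]
)
restored = restore_fasta_headers(short, table)
```

Letting the user pick fields by name, rather than truncating at a fixed character count, keeps the domain number in the header regardless of how long the domain name is. The uniqueness check is one possible safeguard against two domains collapsing onto the same short header.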

Describe alternatives you've considered

We had already changed the header ordering to account for truncation by promals3d, with a translation back to be compatible with all other features that were being produced on the original headers. This worked until we moved to a domain with a longer name, and now the domain number is not part of the code.

Tasks

Include specific tasks in the order they need to be done. Include links to the specific lines of code where each task should happen.