Open shiltemann opened 5 months ago
Hi @shiltemann, these are the FASTA files for brown algae that Romy added to the repo. I recently added a 5-letter code to the FASTA headers in a commit on my branch and submitted a pull request (though we are still awaiting confirmation from Romy). You can merge it into main, and hopefully everything will be fine then.
Aha! then ignore everything I said ;p
@Deeptivarshney
Ok, looks like there are only a few minor inconsistencies remaining. Perhaps these are intentional/expected/ok, in which case please ignore this and close the issue again, but otherwise they may be something to look into:
FASTA files without standard header naming scheme (not beginning with LETTERCODE_):
>Ec-00_011370.1 Domain of unknown function DUF3449 (209) ;mRNA; f:18728790-18730729
>Trimin1|9722|CE9721_89
>ISG|UNPIN3861CG0010|len:279|ori:-|TentativeLG00 type:complete
For optimal functionality from the TAPscan website's point of view, the naming scheme should be
>LETTERCODE_organelle_proteinID
Where the proteinID (or gene ID) is used to search on PLAZA, and the organelle suffix is optional.
It's not a priority, but in theory it would be nice to have everything standardized this way.
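If it helps, here is a rough sketch of how the remaining non-standard headers could be listed automatically. It assumes the letter code is exactly five ASCII letters followed by an underscore (based on the "5-letter code" mentioned above); the function name and the `ABCDE` code in the example are made up for illustration and are not part of the repo. It only checks the prefix, since the optional organelle part cannot be told apart reliably from a proteinID that itself contains underscores.

```python
import re

# Assumption: LETTERCODE is exactly five ASCII letters followed by "_".
# Adjust the pattern if the real codes allow digits or a different length.
LETTERCODE_PREFIX = re.compile(r"^>[A-Za-z]{5}_\S")


def nonstandard_headers(lines):
    """Return FASTA header lines that do not start with >LETTERCODE_."""
    return [
        line.rstrip("\n")
        for line in lines
        if line.startswith(">") and not LETTERCODE_PREFIX.match(line)
    ]


# The three headers reported in this issue, a hypothetical conforming
# header (ABCDE is an invented code), and one sequence line.
example_lines = [
    ">Ec-00_011370.1 Domain of unknown function DUF3449 (209) ;mRNA; f:18728790-18730729\n",
    ">Trimin1|9722|CE9721_89\n",
    ">ISG|UNPIN3861CG0010|len:279|ori:-|TentativeLG00 type:complete\n",
    ">ABCDE_Ec-00_011370.1\n",
    "MSTNPKPQRKTKRNTNRRPQDVKFPGG\n",
]

flagged = nonstandard_headers(example_lines)
for header in flagged:
    print(header)
```

To run it over a real file, pass the open file handle instead of the list, for example `nonstandard_headers(open(path))`, where `path` points at one of the FASTA files.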