Closed: tmcgowan closed this issue 5 years ago
The parser tries to interpret each header as the type indicated by its content, and the header in question is assumed to be a SwissProt header. Because the altered header does not follow the SwissProt header rules, the parsing breaks down.
Therefore, when using custom headers, or when making changes to standard headers, we recommend using our non-standard header format instead: https://github.com/compomics/searchgui/wiki/DatabaseHelp#non-standard-fasta. This will ensure that custom headers are parsed correctly.
Thanks.
In SearchGUI, we have a user adding a source tag -- sihumi_ -- in front of the accession in their FASTA header line:
>sp|sihumi_Q8AAB1| GLMS_BACTN Glutamine--fructose-6-phosphate aminotransferase [isomerizing] OS=Bacteroides thetaiotaomicron (strain ATCC 29148 / DSM 2079 / NCTC 10582 / E50 / VPI-5482) GN=glmS PE=3 SV=2
The header is parsed at line 711 of com.compomics.util.protein.Header. At
result.iAccession = aFASTAHeader.substring(0, aFASTAHeader.indexOf("|")).trim();
the accession is parsed as 'sp', which triggers an accession duplication error because multiple entries end up with the accession 'sp'. The header line is not caught where, I think, it should be.
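To illustrate why every tagged header collapses to the same accession, here is a minimal, self-contained sketch of the quoted substring call. The class AccessionDemo and the method firstField are hypothetical names for illustration and are not part of compomics-utilities; the sketch assumes the leading '>' has already been removed by the time this call runs, and the second header is a made-up placeholder.

```java
import java.util.HashSet;
import java.util.Set;

public class AccessionDemo {

    // Hypothetical helper that mirrors the substring call quoted from Header:
    // it returns everything before the first '|' in the header.
    static String firstField(String header) {
        return header.substring(0, header.indexOf("|")).trim();
    }

    public static void main(String[] args) {
        // Assumption: the leading '>' has already been stripped.
        String[] headers = {
            "sp|sihumi_Q8AAB1| GLMS_BACTN Glutamine--fructose-6-phosphate aminotransferase",
            "sp|sihumi_PLACEHOLDER| SECOND_ENTRY made-up second protein"
        };

        Set<String> seen = new HashSet<>();
        for (String header : headers) {
            String accession = firstField(header);
            // Set.add returns false when the value is already present.
            boolean isNew = seen.add(accession);
            System.out.println(accession + (isNew ? "" : " (duplicate)"));
        }
    }
}
```

Both headers yield 'sp' from this call, so the second one is reported as a duplicate regardless of what follows the first '|'.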
At the moment, I am having the user either remove the sihumi_ tag or replace the '_' with '-'. In both cases, the header is processed in the final else branch, still not in the 'sp' section.
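For anyone who needs to apply the same workaround to a whole FASTA file, the two edits could be sketched roughly as follows. This is only an illustration, not code from SearchGUI or compomics-utilities: the class HeaderWorkaround and both methods are hypothetical, and the sketch assumes the tag sits at the start of the field between the first and second '|'.

```java
public class HeaderWorkaround {

    // Hypothetical helper: removes a source tag (e.g. "sihumi_") from the
    // start of the accession field, i.e. the text between the first two '|'.
    static String stripSourceTag(String headerLine, String tag) {
        int first = headerLine.indexOf('|');
        int second = headerLine.indexOf('|', first + 1);
        if (first < 0 || second < 0) {
            return headerLine; // not a pipe-delimited header, leave untouched
        }
        String accession = headerLine.substring(first + 1, second);
        if (!accession.startsWith(tag)) {
            return headerLine; // tag not present, leave untouched
        }
        return headerLine.substring(0, first + 1)
                + accession.substring(tag.length())
                + headerLine.substring(second);
    }

    // Hypothetical helper: replaces '_' with '-' inside the accession field only,
    // so entry names such as GLMS_BACTN further along the line are not changed.
    static String dashAccession(String headerLine) {
        int first = headerLine.indexOf('|');
        int second = headerLine.indexOf('|', first + 1);
        if (first < 0 || second < 0) {
            return headerLine;
        }
        String accession = headerLine.substring(first + 1, second).replace('_', '-');
        return headerLine.substring(0, first + 1) + accession + headerLine.substring(second);
    }

    public static void main(String[] args) {
        String header = ">sp|sihumi_Q8AAB1| GLMS_BACTN Glutamine--fructose-6-phosphate aminotransferase";
        System.out.println(stripSourceTag(header, "sihumi_"));
        System.out.println(dashAccession(header));
    }
}
```

Restricting both edits to the accession field keeps the rest of the header line intact.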