marina-manrique closed this issue 12 years ago
Ok, I'm on it! Just one thing: would you mind if the contig IDs had a syntax like ECO1, ECO2, ... ECO1010 ... ECOXXX instead of writing all those ugly zeros?
With the ugly zeros, please; that is the usual form (sorry)
Ok... :P
I just committed the changes for all this. You can have a look at the new FixFastaHeaders program (it has its own jar file and it has also been incorporated into the BG7 jar file)
What should the section of the executions.xml file for this program look like?
You can check the parameters for this program in the wiki:
I just implemented the corresponding quality control program for 'FixFastaHeaders'. You can find more information in the wiki. I'm closing this issue now that all of this has been implemented/solved ;)
Manual quality control done.
I've checked the following things (and everything was OK):
@pablopareja It would be good to format the input genome FASTA file so that the FASTA file to annotate always has the same header structure. @rtobes and I have decided this could be an appropriate header:
where CONTIG_ID is the project name followed by a 6-digit, zero-padded number, for example ECO000001, ECO000002
This way you can always get the contig ID by splitting the header on '|' and taking the first token.
Besides the formatted FASTA file, it'd be good to have a TSV file with each CONTIG_ID and its corresponding former header
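To make the request concrete, here is a minimal Python sketch of the renaming described above. It is not the FixFastaHeaders code (that program is shipped as a jar), and the function names `make_contig_id`, `fix_fasta_headers` and `contig_id_from_header` are hypothetical. The full header agreed on is not reproduced here, so the sketch writes only the CONTIG_ID; any further fields would follow it after a '|'.

```python
import csv
import io


def make_contig_id(project, index, width=6):
    """Project name followed by a zero-padded number, e.g. ECO000001."""
    return f"{project}{index:0{width}d}"


def fix_fasta_headers(fasta_in, fasta_out, mapping_out, project):
    """Rewrite every header line and record CONTIG_ID -> former header as TSV.

    Assumption: only the CONTIG_ID is written to the new header; the other
    fields of the agreed header are left out of this sketch.
    """
    writer = csv.writer(mapping_out, delimiter="\t", lineterminator="\n")
    count = 0
    for line in fasta_in:
        if line.startswith(">"):
            count += 1
            contig_id = make_contig_id(project, count)
            # Keep the former header (without '>') next to the new ID.
            writer.writerow([contig_id, line[1:].rstrip("\r\n")])
            fasta_out.write(f">{contig_id}\n")
        else:
            # Sequence lines are copied unchanged.
            fasta_out.write(line)
    return count


def contig_id_from_header(header):
    """Recover the contig ID: split on '|' and take the first token."""
    return header.lstrip(">").split("|")[0].strip()


# Small in-memory demo with made-up contigs.
demo_in = io.StringIO(">contig one\nACGT\n>contig two\nTTGA\n")
demo_out, demo_map = io.StringIO(), io.StringIO()
fix_fasta_headers(demo_in, demo_out, demo_map, "ECO")
print(demo_out.getvalue())
print(demo_map.getvalue())
```

Numbering in input order keeps the TSV and the formatted FASTA file aligned line by line, so the former header of any contig can be looked up from its ID.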