custom database creation question

Hello, I want to build my own database (GTDB plus my own MAGs). I used Prodigal to convert my nucleotide FASTA files to protein FASTA files. As far as I can see, Prodigal writes the contig name produced by the assembler as the first field of each FASTA header, followed by information about the Prodigal prediction itself. The issue is that, to build a custom Kaiju database, it is necessary to replace the protein FASTA headers with NCBI taxon identifiers. Should I write my own script to assign the NCBI taxon identifiers?

Let's say that one protein FASTA header is the following:

k141_811263_4 # 1653 # 1775 # -1 # ID=1_4;partial=00;start_type=ATG;rbs_motif=AGGA;rbs_spacer=5-10bp;gc_cont=0.228

Here k141_811263 corresponds to the genome identifier, and the "_4" suffix is the contig number within the draft genome.
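
To make the header layout concrete, here is a minimal sketch that splits a header like the one above into its parts. The function name `parse_prodigal_header` and the key names are hypothetical, and it assumes the fields are always separated by " # " and that the identifier is split at its last underscore, as in the example.

```python
def parse_prodigal_header(header):
    """Split one Prodigal-style FASTA header into its fields.

    Assumes the layout shown in the example above:
    <identifier> # <start> # <end> # <strand> # <key=value;key=value;...>
    """
    fields = [f.strip() for f in header.lstrip(">").split(" # ")]
    seq_id, start, end, strand, attrs = fields

    # Split the identifier at its LAST underscore:
    # "k141_811263_4" -> ("k141_811263", "4")
    source, _, index = seq_id.rpartition("_")

    # Turn "ID=1_4;partial=00;..." into a dictionary of strings.
    attributes = dict(item.split("=", 1) for item in attrs.split(";") if item)

    return {
        "source": source,
        "index": int(index),
        "start": int(start),
        "end": int(end),
        "strand": int(strand),
        "attributes": attributes,
    }


example = ("k141_811263_4 # 1653 # 1775 # -1 # ID=1_4;partial=00;"
           "start_type=ATG;rbs_motif=AGGA;rbs_spacer=5-10bp;gc_cont=0.228")
parsed = parse_prodigal_header(example)
print(parsed["source"], parsed["index"])
```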

The genome k141_811263 has previously been classified with GTDB, so there is taxonomic information for it (domain, phylum, class, order, family, genus, species).

So do I have to extract that classification and match it to the NCBI taxon identifier?
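
The kind of script I have in mind would look roughly like the sketch below. It assumes I already have a table mapping each genome identifier to an NCBI taxon ID (the `taxid_by_genome` dictionary and the value `12345` are placeholders, not real data), and it assumes Kaiju accepts headers of the form `>counter_taxid`, which is my reading of the README and should be double-checked. The function name `rewrite_headers` is hypothetical.

```python
def rewrite_headers(lines, taxid_by_genome):
    """Yield FASTA lines with Prodigal headers replaced by '>counter_taxid'.

    lines:           iterable of lines from a Prodigal protein FASTA file
    taxid_by_genome: dict mapping a genome identifier (the header identifier
                     without its last '_N' suffix) to an NCBI taxon ID
    Sequences whose genome has no taxon ID in the mapping are skipped.
    """
    counter = 0
    keep = False  # whether the current record has a known taxon ID
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            seq_id = line[1:].split()[0]        # e.g. "k141_811263_4"
            genome = seq_id.rpartition("_")[0]  # e.g. "k141_811263"
            taxid = taxid_by_genome.get(genome)
            keep = taxid is not None
            if keep:
                counter += 1
                yield f">{counter}_{taxid}"
        elif keep:
            yield line


# Placeholder mapping: 12345 is NOT a real taxon ID for this genome.
taxid_by_genome = {"k141_811263": 12345}
records = [
    ">k141_811263_4 # 1653 # 1775 # -1 # ID=1_4;partial=00",
    "MKV",
]
for out_line in rewrite_headers(records, taxid_by_genome):
    print(out_line)
```

The part I am unsure about is how to fill that mapping, i.e. how to go from the GTDB classification to an NCBI taxon ID.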

bioinformatics-centre / kaiju

custom database creation question #204