biocore / microprot

structural annotation pipeline for microbial genomes and metagenomes
BSD 3-Clause "New" or "Revised" License

add match identifier to `match` files #65

Closed by tkosciol 7 years ago

tkosciol commented 7 years ago

Headers in `match` files currently contain the genome_gene_SeqFrom-SeqTo information followed by the remaining Prodigal header. The updated version should add the match identifier, giving a header of the form:

genome_gene_SeqFrom-SeqTo # match_identifier # remaining Prodigal header

For example, the current header:

>NZ_G_6668_O9.1_2_209-248 # 78 # 2285 # 1 # Ipe=TTG;rbs_motif=AGxAGG/AGGxGG;rbr=5-10bp;gc_cont=0.49

should become:

>NZ_G_6668_O9.1_2_209-248 # 1W36_D # 78 # 2285 # 1 # Ipe=TTG;rbs_motif=AGxAGG/AGGxGG;rbr=5-10bp;gc_cont=0.49

This follows the example stored on Barnacle in /projects/microprot/benchmarking/snakemake_minimal_test/results2/01-search_pdb/NZ_G_6668_O9.1_2/NZ_G_6668_O9.1_2.out and /projects/microprot/benchmarking/snakemake_minimal_test/results2/02-split_pdb/NZ_G_6668_O9.1_2/NZ_G_6668_O9.1_2.match
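
A minimal sketch of the header rewrite described above, assuming the fields are separated by ` # ` as in the example. The function name `add_match_id` is hypothetical and not taken from microprot, and how the identifier is read from the search output is not shown here.

```python
def add_match_id(header: str, match_id: str) -> str:
    """Insert match_id as the first ' # '-separated field after the sequence name.

    Hypothetical helper for illustration; not part of microprot.
    """
    # Split once, at the first ' # ' delimiter: everything before it is the
    # genome_gene_SeqFrom-SeqTo name, everything after is the Prodigal header.
    name, sep, rest = header.partition(" # ")
    if not sep:
        # Assumption: a header without Prodigal fields just gets the ID appended.
        return f"{name.rstrip()} # {match_id}"
    return f"{name} # {match_id} # {rest}"


old_header = (
    ">NZ_G_6668_O9.1_2_209-248 # 78 # 2285 # 1 # "
    "Ipe=TTG;rbs_motif=AGxAGG/AGGxGG;rbr=5-10bp;gc_cont=0.49"
)
new_header = add_match_id(old_header, "1W36_D")
print(new_header)
```

Splitting only at the first delimiter leaves the remaining Prodigal fields untouched, so later parsing of those fields only needs to account for the one extra column.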

sjanssen2 commented 7 years ago

How is the "match_identifier" value determined? Is it stored somewhere in the input file? I cannot see it in the existing header for the example, but I might be blind.

sjanssen2 commented 7 years ago

What about the non-match files? I would prefer to have the same header formatting there! Scrap that! Since a non-match is not a hit, there is no hit/match ID to add.

tkosciol commented 7 years ago

Solved by PR #66.