Hi There. I was getting no hit results back and I noticed that the percent identities and #miss-matches being reported in "haploids with low identities" tab on the output taxonomy spreadsheet didnt make any sense.
Looking at pipeline.py line 256-261
# /core/pipeline.py
for alignment in blast_record.alignments:
hsp = alignment.hsps[0]
aln_len = alignment.length
identity = hsp.identities/aln_len
if identity >= blast_identity/100:
good_alns.append(alignment)
For me aln_len is reporting the length of the hit record in the database, not the HSP overlap length. This means the identity number is really much smaller than it should be. I fixed it by assigning aln_len to hsp.align_length (see below).
#/core/pipeline.py
for alignment in blast_record.alignments:
hsp = alignment.hsps[0]
aln_len = hsp.align_length #alignment.length
identity = hsp.identities/aln_len
if identity >= blast_identity/100:
good_alns.append(alignment)
Now I get correct reporting on the identity because it is dividing by the HSP length and not the hit record length.
This also needs to be fixed on lines 266 and 289 (moving it below the hsp assignment which occurs on line 269 and 292, respectively)
Hi There. I was getting no hit results back and I noticed that the percent identities and #miss-matches being reported in "haploids with low identities" tab on the output taxonomy spreadsheet didnt make any sense.
Looking at pipeline.py line 256-261
For me
aln_len
is reporting the length of the hit record in the database, not the HSP overlap length. This means the identity number is really much smaller than it should be. I fixed it by assigningaln_len
tohsp.align_length
(see below).Now I get correct reporting on the identity because it is dividing by the HSP length and not the hit record length.
This also needs to be fixed on lines 266 and 289 (moving it below the hsp assignment which occurs on line 269 and 292, respectively)