Hi, it looks like you are using the same methods that I used for evaluation. The original notebooks can be found on the Zenodo record linked in the paper: https://doi.org/10.5281/zenodo.7731163
Thank you very much! I have reproduced the results.
The reason I got wrong scores is that I first converted the alignment into another format; my conversion function was not well tested, so I obtained wrong alignment states when computing the scores. 😭
I'm back again.
I find that the alignment score seems to be weird in some cases. From my observation, it happens when the alignment state string starts with "21:", for example (MALIDUP, d1knca):
manual:
SSITRSSVLDQEQLWGTLLASAAATRNPQVLADIGAEATDH-LSAAARHAALGAAAIMGMNNVFYRGRGFLE
:::::::::::::::::::::::::::::::::::::::::1::::::::::::::::::::::::::::::
MNIIANPGIPKANFELWSFAVSAINGCSHCLVAHEHTLRTVGVDREAIFEALKAAAIVSGVAQALATIEALS
deepblast:
S-SITRSSVLDQEQLWGTLLASAAATRNPQVLADIGAEATDH-LSAAARHAALGAAAIM-GMNNVFYRGRGFLE
21::::::::::::::::::::::::::::::::::::::::1::::::::::::::::1:::::::::::2::
-MNIIANPGIPKANFELWSFAVSAINGCSHCLVAHEHTLRTVGVDREAIFEALKAAAIVSGVAQALATIEA-LS
true_edges:
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10), (11, 11), (12, 12), (13, 13), (14, 14), (15, 15), (16, 16), (17, 17), (18, 18), (19, 19), (20, 20), (21, 21), (22, 22), (23, 23), (24, 24), (25, 25), (26, 26), (27, 27), (28, 28), (29, 29), (30, 30), (31, 31), (32, 32), (33, 33), (34, 34), (35, 35), (36, 36), (37, 37), (38, 38), (39, 39), (40, 40), (41, 40), (42, 41), (43, 42), (44, 43), (45, 44), (46, 45), (47, 46), (48, 47), (49, 48), (50, 49), (51, 50), (52, 51), (53, 52), (54, 53), (55, 54), (56, 55), (57, 56), (58, 57), (59, 58), (60, 59), (61, 60), (62, 61), (63, 62), (64, 63), (65, 64), (66, 65), (67, 66), (68, 67), (69, 68), (70, 69), (71, 70)]
pred_edges:
[(0, 0), (1, 0), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5), (7, 6), (8, 7), (9, 8), (10, 9), (11, 10), (12, 11), (13, 12), (14, 13), (15, 14), (16, 15), (17, 16), (18, 17), (19, 18), (20, 19), (21, 20), (22, 21), (23, 22), (24, 23), (25, 24), (26, 25), (27, 26), (28, 27), (29, 28), (30, 29), (31, 30), (32, 31), (33, 32), (34, 33), (35, 34), (36, 35), (37, 36), (38, 37), (39, 38), (40, 39), (41, 40), (42, 40), (43, 41), (44, 42), (45, 43), (46, 44), (47, 45), (48, 46), (49, 47), (50, 48), (51, 49), (52, 50), (53, 51), (54, 52), (55, 53), (56, 54), (57, 55), (58, 56), (59, 56), (60, 57), (61, 58), (62, 59), (63, 60), (64, 61), (65, 62), (66, 63), (67, 64), (68, 65), (69, 66), (70, 67), (70, 68), (71, 69), (72, 70)]
DeepBLAST predicts pretty well in this case, but the F1 score is 0. I am confused about the evaluation method. What are the edges? Why do we need to compute the edges first? And why is the F1 score 0 in this case?
Hi, the edges are the match coordinates between the two sequences.
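To make that concrete, here is a minimal sketch of how a state string maps to edges (my own toy converter, not the library's: the library's conventions differ slightly, e.g. it also records gap columns with a repeated index, as in the listings above):

```python
def states_to_edges(states):
    """Toy conversion of a MALI-style state string into match edges.

    ':' marks a match column, '1' a gap in the first (top) sequence,
    '2' a gap in the second (bottom) sequence. Each match column
    contributes one (i, j) edge pairing residue i of the first
    sequence with residue j of the second.
    """
    i, j = 0, 0  # residue indices into the first / second sequence
    edges = []
    for s in states:
        if s == ':':    # residues in both sequences: record an edge
            edges.append((i, j))
            i += 1
            j += 1
        elif s == '1':  # gap in the first sequence: only j advances
            j += 1
        elif s == '2':  # gap in the second sequence: only i advances
            i += 1
    return edges
```

So the edges are just the residue-index pairs that the alignment places in the same column; computing them first gives a format-independent representation on which the reference and predicted alignments can be compared.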
Regarding the F1 score: if there is an off-by-one error, the F1 score can be zero even if the structural similarity is preserved. This is why F1 isn't a great metric (TM-score is more robust).
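To see why, look at how the score falls out of the edge sets; a minimal sketch (the helper name is mine):

```python
def edge_f1(true_edges, pred_edges):
    """Precision, recall, and F1 over alignment edges.

    Only exact (i, j) coordinate matches count as true positives, so
    an alignment shifted by one residue can share no edges with the
    reference and score an F1 of zero despite being structurally fine.
    """
    true_set, pred_set = set(true_edges), set(pred_edges)
    tp = len(true_set & pred_set)  # edges predicted exactly right
    fp = len(pred_set - true_set)  # predicted edges not in the reference
    fn = len(true_set - pred_set)  # reference edges that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```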
Regarding the edge alignments, indeed there are weird edge cases. This is partly due to the quirks surrounding indels -- the current gap-position-specific scoring setup isn't ideal, and we don't have a concept of affine gap scoring (it turns out to be highly non-trivial to set up for differentiable dynamic programming). See the DEDAL paper for a discussion of this.
Despite these setbacks, these edge cases don't seem to strongly affect the TM-score, since the superposition is still roughly the same.
Hi,
Your work on sequence alignment is excellent and inspiring.
Recently, I tested DeepBLAST on MALIDUP and MALISAM and found that the results are indeed great. However, I am confused about how the F1 score in Table 2 is computed. I have tried to reproduce the score with my own evaluation pipeline, as well as by computing the F1 from the tp, fp, and fn returned by the alignment_score function, but both results are far from the value given in the table. I think there must be some mistakes in my evaluation code. My code evaluates one sample at a time based on alignment_score. Could you please provide guidance on the correct method for calculating the F1 score?
Thank you!