positional encoding is not working

Gridsearch results

With the adjustments made in #114, I created heatmaps per label to analyze the results.

The gridsearch configuration is as follows:

num_splits = {2}
num_epochs = {10}
num_layers = {4}
pos_unit = {60000}
dropouts = {0.1}
img_enc_name = {'convnext_lg', 'convnext_tiny'}
pos_length = {6000000}
pos_abs_th_front = {0, 3, 5, 10}
pos_abs_th_end = {0, 3, 5, 10}
pos_vec_coeff = {0, 1, 0.75, 0.5, 0.25}
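As a sanity check on the size of this search, here is a minimal sketch that expands the grid, assuming the gridsearch simply takes the Cartesian product of the value sets above. The `expand` helper is hypothetical and is not taken from the repository.

```python
from itertools import product

# Values copied from the configuration above; sets are written as lists
# so that the iteration order is stable.
grid = {
    "num_splits": [2],
    "num_epochs": [10],
    "num_layers": [4],
    "pos_unit": [60000],
    "dropouts": [0.1],
    "img_enc_name": ["convnext_lg", "convnext_tiny"],
    "pos_length": [6000000],
    "pos_abs_th_front": [0, 3, 5, 10],
    "pos_abs_th_end": [0, 3, 5, 10],
    "pos_vec_coeff": [0, 0.25, 0.5, 0.75, 1],
}


def expand(grid):
    """Yield one dict per combination of hyperparameter values (hypothetical helper)."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(expand(grid))
print(len(configs))  # 2 * 4 * 4 * 5 = 160
```

Under that assumption, half of the 160 combinations use convnext_lg, which is the subset the analysis below focuses on.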

Note: For the results, I focused on those using convnext_lg for img_enc since the scores seemed to be higher than with convnext_tiny.

As @owencking suggested, this time I analyzed recall scores and focused more on chyrons. The heatmaps are included below for reference. The averages in the bottom row of each table exclude the scores for pos_vec_coeff = 0, so that they reflect only the runs with pos_enc enabled; the pos_vec_coeff = 0 scores are still shown for comparison when selecting hyperparameter configurations.
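The averaging rule can be sketched as follows. This is only an illustration: the table layout (one row per pos_vec_coeff value, one column per threshold value) is an assumption, `column_averages` is a hypothetical helper, and the numbers are placeholders rather than gridsearch results.

```python
from statistics import mean


def column_averages(scores, skip_coeff=0):
    """Average each column over the pos_vec_coeff rows, skipping one coefficient.

    `scores` maps (pos_vec_coeff, column_key) -> recall.
    """
    columns = {}
    for (coeff, column_key), value in scores.items():
        if coeff == skip_coeff:
            continue  # pos_enc disabled: shown in the table, but not averaged
        columns.setdefault(column_key, []).append(value)
    return {key: mean(values) for key, values in columns.items()}


# Placeholder numbers, NOT results from the gridsearch.
toy_scores = {
    (0, 3): 0.9, (0, 10): 0.9,
    (0.5, 3): 0.6, (0.5, 10): 0.8,
    (1, 3): 0.4, (1, 10): 0.6,
}
averages = column_averages(toy_scores)
print(averages)
```

Keeping the pos_vec_coeff = 0 row out of the average means a strong no-encoding baseline cannot inflate the column averages used to compare thresholds.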

Label I

Since the focus is on chyrons this time, I started with this heatmap. Right off the bat, it's clear that the scores tended to be higher when pos_vec_coeff = 0. Among the settings with pos_enc enabled, pos_vec_coeff = 0.5 had fairly high scores when pos_abs_th_end is 3 or 10, with 10 scoring higher than 3. With that, I looked at the other labels at pos_vec_coeff = 0.5 and pos_abs_th_end = 10, focusing on pos_abs_th_front values of 3, 5, and 10.

Label B

When pos_vec_coeff = 0.5 and pos_abs_th_end = 10, the score is highest when pos_abs_th_front is 5.

Label C

When pos_vec_coeff = 0.5 and pos_abs_th_end = 10, the score is highest when pos_abs_th_front is 3.

Label S

When pos_vec_coeff = 0.5 and pos_abs_th_end = 10, the score is highest when pos_abs_th_front is 5.

Conclusion

With these findings, I believe an ideal configuration for the three hyperparameters is as follows:

pos_abs_th_front: 5
pos_abs_th_end: 10
pos_vec_coeff: 0.5
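One way to read the choice of pos_abs_th_front is as a majority vote over the per-label observations above. The sketch below is an assumption about how those observations might be combined, not necessarily how the decision was actually made.

```python
from collections import Counter

# Best pos_abs_th_front per label with the coefficient at 0.5 and
# pos_abs_th_end = 10, as reported in the Label B, C, and S sections above.
best_front = {"B": 5, "C": 3, "S": 5}

votes = Counter(best_front.values())
value, count = votes.most_common(1)[0]
print(value, count)  # 5 2
```

Two of the three labels favor 5, with label C favoring 3.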

Interestingly, this is nearly identical to the ideal configuration found in https://github.com/clamsproject/app-swt-detection/issues/100#issuecomment-2207028702; the only difference is that pos_abs_th_front was 3 there.

Comparing `pos_enc` performances

With the hyperparameter values mentioned above, I performed gridsearch again with the following configuration:

num_splits = {2}
num_epochs = {10}
num_layers = {4}
pos_unit = {60000}
pos_enc_dim = {256}
dropouts = {0.1}
img_enc_name = {'convnext_lg'}
pos_length = {6000000}
pos_abs_th_front = {5}
pos_abs_th_end = {10}
pos_vec_coeff = {0, 0.5}

I also ran a separate gridsearch with the same configuration but with pos_abs_th_front changed to 3, since 3 was the ideal value in the earlier analysis based on F1 scores and I wanted to observe any differences between the two values. Results from both configurations are included below for comparison:

`pos_abs_th_front` = 3	`pos_abs_th_front` = 5

For label I, recall is now higher when pos_enc is enabled. The F1 score is still lower when pos_enc is enabled than when it is disabled, and the gap in F1 scores between the two pos_vec_coeff values is actually larger when pos_abs_th_front = 5. Since recall is the main focus this time, the results seem more promising than before.
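Recall rising while F1 falls is not contradictory: if enabling pos_enc recovers more true chyrons but also adds false positives, precision drops and can pull F1 down. A minimal sketch with made-up counts (these are not numbers from the gridsearch, and `prf` is a hypothetical helper):

```python
def prf(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for a single label, for illustration only.
without_pos = prf(tp=60, fp=10, fn=40)  # fewer detections, few false alarms
with_pos = prf(tp=70, fp=60, fn=30)     # more detections, many more false alarms

for name, (p, r, f1) in [("pos_enc off", without_pos), ("pos_enc on", with_pos)]:
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In this toy case recall goes from 0.6 to 0.7 while F1 drops, which is the same pattern described for label I.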

For label B, there is not much difference between the original configuration and the new one, but it is noteworthy that both recall and F1 scores are slightly higher with the new configuration.

For label C and label S, it seems that both the recall and F1 scores improved slightly compared to the original configuration.

Concluding thoughts

From these findings, it seems that pos_abs_th_front is better set to 5 than to 3.
It is interesting to note that the scores when pos_enc is disabled also see an improvement across all labels between the two pos_abs_th_front values.
