Closed keighrim closed 4 months ago
With the adjustments made in #114, I created heatmaps per label to analyze the results.
The gridsearch configuration is as follows:
num_splits = {2}
num_epochs = {10}
num_layers = {4}
pos_unit = {60000}
dropouts = {0.1}
img_enc_name = {'convnext_lg', 'convnext_tiny'}
pos_length = {6000000}
pos_abs_th_front = {0, 3, 5, 10}
pos_abs_th_end = {0, 3, 5, 10}
pos_vec_coeff = {0, 1, 0.75, 0.5, 0.25}
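To get a sense of how many training runs this grid implies, here is a minimal sketch that expands it into individual configurations. The value sets are copied from the list above; the `expand_grid` helper is a hypothetical illustration and not necessarily how the project's gridsearch script enumerates runs.

```python
from itertools import product

# Grid copied from the configuration listed above; lists keep the order stable.
grid = {
    "num_splits": [2],
    "num_epochs": [10],
    "num_layers": [4],
    "pos_unit": [60000],
    "dropouts": [0.1],
    "img_enc_name": ["convnext_lg", "convnext_tiny"],
    "pos_length": [6000000],
    "pos_abs_th_front": [0, 3, 5, 10],
    "pos_abs_th_end": [0, 3, 5, 10],
    "pos_vec_coeff": [0, 1, 0.75, 0.5, 0.25],
}


def expand_grid(grid):
    """Yield one dict per combination of hyperparameter values (hypothetical helper)."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(expand_grid(grid))
print(len(configs))  # 2 * 4 * 4 * 5 = 160 combinations
```

Half of these (80) use convnext_lg, which is the subset the analysis below focuses on.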
Note: For the results, I focused on those using convnext_lg for img_enc since the scores seemed to be higher than with convnext_tiny.
As @owencking suggested, this time I analyzed recall scores and put more focus on chyrons. The heatmaps are included below for reference. The average scores in the bottom row of each table do not include the scores when pos_vec_coeff = 0, so that the analysis is done only when pos_enc is enabled; however, pos_vec_coeff = 0 scores are included for comparison when selecting ideal hyperparameter configurations.
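The averaging rule described above (leave out the pos_vec_coeff = 0 row, but keep it in the table) can be sketched as follows. The table layout, keyed by (pos_vec_coeff, pos_abs_th_end), is an assumption, and the numbers are placeholders for illustration only, not scores from the gridsearch.

```python
from statistics import mean

# Placeholder recall values for illustration only; NOT results from the gridsearch.
# Keys are (pos_vec_coeff, pos_abs_th_end); this layout is an assumption.
scores = {
    (0, 0): 0.50, (0, 10): 0.60,
    (0.5, 0): 0.40, (0.5, 10): 0.80,
    (1, 0): 0.20, (1, 10): 0.40,
}


def column_averages(scores, skip_coeff=0):
    """Average each pos_abs_th_end column over rows where pos_vec_coeff != skip_coeff."""
    columns = {}
    for (coeff, th_end), value in scores.items():
        if coeff == skip_coeff:
            continue  # pos_enc disabled: kept in the table, left out of the average
        columns.setdefault(th_end, []).append(value)
    return {th_end: mean(values) for th_end, values in columns.items()}


averages = column_averages(scores)
```

The pos_vec_coeff = 0 row stays in `scores`, so it can still be compared against the averages when choosing a configuration.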
Since the focus is on chyrons this time, I started with the scores in the chyron heatmap. Right off the bat, it's clear that the scores tended to be higher when pos_vec_coeff = 0. I noticed that pos_enc_coeff = 0.5 had fairly high scores when pos_abs_th_end is 3 or 10, with the latter resulting in higher scores than the former. With that, I wanted to look at the other labels when pos_enc_coeff = 0.5 and pos_abs_th_end = 10, focusing on when pos_abs_th_front is 3, 5, or 10.
When pos_enc_coeff = 0.5 and pos_abs_th_end = 10, the resulting score is the highest when pos_abs_th_front is 5.
When pos_enc_coeff = 0.5 and pos_abs_th_end = 10, the resulting score is the highest when pos_abs_th_front is 3.
When pos_enc_coeff = 0.5 and pos_abs_th_end = 10, the resulting score is the highest when pos_abs_th_front is 5.
With these findings, I believe an ideal configuration for the three hyperparameters is as follows:
pos_abs_th_front: 5
pos_abs_th_end: 10
pos_enc_coeff: 0.5
Interestingly, this is nearly identical to the ideal configuration found in https://github.com/clamsproject/app-swt-detection/issues/100#issuecomment-2207028702, except that pos_abs_th_front = 3 there.
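One way to read the choice of pos_abs_th_front = 5 is as a simple majority across the three per-label observations above (5, 3, 5). The sketch below is only an illustration of that reading, not necessarily the exact selection procedure used; the `heatmap_*` names are hypothetical placeholders, since the text does not say which heatmap belongs to which label.

```python
from collections import Counter

# Best pos_abs_th_front per non-chyron heatmap, as reported above
# (with pos_enc_coeff = 0.5 and pos_abs_th_end = 10).
# The heatmap names are hypothetical placeholders.
best_front_by_heatmap = {"heatmap_1": 5, "heatmap_2": 3, "heatmap_3": 5}

# Count how often each value was the best one and take the most frequent.
votes = Counter(best_front_by_heatmap.values())
chosen_front, n_votes = votes.most_common(1)[0]
print(chosen_front, n_votes)
```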
pos_enc performances
With the hyperparameter values mentioned above, I performed gridsearch again with the following configuration:
num_splits = {2}
num_epochs = {10}
num_layers = {4}
pos_unit = {60000}
pos_enc_dim = {256}
dropouts = {0.1}
img_enc_name = {'convnext_lg'}
pos_length = {6000000}
pos_abs_th_front = {5}
pos_abs_th_end = {10}
pos_vec_coeff = {0, 0.5}
I separately performed gridsearch with the above configuration but changed pos_abs_th_front to 3, so that I could observe any differences between the two values, since 3 was the ideal value when analyzing F1 scores. Results from both configurations are included below for comparison:
| pos_abs_th_front = 3 | pos_abs_th_front = 5 |
|---|---|
For label I, recall is now higher when pos_enc is enabled. The F1 score is still lower when pos_enc is enabled than when it is not, and the difference in F1 scores between the pos_vec_coeff values is actually larger now when pos_abs_th_front = 5. Since recall is the main focus this time, the results seem more promising than before.
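Recall rising while F1 falls, as observed for label I, is consistent with a drop in precision. The sketch below uses the standard definitions of precision, recall, and F1; the counts are made up purely for illustration and are not taken from these experiments.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts for illustration only; not from the experiments.
baseline = precision_recall_f1(tp=60, fp=10, fn=40)     # fewer positives predicted
more_recall = precision_recall_f1(tp=70, fp=70, fn=30)  # more positives predicted

# Recall goes up, but the extra false positives pull precision and F1 down.
print(baseline)
print(more_recall)
```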
For label B, there is not much difference between the original configuration and the new one, but it is noteworthy that both recall and F1 scores are slightly higher with the new configuration.
For label C and label S, it seems that both the recall and F1 scores improved slightly compared to the original configuration.
pos_abs_th_front would be better off set to 5 rather than 3.
The scores when pos_enc is disabled also see an improvement across all labels between the two pos_abs_th_front values.
Bug Description
When I run training rounds with different pos_enc_coeff values, the results don't seem to vary, as shown in https://github.com/clamsproject/app-swt-detection/issues/100#issuecomment-2207028702. We need further investigation to verify the effectiveness of positional encoding.
Reproduction steps
some results from the latest round under 7be4b818a0c72713e501b27be9ebaeee5a3e1320