keighrim closed this issue 4 weeks ago.
Copying a message from @owencking over Slack today, with proposals for new binning schemes:
> I have continued thinking about how to bin labels to get meaningful cross-entropy scores during training and hyperparameter tuning. I came up with a few different binnings that might be meaningful for us. Please have a look at this file, and see what you think. However, I know it is important to have a single one to optimize against. I think "Overall-strict" and "Overall-simple" would be the best choices. If I had to choose one, I think I would choose "Overall-simple" because it will effectively ignore a lot of noise that I believe exists for the "M" and "O" labels. (This recommendation supersedes the proposed binning I suggested during our last Monday meeting.)
{ "Overall-strict": { "Bars": ["B"], "Slate": ["S","S:H","S:C","S:D","S:B","S:G"], "Chyron-person": ["I","N"], "Credits": ["C","R"], "Main": ["M"], "Opening": ["O","W"], "Chyron-other": ["Y","U","K"], "Other-text": ["L","G","F","E","T"], "Neg": ["P",""] }, "Overall-simple": { "Bars": ["B"], "Slate": ["S","S:H","S:C","S:D","S:B","S:G"], "Chyron-person": ["I","N"], "Credits": ["C","R"], "Other-text": ["M","O","W","Y","U","K","L","G","F","E","T"], "Neg": ["P",""] }, "Overall-relaxed":{ "Bars": ["B"], "Slate": ["S","S:H","S:C","S:D","S:B","S:G"], "Chyron": ["I","N","Y","U","K"], "Credits": ["C","R"], "Other-text": ["M","O","W","L","G","F","E","T"], "Neg": ["P",""] }, "Bars": { "Bars": ["B"], "Other": ["S","S:H","S:C","S:D","S:B","S:G","I","N","Y","U","K","C","R","M","O","W","L","G","F","E","T","P",""] }, "Slate": { "Slate": ["S","S:H","S:C","S:D","S:B","S:G"], "Other": ["B","I","N","Y","U","K","C","R","M","O","W","L","G","F","E","T","P",""] }, "Chyron-strict": { "Chyron-person": ["I","N"], "Other": ["B","S","S:H","S:C","S:D","S:B","S:G","Y","U","K","C","R","M","O","W","L","G","F","E","T","P",""] }, "Chyron-relaxed":{ "Chyron": ["I","N","Y","U","K"], "Other": ["B","S","S:H","S:C","S:D","S:B","S:G","C","R","M","O","W","L","G","F","E","T","P",""] }, "Credits": { "Credits": ["C","R"], "Other": ["B","S","S:H","S:C","S:D","S:B","S:G","I","N","Y","U","K","M","O","W","L","G","F","E","T","P",""] } }
Reporting results from a recent experiment with different binning schemes (the schemes are those listed in the comment above).
Besides pre-binning, the experiment varied only two other hyperparameters:

- `image_enc_name`: the name of the backbone model; ConvNeXt tiny and large were used.
- `block_guids_train`: the training data size; `1@` means the model is trained on all available data, while `61@` means the challenging images were blocked from being used as training data.
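For scale, here is a sketch of how the resulting configuration grid could be enumerated. The scheme names come from the JSON above; the encoder identifiers and the exact `block_guids_train` values are illustrative assumptions, not the actual experiment script:

```python
from itertools import product

# Hypothetical enumeration of the experiment grid.
prebin_schemes = ["Overall-strict", "Overall-simple", "Overall-relaxed",
                  "Bars", "Slate", "Chyron-strict", "Chyron-relaxed", "Credits",
                  "nobinning"]  # nobinning was run but left out of the charts below
image_enc_names = ["convnext_tiny", "convnext_lg"]  # backbone variants (names assumed)
block_guids_train = ["1@", "61@"]  # "1@" = all data, "61@" = challenging images blocked

configs = [
    {"prebin": s, "image_enc_name": e, "block_guids_train": b}
    for s, e, b in product(prebin_schemes, image_enc_names, block_guids_train)
]
print(len(configs))  # 9 schemes x 2 encoders x 2 block lists = 36 runs
```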
And here are the bar charts from the results. In the per-label charts, `ba`, `sl`, `ch`, and `cr` respectively refer to bars, slates, chyrons, and credits. For schemes that have two chyron categories, the one with the `-person` suffix is used. The `nobinning` scheme is not included here because it requires an additional many-to-one aggregation.
Per-label P/R/F scores: https://drive.google.com/file/d/1iTKV573UNrJr8E_s0BAWj7IQ14lmm_Cj/view?usp=drive_link
Overall average P/R/F scores: https://drive.google.com/file/d/1ybhaDqytlooJ9Y8AV9e0figIg9RLSptk/view?usp=drive_link
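For context, per-bin and averaged P/R/F numbers like these can be computed from binned ground-truth and prediction sequences. A minimal sketch using scikit-learn (the toy data and the use of `sklearn` here are my assumptions, not necessarily how the experiment scripts compute the charts):

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy binned ground-truth and predicted labels for a validation set.
y_true = ["Bars", "Slate", "Neg", "Credits", "Chyron-person", "Neg"]
y_pred = ["Bars", "Neg", "Neg", "Credits", "Chyron-person", "Slate"]
bins = sorted(set(y_true))

# Per-bin precision/recall/F1 ...
p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, labels=bins, zero_division=0)
for b, scores in zip(bins, zip(p, r, f)):
    print(b, scores)

# ... and a single macro average over bins, comparable across configurations.
macro_p, macro_r, macro_f, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(macro_p, macro_r, macro_f)
```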
Closing this issue, as we decided not to use any "pre" binning, since we don't want to lose any labels that could potentially be useful for future applications. Instead, our experimental focus will be on "post" binning, where we can experiment with not just schemes but also algorithms (e.g., `max`, `sum`, or "learnable" binning). Since post-binning is not part of the CV modeling but rather post-processing of the model predictions, further discussion should happen in the context of #117.
### Because
As mentioned in https://github.com/clamsproject/app-swt-detection/issues/116#issuecomment-2400092529, we want to re-evaluate the effectiveness of pre-binning.
Prebinning was originally implemented in #19, and various binary and multi-class binning configurations (proposed by @haydenmccormick) were experimented with during round 2 experiments, leading up to the first release of the app+model with "3-way" prebinning:
https://github.com/clamsproject/app-swt-detection/blob/v1.0/modeling/config/default.yml#L33-L42
(Detailed results from the R2 experiments are recorded in this privately shared spreadsheet, in the `R2-multiclass` and `R2-binary` tabs.) The binning was later replaced with an almost identical "4-way" post-binning scheme based on evidence from round 4 experiments (#63).
Post-binning was later completely removed from the model configuration when the stitcher code was isolated as an independent module and post-binning became part of the stitcher (#106).
As mentioned above, in a recent conversation we discussed re-assessment of the prebinning schemes. This issue is to discuss the implementation and execution, and also to track results from the new round of experiments.
### Done when

- PBD
- validation set as the ground truth set

### Additional context

_No response_