clamsproject / app-swt-detection

CLAMS app for detecting scenes with text from video input
Apache License 2.0

bring back "pre-binning" #118

Closed: keighrim closed this issue 4 weeks ago

keighrim commented 1 month ago

Because

As mentioned in https://github.com/clamsproject/app-swt-detection/issues/116#issuecomment-2400092529, we want to re-evaluate the effectiveness of pre-binning.


Pre-binning was originally implemented in #19 and experimented with under various binary and multi-class binning configurations (proposed by @haydenmccormick) during round 2 experiments, leading up to the first release of the app+model with "3-way" pre-binning.

https://github.com/clamsproject/app-swt-detection/blob/v1.0/modeling/config/default.yml#L33-L42

(detailed results from the R2 experiments are recorded in a privately shared spreadsheet, in the R2-multiclass and R2-binary tabs)

The binning was later replaced with an almost identical "4-way" post-binning scheme, based on evidence from round 4 experiments (#63).

Post-binning was later removed from the model configuration entirely when the stitcher code was isolated into an independent module and post-binning became part of the stitcher (#106).


As

In a recent conversation, we discussed re-assessing the pre-binning schemes. This issue is for discussing the implementation and execution, and for tracking results from the new round of experiments.

Done when

Additional context

No response

keighrim commented 1 month ago

Copying a message from @owencking on Slack today, with proposals for new binning schemes.


I have continued thinking about how to bin labels to get meaningful cross-entropy scores during training and hyperparameter tuning. I came up with a few different binnings that might be meaningful for us. Please have a look at this file, and see what you think. However, I know it is important to have a single one to optimize against. I think "Overall-strict" and "Overall-simple" would be the best choices. If I had to choose one, I think I would choose "Overall-simple" because it will effectively ignore a lot of noise that I believe exists for the "M" and "O" labels. (This recommendation supersedes the proposed binning I suggested during our last Monday meeting.)

{
"Overall-strict": {
    "Bars": ["B"],
    "Slate": ["S","S:H","S:C","S:D","S:B","S:G"],
    "Chyron-person": ["I","N"],
    "Credits": ["C","R"],
    "Main": ["M"],
    "Opening": ["O","W"],
    "Chyron-other": ["Y","U","K"],
    "Other-text": ["L","G","F","E","T"],
    "Neg": ["P",""]
},

"Overall-simple": {
    "Bars": ["B"],
    "Slate": ["S","S:H","S:C","S:D","S:B","S:G"],
    "Chyron-person": ["I","N"],
    "Credits": ["C","R"],
    "Other-text": ["M","O","W","Y","U","K","L","G","F","E","T"],
    "Neg": ["P",""]
},

"Overall-relaxed":{
    "Bars": ["B"],
    "Slate": ["S","S:H","S:C","S:D","S:B","S:G"],
    "Chyron": ["I","N","Y","U","K"],
    "Credits": ["C","R"],
    "Other-text": ["M","O","W","L","G","F","E","T"],
    "Neg": ["P",""]
},

"Bars": {
    "Bars": ["B"],
    "Other": ["S","S:H","S:C","S:D","S:B","S:G","I","N","Y","U","K","C","R","M","O","W","L","G","F","E","T","P",""]
},

"Slate": {
    "Slate": ["S","S:H","S:C","S:D","S:B","S:G"],
    "Other": ["B","I","N","Y","U","K","C","R","M","O","W","L","G","F","E","T","P",""]
},

"Chyron-strict": {
    "Chyron-person": ["I","N"],
    "Other": ["B","S","S:H","S:C","S:D","S:B","S:G","Y","U","K","C","R","M","O","W","L","G","F","E","T","P",""]
},

"Chyron-relaxed":{
    "Chyron": ["I","N","Y","U","K"],
    "Other": ["B","S","S:H","S:C","S:D","S:B","S:G","C","R","M","O","W","L","G","F","E","T","P",""]
},

"Credits": {
    "Credits": ["C","R"],
    "Other": ["B","S","S:H","S:C","S:D","S:B","S:G","I","N","Y","U","K","M","O","W","L","G","F","E","T","P",""]
}
}
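
For concreteness, here is a minimal sketch (not the repo's actual code) of how a scheme like "Overall-simple" above could be inverted into a lookup table and used to remap raw labels before training; the fallback to "Neg" for labels outside the scheme is an assumption for illustration.

# A minimal sketch of applying a pre-binning scheme; BINS is the
# "Overall-simple" scheme quoted above.
BINS = {
    "Bars": ["B"],
    "Slate": ["S", "S:H", "S:C", "S:D", "S:B", "S:G"],
    "Chyron-person": ["I", "N"],
    "Credits": ["C", "R"],
    "Other-text": ["M", "O", "W", "Y", "U", "K", "L", "G", "F", "E", "T"],
    "Neg": ["P", ""],
}

# Invert the scheme into a raw-label -> bin-label lookup table.
LABEL_TO_BIN = {raw: bin_name for bin_name, raws in BINS.items() for raw in raws}

def pre_bin(raw_label: str) -> str:
    """Map a raw annotation label to its bin (unseen labels -> 'Neg', an assumption)."""
    return LABEL_TO_BIN.get(raw_label, "Neg")

assert pre_bin("S:H") == "Slate"
assert pre_bin("M") == "Other-text"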
keighrim commented 1 month ago

Reporting results from a recent experiment with different binning schemes. Here is the list of binning schemes:

https://github.com/clamsproject/app-swt-detection/blob/1d77c5e31039c4e3e83be90469942a5da15eea6f/modeling/gridsearch.py#L137-L186

Besides pre-binning, the experiment varied only two other hyperparameters (a sketch of the resulting grid follows the list):

  1. image_enc_name: the name of the backbone model; ConvNeXt tiny and large were used.
  2. block_guids_train: training data size; 1@ means the model is trained on all available data, while 61@ means the challenging images were blocked from being used as training data.
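
As a rough sketch of the search space (the parameter names follow gridsearch.py, but the value strings and loop structure here are illustrative assumptions, not the actual grid-search code):

# Illustrative sketch of the 3-dimensional hyperparameter grid.
from itertools import product

pre_bin_schemes = ["Overall-strict", "Overall-simple", "Overall-relaxed",
                   "Bars", "Slate", "Chyron-strict", "Chyron-relaxed", "Credits"]
image_enc_names = ["convnext_tiny", "convnext_large"]  # exact strings assumed
block_guids_train = ["1@", "61@"]  # 1@ = all data, 61@ = challenging images blocked

for scheme, enc, blocked in product(pre_bin_schemes, image_enc_names, block_guids_train):
    print(f"training run: bins={scheme}, encoder={enc}, blocked={blocked}")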

And here are the bar charts from the results:

keighrim commented 4 weeks ago

Closing this issue, as we decided not to use any "pre" binning, since we don't want to lose any labels that could potentially be useful for future applications. Instead, our experimental focus will be on the "post" binning, where we can experiment with not just schemes but also algorithms (e.g., max, sum, or "learnable" binning). Since post-binning is not part of the CV modeling but rather a post-processing step on the model predictions, further discussion should take place in the context of #117.
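
To illustrate the max/sum distinction (a sketch only, not the actual stitcher implementation): post-binning aggregates the model's per-label scores into bin scores after prediction, so the raw labels are preserved and the binning algorithm can be swapped freely.

# Sketch of two post-binning algorithms over per-label scores.
def post_bin(scores: dict[str, float], bins: dict[str, list[str]],
             algo: str = "max") -> dict[str, float]:
    """Collapse raw-label scores into bin scores with max or sum."""
    agg = max if algo == "max" else sum
    return {bin_name: agg(scores.get(raw, 0.0) for raw in raws)
            for bin_name, raws in bins.items()}

scores = {"I": 0.55, "N": 0.25, "Y": 0.1, "B": 0.1}  # hypothetical model output
bins = {"Chyron": ["I", "N", "Y", "U", "K"], "Bars": ["B"]}
print(post_bin(scores, bins, "sum"))  # {'Chyron': 0.9, 'Bars': 0.1}
print(post_bin(scores, bins, "max"))  # {'Chyron': 0.55, 'Bars': 0.1}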