Open keighrim opened 2 hours ago
Copying a message from @owencking over Slack today, with proposals for new binning schemes:
I have continued thinking about how to bin labels to get meaningful cross-entropy scores during training and hyperparameter tuning. I came up with a few different binnings that might be meaningful for us. Please have a look at this file, and see what you think. However, I know it is important to have a single one to optimize against. I think "Overall-strict" and "Overall-simple" would be the best choices. If I had to choose one, I think I would choose "Overall-simple" because it will effectively ignore a lot of noise that I believe exists for the "M" and "O" labels. (This recommendation supersedes the proposed binning I suggested during our last Monday meeting.)
```json
{
  "Overall-strict": {
    "Bars": ["B"],
    "Slate": ["S", "S:H", "S:C", "S:D", "S:B", "S:G"],
    "Chyron-person": ["I", "N"],
    "Credits": ["C", "R"],
    "Main": ["M"],
    "Opening": ["O", "W"],
    "Chyron-other": ["Y", "U", "K"],
    "Other-text": ["L", "G", "F", "E", "T"],
    "Neg": ["P", ""]
  },
  "Overall-simple": {
    "Bars": ["B"],
    "Slate": ["S", "S:H", "S:C", "S:D", "S:B", "S:G"],
    "Chyron-person": ["I", "N"],
    "Credits": ["C", "R"],
    "Other-text": ["M", "O", "W", "Y", "U", "K", "L", "G", "F", "E", "T"],
    "Neg": ["P", ""]
  },
  "Overall-relaxed": {
    "Bars": ["B"],
    "Slate": ["S", "S:H", "S:C", "S:D", "S:B", "S:G"],
    "Chyron": ["I", "N", "Y", "U", "K"],
    "Credits": ["C", "R"],
    "Other-text": ["M", "O", "W", "L", "G", "F", "E", "T"],
    "Neg": ["P", ""]
  },
  "Bars": {
    "Bars": ["B"],
    "Other": ["S", "S:H", "S:C", "S:D", "S:B", "S:G", "I", "N", "Y", "U", "K", "C", "R", "M", "O", "W", "L", "G", "F", "E", "T", "P", ""]
  },
  "Slate": {
    "Slate": ["S", "S:H", "S:C", "S:D", "S:B", "S:G"],
    "Other": ["B", "I", "N", "Y", "U", "K", "C", "R", "M", "O", "W", "L", "G", "F", "E", "T", "P", ""]
  },
  "Chyron-strict": {
    "Chyron-person": ["I", "N"],
    "Other": ["B", "S", "S:H", "S:C", "S:D", "S:B", "S:G", "Y", "U", "K", "C", "R", "M", "O", "W", "L", "G", "F", "E", "T", "P", ""]
  },
  "Chyron-relaxed": {
    "Chyron": ["I", "N", "Y", "U", "K"],
    "Other": ["B", "S", "S:H", "S:C", "S:D", "S:B", "S:G", "C", "R", "M", "O", "W", "L", "G", "F", "E", "T", "P", ""]
  },
  "Credits": {
    "Credits": ["C", "R"],
    "Other": ["B", "S", "S:H", "S:C", "S:D", "S:B", "S:G", "I", "N", "Y", "U", "K", "M", "O", "W", "L", "G", "F", "E", "T", "P", ""]
  }
}
```
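For concreteness, here is a minimal sketch (not the actual training code; the `BINS` dict is an excerpt of the "Overall-simple" scheme above, and `invert` is a hypothetical helper) of how a scheme like these could be turned into a flat label→bin lookup for remapping raw labels before computing cross-entropy:

```python
# Excerpt of the "Overall-simple" scheme from the proposal above.
BINS = {
    "Overall-simple": {
        "Bars": ["B"],
        "Slate": ["S", "S:H", "S:C", "S:D", "S:B", "S:G"],
        "Chyron-person": ["I", "N"],
        "Credits": ["C", "R"],
        "Other-text": ["M", "O", "W", "Y", "U", "K", "L", "G", "F", "E", "T"],
        "Neg": ["P", ""],
    }
}

def invert(scheme: dict) -> dict:
    """Turn a {bin_name: [raw_labels]} scheme into a flat {raw_label: bin_name} lookup."""
    return {label: bin_name
            for bin_name, labels in scheme.items()
            for label in labels}

label_to_bin = invert(BINS["Overall-simple"])
# e.g. raw labels "M" and "O" both collapse into "Other-text" under this scheme
print(label_to_bin["M"], label_to_bin["O"])
```

One practical check this makes easy: asserting that every raw label appears in exactly one bin of a scheme, so no frame is silently dropped or double-counted during remapping.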
Because
As mentioned in https://github.com/clamsproject/app-swt-detection/issues/116#issuecomment-2400092529, we want to re-evaluate the effectiveness of pre-binning.
Prebinning was originally implemented in #19, and various binary and multi-class binning configurations (proposed by @haydenmccormick) were experimented with during the round 2 experiments, leading up to the first release of the app+model with "3-way" prebinning.
https://github.com/clamsproject/app-swt-detection/blob/v1.0/modeling/config/default.yml#L33-L42
(Detailed results from the R2 experiments are recorded in the R2-multiclass and R2-binary tabs of a privately shared spreadsheet.) The prebinning was later replaced with an almost identical "4-way" post-binning scheme, based on evidence from the round 4 experiments (#63).
Post-binning was later removed from the model configuration entirely when the stitcher code was isolated as an independent module, and postbinning became part of the stitcher (#106).
In a recent conversation, we discussed re-assessing the prebinning schemes. This issue is for discussing the implementation and execution, and for tracking results from the new round of experiments.
Done when
PBD validation set as the ground truth set

Additional context
No response