henrykang7177 opened this issue 1 year ago
Can you provide more details of your experiment settings, for example, the configuration and the experiment log? Thanks
Hi!
I have a binary classification dataset with highly imbalanced label distributions (pos : neg == 1 : 200)
I was trying to apply the BERT code in the Neural Network Quickstart Tutorial (https://www.csie.ntu.edu.tw/~cjlin/libmultilabel/api/nn_tutorial.html#neural-network-quickstart-tutorial) directly on this dataset, with the validation metric set to "Macro-F1", but the trained model mostly produces all negatives in this case.
I am wondering if there are parameters or configurations I could tune in LibMultiLabel for such an imbalanced dataset to improve the model's performance?
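For context, my understanding is that the tutorial's BERT model optimizes an unweighted binary cross-entropy, so the 200:1 negative majority dominates the loss. One generic remedy (I could not find a LibMultiLabel option for it, hence the question) is to upweight the positive class by the neg/pos ratio, as PyTorch's `BCEWithLogitsLoss(pos_weight=...)` does. A minimal sketch of the idea, not LibMultiLabel code:

```python
import math

def weighted_bce(p, y, pos_weight=1.0):
    """Binary cross-entropy with the positive term scaled by pos_weight."""
    eps = 1e-12  # guard against log(0)
    return -(pos_weight * y * math.log(p + eps)
             + (1 - y) * math.log(1 - p + eps))

# With pos:neg == 1:200, weighting positives by 200 makes each class
# contribute comparably to the total loss.
n_pos, n_neg = 1, 200
pos_weight = n_neg / n_pos  # 200.0

# A confident "negative" prediction (p = 0.01) on a positive example is
# penalized 200x harder than with the unweighted loss.
loss_plain = weighted_bce(0.01, 1)
loss_weighted = weighted_bce(0.01, 1, pos_weight)
```

With this weighting, predicting all negatives is no longer a low-loss solution, which is the failure mode I am seeing.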
For your reference:
I also tried the linear method, where using `train_cost_sensitive` instead of `train_1vsrest` noticeably improved on this issue: with `train_cost_sensitive`, the model predicts 4 times more positive samples than with `train_1vsrest`. Both methods have Micro-F1 and P@1 close to 0.99 (due to the dominating negative samples) and Macro-F1 around 0.5.

Thanks!
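P.S. Since both linear models reach high Micro-F1 largely by predicting negatives, one more generic lever, independent of LibMultiLabel's configuration, would be tuning the decision threshold on validation scores instead of using the default 0.5. A self-contained sketch (the helper names here are mine, not from the library):

```python
def f1(tp, fp, fn):
    # F1 = 2*tp / (2*tp + fp + fn); defined as 0 when there are
    # no predicted and no actual positives.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(scores, labels, candidates):
    # Pick the decision threshold that maximizes F1 on a validation set.
    best_t, best_f1 = None, -1.0
    for t in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        score = f1(tp, fp, fn)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1

# Toy imbalanced validation set: the lone positive gets a low score,
# so the default 0.5 threshold would predict all negatives.
scores = [0.9, 0.4, 0.2, 0.1, 0.05]
labels = [0,   1,   0,   0,   0]
t, f = best_threshold(scores, labels, [0.05, 0.1, 0.2, 0.3, 0.4, 0.5])
```

On a 1:200 dataset the F1-maximizing threshold is typically well below 0.5, so this alone can recover some positives even from a model whose scores skew negative.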