Hi,
I'm trying to train sentence embeddings, but my data is imbalanced. For example, there are 3500 sentence pairs with a similarity score of 0.0, 2000 pairs with a score of 0.5, and only 250 pairs with a score of 1.0.
I know it's hard to train a good classifier on imbalanced data, but I wonder whether the imbalance would also hurt performance when I use the STS training style. Could you offer some suggestions?
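To make the imbalance concrete, here is a minimal sketch that computes the share of each score from the counts above, plus inverse-frequency weights as one possible way to rebalance. The function name `inverse_frequency_weights` and the weighting formula are illustrative assumptions, not something taken from any particular library, and whether weighting helps for STS-style training is exactly what is in question.

```python
# Pair counts per similarity score, taken from the description above.
score_counts = {0.0: 3500, 0.5: 2000, 1.0: 250}


def inverse_frequency_weights(counts):
    """Hypothetical helper: weight each score by total / (num_scores * count).

    With these weights every score contributes the same total weight,
    so the rare 1.0 pairs are no longer swamped by the 0.0 pairs.
    """
    total = sum(counts.values())
    num_scores = len(counts)
    return {score: total / (num_scores * n) for score, n in counts.items()}


total = sum(score_counts.values())
# Fraction of the dataset carried by each score.
fractions = {score: n / total for score, n in score_counts.items()}
weights = inverse_frequency_weights(score_counts)

for score in sorted(score_counts):
    print(f"score={score}: share={fractions[score]:.3f}, weight={weights[score]:.3f}")
```

Weights like these could be used for per-sample loss weighting or for oversampling the rare pairs; both are assumptions to be tested rather than a known fix.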
Thank you!