fani-lab / Osprey

Online Predatory Conversation Detection
Adding oversampling #27

Open rezaBarzgar opened 1 year ago

rezaBarzgar commented 1 year ago

The PAN dataset is highly imbalanced: about 3% of messages are predatory and 97% are non-predatory. With so few positive examples, the model cannot learn to detect predatory messages reliably. We should use an oversampling method such as SMOTE to generate more positive (predatory) samples.
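
As a rough illustration of the idea, here is a minimal SMOTE-style sketch in plain Python. It is not Osprey's implementation. It assumes the messages have already been converted to numeric feature vectors (SMOTE interpolates in feature space, not on raw text), and the names `smote_oversample`, `X_pos`, and `X_neg`, along with the toy 3-vs-97 data, are hypothetical. Each synthetic sample is placed at a random point on the segment between a minority sample and one of its `k` nearest minority neighbours.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_oversample(minority: List[Vector], n_new: int, k: int = 5,
                     seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority vectors by SMOTE-style interpolation.

    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _squared_distance(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy training split mirroring the ~3% / 97% ratio (made-up 2-D features).
data_rng = random.Random(42)
X_pos = [[data_rng.gauss(2.0, 0.5), data_rng.gauss(2.0, 0.5)] for _ in range(3)]
X_neg = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(97)]

# Generate enough synthetic positives to match the negative class size.
X_new = smote_oversample(X_pos, n_new=len(X_neg) - len(X_pos), k=2)
X_train = X_neg + X_pos + X_new
y_train = [0] * len(X_neg) + [1] * (len(X_pos) + len(X_new))
```

Oversampling of this kind is normally applied to the training split only, so that validation and test metrics still reflect the original class ratio. In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would likely be preferable to a hand-rolled version like this one.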