Philipp-Sc / llm-fraud-detection

Robust semi-supervised spam detection using Rust native NLP pipelines.
Apache License 2.0
2 stars 2 forks

Re-generate topics and re-train fraud detection #1

Closed Philipp-Sc closed 1 year ago

Philipp-Sc commented 1 year ago
governance_proposal_spam_ham.csv 
---------------
count spam: 172
count ham: 2551

Note: This dataset should help reduce false positives, since the model has not yet seen many ham (or spam) examples of governance proposals.

Note: Consider shrinking the ham dataset by filtering out rejected proposals that received a high share of votes against, so that likely spam is not trained as ham.
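
A minimal sketch of the filtering idea from the note above. The `Proposal` struct, its field names, and the `0.8` threshold are assumptions for illustration only; the actual columns of governance_proposal_spam_ham.csv may differ.

```rust
/// Hypothetical record layout; the real CSV columns are not shown in this issue.
#[derive(Debug, Clone, PartialEq)]
struct Proposal {
    text: String,
    is_spam: bool,
    rejected: bool,
    votes_against: u64,
    votes_total: u64,
}

impl Proposal {
    /// Share of votes cast against the proposal (0.0 when nobody voted).
    fn against_ratio(&self) -> f64 {
        if self.votes_total == 0 {
            0.0
        } else {
            self.votes_against as f64 / self.votes_total as f64
        }
    }
}

/// Keeps every spam row, and every ham row except those that were rejected
/// with an against-share at or above `max_against_ratio`.
fn filter_suspicious_ham(proposals: Vec<Proposal>, max_against_ratio: f64) -> Vec<Proposal> {
    proposals
        .into_iter()
        .filter(|p| p.is_spam || !(p.rejected && p.against_ratio() >= max_against_ratio))
        .collect()
}

/// Small helper to build made-up example rows.
fn example(text: &str, is_spam: bool, rejected: bool, against: u64, total: u64) -> Proposal {
    Proposal {
        text: text.to_string(),
        is_spam,
        rejected,
        votes_against: against,
        votes_total: total,
    }
}

fn main() {
    // Made-up rows, not taken from the real dataset.
    let rows = vec![
        example("Claim your airdrop", true, true, 90, 100),
        example("Upgrade to v2", false, false, 5, 100),
        example("Visit this site for rewards", false, true, 95, 100),
        example("Change community tax", false, true, 55, 100),
    ];
    let kept = filter_suspicious_ham(rows, 0.8); // 0.8 is an arbitrary threshold
    for p in &kept {
        println!("{} (spam: {})", p.text, p.is_spam);
    }
}
```

The threshold trades dataset size against label noise, so it would need tuning against the real vote data.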

Philipp-Sc commented 1 year ago

Instead of predicting all topics at once (where the predictions sum to 1), predict binary topic pairs, e.g. ["hot","cold"].

The new technique performs better. A potential drawback is that a higher number of topics might increase inference time and make it take too long on CPU-only systems.
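
To illustrate the difference, here is a rough sketch rather than the repository's actual implementation. `toy_logit` is a hypothetical stand-in for the real model, and the second pair `("airdrop", "upgrade")` is made up. Each pair is normalized on its own, so only the two scores within a pair sum to 1, and the number of scorer calls grows linearly with the number of pairs.

```rust
/// Softmax over a slice of logits; the outputs compete and sum to 1.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Scores each binary topic pair independently.
/// Returns P(first label) per pair; P(second label) is 1 minus that value.
/// `logit` is called once per label, i.e. twice per pair.
fn score_pairs<F>(text: &str, pairs: &[(&str, &str)], logit: F) -> Vec<f32>
where
    F: Fn(&str, &str) -> f32,
{
    pairs
        .iter()
        .map(|(a, b)| softmax(&[logit(text, a), logit(text, b)])[0])
        .collect()
}

/// Hypothetical stand-in for a real model: counts how often the label
/// occurs in the text (case-insensitive).
fn toy_logit(text: &str, label: &str) -> f32 {
    let text = text.to_lowercase();
    let label = label.to_lowercase();
    text.matches(label.as_str()).count() as f32
}

fn main() {
    let text = "Hot offer: claim your free airdrop now";
    let pairs = [("hot", "cold"), ("airdrop", "upgrade")];

    // Pairwise: one independent probability per pair.
    let pair_scores = score_pairs(text, &pairs, toy_logit);
    for ((a, b), p) in pairs.iter().zip(&pair_scores) {
        println!("P({a} vs {b}) = {p:.3}");
    }

    // All at once: a single distribution over every label.
    let labels = ["hot", "cold", "airdrop", "upgrade"];
    let logits: Vec<f32> = labels.iter().map(|l| toy_logit(text, l)).collect();
    println!("joint: {:?}", softmax(&logits));
}
```

With a real model, every extra pair adds two forward passes in this sketch, which is where the CPU inference cost mentioned above would come from.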
