This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-shot classification with Huggingface.
MIT License
Inconsistent results while using a fixed random seed #20
I have been using spaCy with Classy Classification to classify text messages (Python 3.10).
Below is the training code. For this specific message, the Unknown category gets the highest score:
Result (correct, since Unknown has the highest score): {'Commercial': 0.13948287736862833, 'Crypto': 0.015437351941468657, 'Extortion': 0.0860014895963152, 'Financial': 0.01987490991768424, 'Gambling': 0.029074990906618126, 'Gift': 0.06850244399154756, 'Investment': 0.012729882351053419, 'Invoice': 0.0718818617408037, 'Phishing': 0.046637490542787444, 'Romance': 0.05515818363916855, 'Unknown': 0.45521851800392493}
Importing test dataset:
The result in the CSV file for that same message:
| Body | NLP_Result | Commercial | Crypto | Extortion | Financial | Gambling | Gift | Investment | Invoice | Phishing | Romance | Unknown | Category |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| FW: ðšð™´: ðšˆðš˜ðšž ðš‘ðšŠðšŸðšŽ ðš˜ðš—𚎠(ðŸ·) ðš˜ðš›ðšðšŽðš› ðš™ðšŽðš—ðšðš’ðš—ðš ðšðšŽðš•ðš’ðšŸðšŽðš›ðš¢. #622460835 | {'Commercial': 0.03343028275903707, 'Crypto': 0.012076486026176284, 'Extortion': 0.08983918751534335, 'Financial': 0.07360790896376578, 'Gambling': 0.014564933067751274, 'Gift': 0.08460245841797985, 'Investment': 0.017324353297565327, 'Invoice': 0.1522007262418396, 'Phishing': 0.4507937431127887, 'Romance': 0.010566873139864728, 'Unknown': 0.060993047457888194} | 0.03343 | 0.012076 | 0.089839 | 0.073608 | 0.014565 | 0.084602 | 0.017324 | 0.152201 | 0.450794 | 0.010567 | 0.060993 | Phishing |
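One thing that may be worth checking, although nothing above confirms it as the cause: the Body text in the CSV looks like UTF-8 bytes that were decoded with a Windows code page. If the test file is read with a different encoding than the string used in the first run, the model sees different text, and a fixed random seed would not make the two scores agree. The sketch below uses only the Python standard library and assumes the original subject line was written in mathematical monospace Unicode letters; the variable names are illustrative.

```python
import unicodedata

# "You" written with mathematical monospace letters (U+1D688, U+1D698, U+1D69E).
# Assumption: the original message used styled letters like these.
original = "\U0001D688\U0001D698\U0001D69E"

# Decode the UTF-8 bytes as cp1252. Bytes without a cp1252 mapping
# (such as 0x9D) are dropped because of errors="ignore".
garbled = original.encode("utf-8").decode("cp1252", errors="ignore")

# NFKC normalization folds the styled letters to plain ASCII letters.
normalized = unicodedata.normalize("NFKC", original)

print(repr(garbled))
print(normalized)
```

If the garbled output matches the pattern in the Body column, reading the CSV explicitly as UTF-8 and comparing the exact input strings of both runs would be a reasonable next step.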