el-cornetillo / senti-py

senti-py

A pre-trained sentiment analysis classifier for Spanish

Author: Elliot

This package performs sentiment analysis on Spanish text.

It's built on top of scikit-learn and NLTK.

Marisa-trie is used to make the final trained model.pkl memory-efficient (from 150 KB to 28 KB!)
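As a rough intuition for why a trie can shrink a stored vocabulary, the sketch below builds a plain dict-based trie and compares the number of characters stored with and without prefix sharing. This is not MARISA's actual succinct layout and not this package's code; the assumption that the savings come from vocabulary storage, the helper names (build_trie, count_nodes) and the word list are all hypothetical.

```python
def build_trie(words):
    """Build a nested-dict trie; the key "$" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def count_nodes(node):
    """Count character nodes in the trie (end-of-word markers excluded)."""
    return sum(1 + count_nodes(child) for key, child in node.items() if key != "$")


# Hypothetical vocabulary with many shared prefixes, as inflected Spanish words have.
vocabulary = ["bueno", "buena", "buenos", "buenas", "malo", "mala", "malos", "malas"]
trie = build_trie(vocabulary)

total_chars = sum(len(w) for w in vocabulary)  # characters stored without sharing
shared_nodes = count_nodes(trie)               # characters stored once per shared prefix
print(total_chars, shared_nodes)
```

Words that share a prefix reuse the same nodes, so the trie stores fewer characters than a flat list of strings.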

INSTALLATION

It's as simple as:

1/ Open a terminal

2/ Run 'pip install spanish_sentiment_analysis'

USAGE

See the demo_classifier.ipynb notebook for examples of how to use the classifier.

THE DATA

The model is trained on data crawled from various websites:

Trip Advisor

PedidosYa

Apestan

QuejasOnline

MercadoLibre

SensaCine

OpenCine

TASS

Twitter

(See the files under /crawlers if interested.) In total, this represents roughly 1M samples.

THE MODEL

The model is a simple pipeline that includes:

The parameters and hyperparameters of this pipeline are selected by grid search with K-fold cross-validation, with K = 10.
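To make the selection procedure concrete, here is a minimal pure-Python sketch of grid search scored by 10-fold cross-validation. It is an illustration only, not the package's training code: the parameter names ("C", "ngram_max"), the toy_evaluate function and the sample count are hypothetical stand-ins. In scikit-learn, GridSearchCV with cv=10 performs the equivalent loop over a real pipeline.

```python
from itertools import product


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def grid_search(param_grid, evaluate, n_samples, k=10):
    """Return (best_params, best_score), scoring each combination by its mean over k folds."""
    best_params, best_score = None, float("-inf")
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [evaluate(params, train, test)
                  for train, test in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_evaluate(params, train_idx, test_idx):
    """Hypothetical stand-in for 'fit on train, score on test'; peaks at C=1.0, ngram_max=2."""
    return -abs(params["C"] - 1.0) - abs(params["ngram_max"] - 2)


best_params, best_score = grid_search(
    {"C": [0.1, 1.0, 10.0], "ngram_max": [1, 2, 3]}, toy_evaluate, n_samples=25
)
folds = list(k_fold_indices(25, k=10))
print(best_params)
```

Each candidate combination is scored on ten held-out folds and the combination with the best mean score is kept.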

THE PREPROCESSING

All comments are preprocessed before training:
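The individual steps are not spelled out here. Purely as an illustration, a typical normalization pass for Spanish user comments might look like the sketch below. Every step in it (lowercasing, dropping URLs and @mentions, shortening repeated letters, keeping only Spanish letters) and the function name preprocess are assumptions, not the package's confirmed behavior.

```python
import re


def preprocess(comment):
    """Normalize a raw comment; every step here is an illustrative assumption."""
    text = comment.lower()
    text = re.sub(r"https?://\S+", " ", text)     # drop URLs
    text = re.sub(r"@\w+", " ", text)             # drop @mentions
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)    # shorten runs of 3+ identical characters to 2
    text = re.sub(r"[^a-záéíóúüñ\s]", " ", text)  # keep Spanish letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()      # collapse whitespace


print(preprocess("¡Muy BUENOOOO! Mira https://example.com @amigo :)"))
```

Normalizing before vectorization keeps spelling variants of the same word from being counted as separate features.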

THE PREDICTION

The prediction is calculated with a few rules: