el-cornetillo / senti-py

A sentiment Analysis classifier in spanish

cannot find file sentiment_pipeline.pkl #3

Open DonPeregrina opened 5 years ago

DonPeregrina commented 5 years ago

Hello Ayliote, this seems like a very cool project, but for whatever reason I cannot find the file named sentiment_pipeline.pkl.

Here is the full error: No such file or directory: 'C:\Users\L03121890\Documents\Pyton\example\senti-py-master\classifier/model/sentiment_pipeline.pkl'

Could you tell me where I can find this file?

Thanks a lot.

el-cornetillo commented 5 years ago

Hi DonPeregrina,

From your traceback, it seems that you cloned the GitHub repo; however, this is not the proper way to install the package. The code in this GitHub repo is not the exact package code: for instance, it indeed does not include the sentiment_pipeline.pkl object (there is no point in storing this relatively big file on GitHub).

What I would do to fix the problem in your case:

1/ Be sure you are using Python 3.
2/ Manually remove what you cloned on your computer (that is, manually remove the senti-py-master folder and everything it contains).
3/ Then the only proper way to install the package:

4/ At that point the package should be linked to your main Python environment and you should be able to use it.
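To double-check the result of these steps, here is a minimal sketch using only the standard library. It reports whether you are on Python 3 and where Python would load the module from. The module name `classifier` is an assumption taken from the folder shown in the traceback path; the installed package may expose a different import name.

```python
import importlib.util
import sys


def describe_install(module_name="classifier"):
    """Report the Python major version check and where `module_name` would be imported from."""
    # find_spec returns None when no top-level module with this name can be found.
    spec = importlib.util.find_spec(module_name)
    return {
        "python3": sys.version_info.major >= 3,
        "found": spec is not None,
        # origin is the file the import system would load (None if not found).
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # "classifier" is an assumed module name, see the note above.
    print(describe_install())
```

If `location` still points inside the old senti-py-master folder, the cloned copy is probably still being picked up instead of the installed package.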

Tell me whether those steps fix the issue.

Elliot

DonPeregrina commented 5 years ago

That is awesome, Elliot! Thanks a lot.

There is one remaining question. Sorry if this is too obvious, but I am new to Python: I have the module installed, but I am not sure how to use it, which commands or functions to call first, etc. Is there any documentation on how to use it?

Thank you so much for your help, this is awesome.

el-cornetillo commented 5 years ago

You're welcome, I'm glad it helped! If you could install it correctly, then the idea of the package is to provide an object that takes a text (as a string) as input and computes a sentiment score between 0 and 1 (values near 0 mean the text probably has a negative tone, values near 1 mean the text probably has a positive tone).

This is done through a "SentimentClassifier" class, which has a predict function that you use to compute scores.

That's pretty much all it does! I suggest you take a look at the notebook (https://github.com/aylliote/senti-py/blob/master/demo_classifier.ipynb) to see some examples of how it works.
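As a rough illustration of the description above, usage could look like the sketch below. The class name `SentimentClassifier` and its `predict` function come from this thread; the import path `from classifier import ...`, the no-argument constructor, and the helper `score_texts` are assumptions made for the example, so check the notebook for the exact calls.

```python
try:
    # Assumption: the installed package exposes the class under this import path.
    from classifier import SentimentClassifier
except ImportError:
    SentimentClassifier = None  # package not installed in this environment


def score_texts(clf, texts):
    """Return (text, score) pairs, calling clf.predict once per string."""
    return [(text, float(clf.predict(text))) for text in texts]


if __name__ == "__main__" and SentimentClassifier is not None:
    # Assumption: the classifier can be constructed without arguments.
    clf = SentimentClassifier()
    examples = ["Me encanta esta película", "El servicio fue terrible"]
    for text, score in score_texts(clf, examples):
        print(f"{score:.3f}  {text}")
```

Scores close to 0 would then read as negative and scores close to 1 as positive.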

DonPeregrina commented 5 years ago

This is just awesome, it works! Haha, thanks a lot. One last question: do you know of other modules, development procedures, or analyses that I could apply to my natural-language texts? Perhaps subjectivity or anything else?

I really appreciate your help, it's really useful. On a curious note, I noticed this module works better with South American phrases. With Mexican connotations it works as well, but the model is tuned with South American expressions and phrases, haha.

Thanks a lot again.

el-cornetillo commented 5 years ago

There are really a lot of modules and packages designed to solve NLP problems. I cannot help you on this one: it all depends on which tasks you aim to solve, so you should look on the internet.

(Indeed, the model was deliberately trained on reviews with a strong bias towards South American/Argentinian sentences; that's what you noticed!)

DonPeregrina commented 5 years ago

Hi Elliot, thanks a lot for your input.

Another question regarding your module: I guess you defined the scale for the "Polarity" to be from 0 to 1, right? Or is the classifier giving results on subjectivity instead of polarity?

I was asking because I saw that polarity is often scaled from -1 to 1.

Thanks a lot, man.

el-cornetillo commented 5 years ago

Hi DonPeregrina, my apologies for this late answer.

Strictly speaking, the output is the probability that the statement belongs to class 1 (which in human terms corresponds to a positive sentiment). That is indeed also known as polarity. Subjectivity prediction is another task, which is not tackled by this package.
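Since the score is a class-1 probability, turning it into a hard label is a one-liner. This is only a sketch: the helper `label_from_score` and the 0.5 cut-off are assumptions chosen for illustration, not something the package defines.

```python
def label_from_score(score, threshold=0.5):
    """Map a class-1 probability to a label; threshold=0.5 is an illustrative default."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    # Scores at or above the threshold are read as class 1 (positive sentiment).
    return "positive" if score >= threshold else "negative"
```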

However, you should not worry too much about the range; it is just a matter of definition. In my case the output is a probability, so it makes sense to squash it into the [0, 1] interval. If you want it scaled between -1 and 1, you can always apply a transformation to it, either a linear function (lambda x: 2*x - 1) or a more complex transformation (any increasing function that maps the interval [0, 1] onto [-1, 1]).
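For instance, the linear rescaling mentioned above, plus one possible non-linear alternative, can be sketched as follows. The function names are made up for the example, and the sine-based mapping is just one arbitrary increasing function, not something the package provides.

```python
import math


def to_polarity(p):
    """Linear map from a probability in [0, 1] to a polarity in [-1, 1]."""
    return 2 * p - 1


def to_polarity_smooth(p):
    """Non-linear alternative: sin(pi * (p - 0.5)) is increasing on [0, 1],
    sends 0 to -1 and 1 to 1, and flattens out near both ends."""
    return math.sin(math.pi * (p - 0.5))


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, to_polarity(p), round(to_polarity_smooth(p), 3))
```

Both functions keep the ordering of the scores, so a text that scored higher before rescaling still scores higher afterwards.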

Kind regards,