Create labeled training data classifying tweets as positive or negative

TeddyCr / twitter-sentiment

Twitter sentiment is a Python library leveraging NLP and the Twitter API to determine the emotion of a tweet

MIT License

Create labeled training data classifying tweets as positive or negative #9

Open TeddyCr opened 6 years ago

TeddyCr commented 6 years ago

Description

twitter-sentiment currently uses TextBlob's default ML algorithm. To develop our own 'custom' ML algorithm, we need to build a training dataset that labels each tweet as positive or negative.

File

The file should be saved as a .json file and should follow the schema below:

[
    {"tweet": "this is the text from a tweet", "label":"pos"},
    {"tweet": "this is the text from another tweet", "label":"neg"}
]

Once created, it should be saved in twitter-sentiment/twitterSentiment/tweetLabels.json
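
A minimal sketch of how a contributor might check a file against the schema above before committing it. The helper names `validate_labels` and `load_labels` are hypothetical and not part of twitter-sentiment; the only things taken from this issue are the `tweet` and `label` keys and the `pos`/`neg` values.

```python
import json

# Label values taken from the schema in this issue.
ALLOWED_LABELS = {"pos", "neg"}


def validate_labels(records):
    """Return a list of error strings; an empty list means the data fits the schema."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list"]
    errors = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            errors.append(f"entry {i}: expected an object")
            continue
        tweet = rec.get("tweet")
        if not isinstance(tweet, str) or not tweet.strip():
            errors.append(f"entry {i}: 'tweet' must be a non-empty string")
        label = rec.get("label")
        if not isinstance(label, str) or label not in ALLOWED_LABELS:
            errors.append(f"entry {i}: 'label' must be 'pos' or 'neg'")
    return errors


def load_labels(path):
    """Load a labels file (hypothetical helper) and raise if it breaks the schema."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    errors = validate_labels(records)
    if errors:
        raise ValueError("; ".join(errors))
    return records
```

For example, `load_labels("twitterSentiment/tweetLabels.json")` would return the list of entries, or raise a `ValueError` naming the offending entries.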

The initial file can be found here

To be determined/Discussed

The number of tweets that should be present in the file has not been determined yet. It is open for discussion, and any suggestions are more than welcome.
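
One possible way to inform the size discussion is to hold out part of the labeled data and watch how accuracy changes as the file grows. The sketch below is an assumption, not the project's method: it uses TextBlob's `NaiveBayesClassifier` as a stand-in for the 'custom' algorithm, and the helper names (`to_pairs`, `split_pairs`, `holdout_accuracy`) are hypothetical. TextBlob may also need its NLTK corpora downloaded before training works.

```python
import random


def to_pairs(records):
    """Convert schema entries to the (text, label) tuples TextBlob classifiers expect."""
    return [(rec["tweet"], rec["label"]) for rec in records]


def split_pairs(pairs, test_fraction=0.2, seed=0):
    """Shuffle deterministically and split into (train, test) lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    # Always hold out at least one example.
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def holdout_accuracy(records, test_fraction=0.2, seed=0):
    """Train on one part of the data and return accuracy on the held-out part."""
    # Imported lazily so the helpers above work without TextBlob installed.
    from textblob.classifiers import NaiveBayesClassifier

    train, test = split_pairs(to_pairs(records), test_fraction, seed)
    # Assumption: Naive Bayes stands in for whatever model the project picks.
    classifier = NaiveBayesClassifier(train)
    return classifier.accuracy(test)
```

Running `holdout_accuracy` on the file at a few sizes (say 50, 200, 500 entries) would give a rough sense of whether adding more tweets still helps.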

Optimus-PrimaNocta commented 6 years ago

Do you have a preference on what type of accounts the tweets come from?

TeddyCr commented 6 years ago

@KPGunner for this first pass it does not matter - though we should try to gather tweets from different accounts and limit the number of tweets used from the same account to a low number (I was thinking of no more than 5).

The critical factor for the training data will be its size: it needs to be large enough for the algorithm to be accurate.
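
The per-account limit suggested above could be enforced with something like the sketch below. It assumes each collected tweet carries an `account` key while it is being gathered; that key is hypothetical, is not part of the tweetLabels.json schema, and is dropped before output. The function names are also illustrative only.

```python
from collections import Counter

# The "no more than 5" figure suggested in the comment above.
MAX_PER_ACCOUNT = 5


def cap_per_account(candidates, limit=MAX_PER_ACCOUNT):
    """Keep at most `limit` tweets per account, preserving input order.

    `candidates` is a list of dicts with a hypothetical "account" key
    in addition to the "tweet" and "label" keys from the issue's schema.
    """
    seen = Counter()
    kept = []
    for rec in candidates:
        account = rec["account"]
        if seen[account] < limit:
            seen[account] += 1
            # Drop the bookkeeping key so the output matches the schema.
            kept.append({"tweet": rec["tweet"], "label": rec["label"]})
    return kept


def label_counts(records):
    """Count entries per label, to check size and pos/neg balance."""
    return Counter(rec["label"] for rec in records)
```

`label_counts` gives a quick view of how many tweets the file holds and whether one label dominates, which matters as much as raw size for a two-class classifier.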

Optimus-PrimaNocta commented 6 years ago

Let me know if I did that right. Committed it from PyCharm and honestly had no idea what I was doing. For some reason it wouldn't let me create a pull request. Had to create one by uploading the file on my fork. I'll figure it out.

I only had time to do about 50 of them but there will be more.

TeddyCr commented 6 years ago

@KPGunner, thanks for putting these together. I am not sure I saw any files. Could you attach them to this issue? Also, I forgot to mention, but we need to make sure the tweets are public tweets (to ensure this, the set should not include tweets from anyone you follow).

Optimus-PrimaNocta commented 6 years ago

This is my first contribution to anything open source, so I'm going to screw up a few times I'm certain. I think I got it figured out this time.

The tweets were from public accounts and usernames were chosen randomly using Tweepy and a bot account I run.