Open TeddyCr opened 6 years ago
Do you have a preference on what type of accounts the tweets come from?
@KPGunner for this first pass it does not matter, though we should try to gather tweets from different accounts and limit the number of tweets used from the same account to a low number (I was thinking of no more than 5).
The critical factor for the training data will be its size: it needs to be large enough for the algorithm to be accurate.
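The per-account limit could be enforced with something like the following. This is only a minimal sketch: it assumes each collected tweet is a dict with hypothetical `user` and `text` keys, and it uses the cap of 5 suggested above.

```python
from collections import defaultdict

MAX_PER_ACCOUNT = 5  # cap suggested in this thread; adjust as needed


def cap_tweets_per_account(tweets, max_per_account=MAX_PER_ACCOUNT):
    """Keep at most `max_per_account` tweets from each account.

    `tweets` is an iterable of dicts with hypothetical "user" and "text"
    keys; the real field names depend on how the tweets were collected.
    """
    kept = []
    seen = defaultdict(int)  # account name -> number of tweets kept so far
    for tweet in tweets:
        user = tweet["user"]
        if seen[user] < max_per_account:
            kept.append(tweet)
            seen[user] += 1
    return kept
```

Tweets are kept in the order they were collected, so the first five from each account survive and the rest are dropped.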
Let me know if I did that right. Committed it from PyCharm and honestly had no idea what I was doing. For some reason it wouldn't let me create a pull request. Had to create one by uploading the file to my fork. I'll figure it out.
I only had time to do about 50 of them but there will be more.
@KPGunner, thanks for putting these together. I am not sure I saw any files. Could you attach them to this issue? Also, I forgot to mention, but we need to make sure the tweets are public tweets (to ensure this, they should not come from anyone you follow).
This is my first contribution to anything open source, so I'm going to screw up a few times I'm certain. I think I got it figured out this time.
The tweets were from public accounts and usernames were chosen randomly using Tweepy and a bot account I run.
Description
twitter-sentiment currently uses TextBlob's default ML algorithm. To develop our own 'custom' ML algorithm, we need to build a training dataset that labels each tweet as positive or negative.
File
The file should be saved as a .json file and should follow the schema below:
Once created, it should be saved in
twitter-sentiment/twitterSentiment/tweetLabels.json
The initial file can be found here
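A quick sanity check before committing the file might look like the sketch below. The schema is not reproduced in this thread, so the layout used here (a list of objects with `text` and `label` keys) is purely an assumption for illustration and not the project's actual schema; only the two label values come from the description.

```python
import json

ALLOWED_LABELS = {"positive", "negative"}  # the two labels named in the description


def validate_labels(entries):
    """Return a list of problems found in `entries` (empty if none).

    Assumes a hypothetical layout: a list of objects, each with a
    "text" string and a "label" string. This is NOT the project's schema.
    """
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: expected an object")
            continue
        if not isinstance(entry.get("text"), str) or not entry["text"].strip():
            problems.append(f"entry {i}: missing or empty text")
        if entry.get("label") not in ALLOWED_LABELS:
            problems.append(f"entry {i}: label must be one of {sorted(ALLOWED_LABELS)}")
    return problems


def load_and_validate(path):
    """Load a JSON label file and return (entries, problems)."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    return entries, validate_labels(entries)
```

Calling `load_and_validate("twitterSentiment/tweetLabels.json")` from the repository root would return the parsed entries along with any problems found, assuming the file uses the illustrative layout above.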
To be determined/Discussed
The number of tweets that should be present in the file has not been determined yet. It is open for discussion, and any suggestions are more than welcome.