Add Twitter predictions

susiejojo commented 3 years ago

Should solve #10.

Changes made:

Made necessary modifications to twitterscraper.py.
Changed the cleaning functions.
Ran batch prediction on 50 latest tweets.
Created a new form field in index.html for entering twitter handle.
Created a new function predict_tweet for batch prediction on 50 tweets.
Return the mode of the index arrays predicted on 50 tweets.
Print result to terminal.
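As a rough illustration of the "mode over 50 tweets" step, here is a minimal sketch. It assumes each tweet yields a single predicted class index; the name `most_common_label` and the `LABELS` list are hypothetical and are not taken from this PR's code.

```python
from collections import Counter


def most_common_label(predicted_indices, labels):
    """Return the label whose index was predicted most often.

    On a tie, Counter keeps the index that appeared first.
    """
    if not predicted_indices:
        raise ValueError("no predictions to aggregate")
    index, _count = Counter(predicted_indices).most_common(1)[0]
    return labels[index]


# Hypothetical label order; the real model's index-to-type mapping may differ.
LABELS = ["INTJ", "INTP", "ENTJ", "ENFP"]

# One predicted index per tweet (6 here instead of 50, for brevity).
per_tweet = [1, 0, 1, 3, 1, 0]
result = {"type": most_common_label(per_tweet, LABELS)}
print(result)
```

Taking the mode rather than averaging keeps the output a single discrete type, which matches the `{"type": ...}` response shape described below.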

How to test:

Follow the same approach as for entering normal text, but enter the Twitter handle instead. Make sure your .env file is set up with the correct credentials. Give the server some time at the start to boot up and load the model. The response JSON should be printed in the following form: {"type":<personality_type>}.

Challenges:

I believe the Kaggle dataset has a large bias towards Introvert personality types, so a lot of my tries gave me introvert personality types. We can try playing around with the number of tweets, but I need to know the request limit for the Twitter API we're using.

susiejojo commented 3 years ago

Is your .env file configured properly? Also make sure the virtualenv is activated and that you wait at the beginning until the server has started and shows the link to port 5000. The above example works fine for me. I think you are running into issues with the Twitter API response; can you check the get_user_tweets function in twitterscraper.py? For me this issue comes up only when the handle is not valid.

sh-biswas commented 3 years ago

Sorry, this is the first time I'm working with .env and virtualenv. As far as I can see, the get_user_tweets function was not changed from the original version I pushed, so that should be fine. Did you change anything in your local .env file that I don't have? All I have in mine are the access keys and tokens for the Twitter API. Also, how do I activate virtualenv?

susiejojo commented 3 years ago

If you do a pip install -r requirements.txt globally, it won't cause any issues, and there will be no need to activate the virtualenv. If you do want to use virtualenv, install the virtualenv package, create a virtual env using virtualenv <name>, and activate it using source <name>/bin/activate. I took the .env file from your Drive link, so it should mostly be the same.

sh-biswas commented 3 years ago

Okay, so what happened was that my .env was not even there, and I'm not sure why. Do I need to make a new one every time?

Now the issue is that the to_csv() function won't work because of tweet_path. The function is supposed to create a new CSV file, right? I'm getting a FileNotFound error for the CSV file it is supposed to make, even after I created the subdirectory twitter_data.
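One thing that can be ruled out in a situation like this is the working directory: a relative path is resolved against wherever the server was started, not against the script's location. The sketch below is only an assumption about what might help, not the fix used in this thread. The helper name `prepare_csv_path` and the `<handle>.csv` file naming are hypothetical; only the `twitter_data` subdirectory comes from the discussion.

```python
import tempfile
from pathlib import Path


def prepare_csv_path(base_dir, handle, subdir="twitter_data"):
    """Build an absolute CSV path and make sure its directory exists."""
    target_dir = Path(base_dir).resolve() / subdir
    # Create the directory (and any parents) if it is missing.
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / f"{handle}.csv"


# Demonstration in a throwaway directory. In the app, base_dir would
# more likely be something like Path(__file__).parent.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = prepare_csv_path(tmp, "example_handle")
    # Stand-in for the real write; with pandas this would be
    # df.to_csv(csv_path, index=False).
    csv_path.write_text("tweet\nhello world\n")
    print(csv_path.name, csv_path.exists())
```

Passing the returned absolute path to to_csv() removes any dependence on the directory the server was launched from.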

susiejojo commented 3 years ago

I think you lost your .env file after pulling my branch, which didn't have the file because of .gitignore. For the new CSV file, do you have write access to that folder (speaking from a Linux point of view)? Maybe you can try editing the permissions on that folder.

sh-biswas commented 3 years ago

Yes, we're good now! Predictions are working. As for the rate limit, I'm not sure if this question was already answered, but the wait_on_rate_limit parameter in the tweepy.API constructor should get rid of the rate limit warnings: if you hit it, it should wait for the rate limits to replenish and pick up where it left off, as far as I can see from the documentation. Also, thank you for helping me so much!
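For reference, the wait_on_rate_limit setting mentioned above could be wired up roughly like this. It is a sketch, not the code in twitterscraper.py: the environment variable names in `CREDENTIAL_NAMES` and the helpers `missing_credentials` and `build_api` are hypothetical, and it assumes a tweepy version that still provides `tweepy.OAuthHandler`.

```python
import os

# Hypothetical variable names; match them to whatever your .env defines.
CREDENTIAL_NAMES = (
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
)


def missing_credentials(env=os.environ):
    """Return the names of credentials that are absent or empty."""
    return [name for name in CREDENTIAL_NAMES if not env.get(name)]


def build_api(env=os.environ):
    """Create a tweepy API client that sleeps when the rate limit is hit."""
    missing = missing_credentials(env)
    if missing:
        raise RuntimeError("missing credentials: " + ", ".join(missing))

    import tweepy  # third-party; imported here so the check above runs first

    auth = tweepy.OAuthHandler(env["CONSUMER_KEY"], env["CONSUMER_SECRET"])
    auth.set_access_token(env["ACCESS_TOKEN"], env["ACCESS_TOKEN_SECRET"])
    # wait_on_rate_limit=True makes tweepy wait for the limit to reset
    # instead of raising an error.
    return tweepy.API(auth, wait_on_rate_limit=True)
```

Checking for missing credentials first also gives a clearer error for the "my .env was not even there" case discussed earlier in the thread.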

V2dha commented 3 years ago

Working for me too!

MLH-Fellowship / Social-BERTerfly