OmdenaAI / trieste-italy-long-covid

GNU General Public License v3.0

EDA #6

Open santarabantoosoo opened 2 years ago

santarabantoosoo commented 2 years ago

https://omdena.com/blog/infrastructural-needs/

Checking the code and exploring explorers:

@EliGambicchia @etendra2501

now-youre-gittin-it commented 2 years ago

Hi, I've tried to apply most of the content from the above link to the Batch G long-COVID filtered data (1876 tweets). The authors intended to use pyLDAvis but don't seem to have published the code for it, so I added that part myself in a Jupyter/Anaconda environment. Code link

lucapug commented 2 years ago

Just added the code for word cloud generation.

lucapug commented 2 years ago

Correction in cell #16 (generalized so that it also works when the batch has fewer than 4000 tweets):

tokens = list(tweetBatch_to_words(clean_tweet_list)) #ERROR IOPub data rate exceeded.

Because of the above error, clean_tweet_list is split into slices of 4000 tweets and the tokens are produced slice by slice:

    tokens = []
    if n > 1:
        for i in range(n-1):
            tokens.append(list(tweetBatch_to_words(clean_tweet_list[i*4000:(i+1)*4000])))
        tokens.append(list(tweetBatch_to_words(clean_tweet_list[(n-1)*4000:])))
    else:
        tokens.append(list(tweetBatch_to_words(clean_tweet_list)))
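The same slicing can probably be written without the `n > 1` special case by stepping through the list with `range(0, len(...), 4000)`. The following is only a minimal sketch under assumptions: `tokenize_in_chunks` and `simple_tokenizer` are hypothetical names introduced here, and `simple_tokenizer` is a stand-in for the notebook's `tweetBatch_to_words`, whose actual implementation is not shown in this thread. Like the `append`-based version above, it returns one list per slice.

```python
from typing import Callable, Iterable, List

CHUNK_SIZE = 4000  # slice size used in the correction above


def tokenize_in_chunks(
    tweets: List[str],
    tokenizer: Callable[[List[str]], Iterable[List[str]]],
    chunk_size: int = CHUNK_SIZE,
) -> List[List[List[str]]]:
    """Apply `tokenizer` to consecutive slices of at most `chunk_size` tweets.

    Returns one list of tokenized tweets per slice. An empty input
    yields an empty list of slices.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    # range() with a step covers the full slices and the shorter last slice
    # alike, so no separate branch for small batches is needed.
    for start in range(0, len(tweets), chunk_size):
        chunks.append(list(tokenizer(tweets[start:start + chunk_size])))
    return chunks


def simple_tokenizer(batch: List[str]) -> Iterable[List[str]]:
    """Hypothetical stand-in for tweetBatch_to_words: lowercase and split."""
    for tweet in batch:
        yield tweet.lower().split()


if __name__ == "__main__":
    sample = ["Long covid fatigue", "Brain fog", "Still tired", "No smell", "Recovered"]
    result = tokenize_in_chunks(sample, simple_tokenizer, chunk_size=2)
    print([len(chunk) for chunk in result])
```

In the notebook, the call would presumably look like `tokens = tokenize_in_chunks(clean_tweet_list, tweetBatch_to_words)`. If later cells expect a single flat list of tokenized tweets, as the original one-line version produced, the slices would need to be concatenated afterwards.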