santarabantoosoo commented 2 years ago

https://omdena.com/blog/infrastructural-needs/

checking the code and exploring explorers :

@EliGambicchia @etendra2501

now-youre-gittin-it commented 2 years ago

Hi, so I've tried to apply most of the content from the above link to the Batch G Long-covid filtered data (1876 tweets). Also, they intended to use pyLDAvis but don't seem to have put up the code for that, so I did that on my end in a Jupyter/Anaconda environment. Code link

Gensim LDA done.
PyLDAvis provides great interactive visualization and maybe even insights, but because the words are in Italian I'm not able to check the relevance/suitability of results.
To see the visualization, please copy the code in this file into a text editor like Notepad, save the file as .html, and open it in a browser like Chrome.
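
The copy-and-save steps above could also be scripted. This is only a minimal sketch, assuming the visualization's HTML is already available as a string; `save_and_open_html` and the file name `lda_vis.html` are hypothetical names, not something from the shared notebook. If the prepared pyLDAvis object is still in memory, `pyLDAvis.save_html` should be able to write the same page directly.

```python
from pathlib import Path
import webbrowser


def save_and_open_html(html_text, out_path="lda_vis.html", open_browser=True):
    """Write an HTML string to disk and optionally open it in the default browser.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    path = Path(out_path).resolve()
    # Write as UTF-8 so accented Italian characters survive the round trip.
    path.write_text(html_text, encoding="utf-8")
    if open_browser:
        # as_uri() turns the absolute path into a file:// URL for the browser.
        webbrowser.open(path.as_uri())
    return path


# Example call (commented out so nothing opens on import):
# save_and_open_html(Path("downloaded_vis.txt").read_text(encoding="utf-8"))
```
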

Pending parts, based on the previous Omdena challenge post linked above:
LDA number of topics optimization
LDAMallet, coherence score
Wordcloud part

lucapug commented 2 years ago

Just added the code for word cloud generation.

lucapug commented 2 years ago

correction in cell #16 (to generalize if the batch has < 4000 tweets):

tokens = list(tweetBatch_to_words(clean_tweet_list)) #ERROR IOPub data rate exceeded.

Because of the above error, clean_tweet_list is split into chunks of 4000 tweets and tokenized chunk by chunk:

tokens = []
if n > 1:
    for i in range(n-1):
        tokens.append(list(tweetBatch_to_words(clean_tweet_list[i*4000:(i+1)*4000])))
    tokens.append(list(tweetBatch_to_words(clean_tweet_list[(n-1)*4000:])))
else:
    tokens.append(list(tweetBatch_to_words(clean_tweet_list)))
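
The same chunking idea can be written without tracking `n` by hand. This is only a sketch: `chunked` and `tokenize_in_chunks` are hypothetical helper names, and `tokenizer` stands in for `tweetBatch_to_words`, whose definition is not shown in this thread. Like the snippet above, it returns one token list per chunk, not a single flat list.

```python
def chunked(items, size=4000):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def tokenize_in_chunks(clean_tweet_list, tokenizer, size=4000):
    """Apply `tokenizer` to each chunk and collect one result list per chunk.

    `tokenizer` is assumed to take a list of tweets and return an iterable,
    as tweetBatch_to_words appears to do in the notebook.
    """
    return [list(tokenizer(chunk)) for chunk in chunked(clean_tweet_list, size)]


# Example with a stand-in tokenizer (whitespace split), not the notebook's one:
# tokens = tokenize_in_chunks(clean_tweet_list, lambda batch: (t.split() for t in batch))
```

One difference from the snippet above: for an empty `clean_tweet_list` this returns an empty list, without calling the tokenizer at all.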

OmdenaAI / trieste-italy-long-covid

EDA #6
