ISG-ICS / cloudberry

Big Data Visualization
http://cloudberry.ics.uci.edu

Correlation of the two queries #512

Closed yangguah closed 6 years ago

yangguah commented 6 years ago

Last week we found that calculating the correlation by grid did not make much sense, because it cannot reflect whether both words occur in the same tweet. This week I came up with another way to show the relationship between two words. Suppose word 1 is A and word 2 is B. I want to calculate the probability that A and B occur in the same tweet, given the set of tweets that contain A or B.
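A minimal sketch of that ratio, assuming tweets are available as plain strings and a word counts as present when it appears as a whole lowercase token. The names `tokens` and `cooccurrence_ratio` are hypothetical and are not part of Cloudberry:

```python
import re


def tokens(text):
    """Split a tweet into a set of lowercase word tokens (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def cooccurrence_ratio(tweets, word_a, word_b):
    """Fraction of tweets containing A or B that contain both A and B."""
    a, b = word_a.lower(), word_b.lower()
    both = 0    # tweets containing A and B
    either = 0  # tweets containing A or B
    for text in tweets:
        words = tokens(text)
        has_a, has_b = a in words, b in words
        if has_a or has_b:
            either += 1
            if has_a and has_b:
                both += 1
    # No tweet mentions either word: report 0.0 instead of dividing by zero.
    return both / either if either else 0.0
```

Tweets that mention neither word are ignored, so the result depends only on the tweets matched by the two keywords.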

yangguah commented 6 years ago

My work on Cloudberry over the summer used the Jaccard function to calculate the relation between two words in TwitterMap, treating all of a user's tweets as a single document. Recently the work has focused more on whether this Jaccard-based method makes "sense". I will submit different combinations of two keywords to check whether the method gives sensible results.
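A rough sketch of the per-user Jaccard idea, assuming tweets arrive as `(user_id, text)` pairs and that a user "has" a word if any of their tweets contains it as a token. The function name `jaccard_by_user` and the input shape are assumptions for illustration, not the code used in Cloudberry:

```python
import re
from collections import defaultdict


def jaccard_by_user(tweets, word_a, word_b):
    """Jaccard similarity of the sets of users whose merged tweets mention each word."""
    a, b = word_a.lower(), word_b.lower()

    # Merge every user's tweets into one "document", stored as a set of tokens.
    words_by_user = defaultdict(set)
    for user_id, text in tweets:
        words_by_user[user_id] |= set(re.findall(r"[a-z0-9']+", text.lower()))

    users_a = {u for u, words in words_by_user.items() if a in words}
    users_b = {u for u, words in words_by_user.items() if b in words}

    union = users_a | users_b
    # Neither word appears for any user: report 0.0 instead of dividing by zero.
    return len(users_a & users_b) / len(union) if union else 0.0
```

With this grouping, a user who mentions A in one tweet and B in another still counts toward the intersection, which is the main difference from counting co-occurrence within a single tweet.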

chenlica commented 6 years ago

Related issue https://github.com/ISG-ICS/cloudberry/issues/548.