yangguah closed 6 years ago
My work on Cloudberry over the summer used the Jaccard function to measure how related two words are in twittermap, treating all of each user's tweets as a single document. More recently the work has focused on whether this Jaccard-based method produces sensible results. I will run it on different pairs of keywords to check whether the scores make sense.
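A minimal sketch of the user-level Jaccard computation described above. It assumes tweets arrive as `(user_id, text)` pairs and that a word counts as used by a user when it appears as a lowercase, whitespace-separated token in any of that user's tweets. The function name `user_level_jaccard` and the sample data are hypothetical and are not taken from the Cloudberry codebase.

```python
from collections import defaultdict


def user_level_jaccard(tweets, word_a, word_b):
    """Jaccard similarity of two words, treating each user's tweets as one document."""
    # Merge every user's tweets into a single set of lowercase tokens.
    tokens_by_user = defaultdict(set)
    for user_id, text in tweets:
        tokens_by_user[user_id].update(text.lower().split())

    # Users whose merged document contains each word.
    users_a = {u for u, toks in tokens_by_user.items() if word_a in toks}
    users_b = {u for u, toks in tokens_by_user.items() if word_b in toks}

    union = users_a | users_b
    if not union:
        return 0.0  # neither word appears anywhere
    return len(users_a & users_b) / len(union)


# Hypothetical sample data, for illustration only.
sample_tweets = [
    ("u1", "heavy rain today"),
    ("u1", "flood warning issued"),
    ("u2", "rain again"),
    ("u3", "flood on main street"),
    ("u4", "sunny day"),
]
score = user_level_jaccard(sample_tweets, "rain", "flood")
```

In this sample, user `u1` counts as using both words even though they appear in different tweets, which is exactly the effect of merging a user's tweets into one document.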
Related issue https://github.com/ISG-ICS/cloudberry/issues/548.
Last week we found that calculating the correlation by grid cell did not make much sense, because it cannot capture whether both words occur in the same tweet. This week I came up with another way to show the relationship between two words. Suppose word 1 is A and word 2 is B. I want to calculate the probability that A and B occur in the same tweet, given the set of tweets that contain A or B.
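The quantity described here is the Jaccard index applied at the tweet level: the number of tweets containing both A and B divided by the number of tweets containing A or B. A rough sketch follows, assuming tweets are plain strings and that matching is done on lowercase, whitespace-separated tokens. The function name `cooccurrence_ratio` and the sample tweets are made up for illustration.

```python
def cooccurrence_ratio(tweets, word_a, word_b):
    """Share of tweets containing both words among tweets containing either word."""
    both = 0
    either = 0
    for text in tweets:
        tokens = set(text.lower().split())
        has_a = word_a in tokens
        has_b = word_b in tokens
        if has_a or has_b:
            either += 1
            if has_a and has_b:
                both += 1
    # Return 0.0 when neither word appears in any tweet.
    return both / either if either else 0.0


# Hypothetical sample data, for illustration only.
sample = [
    "rain and flood downtown",
    "rain all day",
    "flood warning",
    "sunny day",
    "flood after rain",
]
ratio = cooccurrence_ratio(sample, "rain", "flood")
```

Unlike the grid-based correlation, this ratio only rises when the two words appear together in one tweet, and tweets that mention neither word do not affect it.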