Question Regarding Data Structure for Echo Chamber Analysis

faalatawi / echo-chamber-score

ECS (Echo Chamber Score) is a method for measuring echo chambers and polarization in social media.

MIT License

5 stars, 0 forks

Question Regarding Data Structure for Echo Chamber Analysis #1

Status: Closed (adafnos closed this issue 10 months ago)

adafnos commented 1 year ago

Hi,

Thanks for making the code public; this is a very interesting way to study echo chambers. I am planning to use your approach for my own case as well, but I have a question about the structure of your datasets. I noticed that the values in the user_id and retweets columns are unique. In my data, the same user_id, or even the same tweet, appears more than once. Is this problematic in your opinion?

Thanks, Andreas

faalatawi commented 1 year ago

Thank you for your interest.

Could you give more details about the issue? Which dataset, file, and user_id?

adafnos commented 1 year ago

For example, if you analyse the gun dataset (data/gun/tweets.feather), it has two columns: user_id and tweets. The shape of the dataframe is (6941, 2), and the number of unique values in the user_id column is also 6941, so each user appears in exactly one row. This is not the case with my data.
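A quick way to run the same check on any dataframe is sketched below. It is only an illustration: the helper name `summarize_user_ids` and the toy data are made up for this example, and the column name `user_id` is taken from the gun dataset described above.

```python
import pandas as pd


def summarize_user_ids(df: pd.DataFrame, id_col: str = "user_id") -> dict:
    """Compare the number of rows with the number of distinct user ids."""
    n_rows = len(df)
    n_unique = int(df[id_col].nunique())
    return {
        "rows": n_rows,
        "unique_users": n_unique,
        # Rows beyond the first one for a given user.
        "repeated_rows": n_rows - n_unique,
        "one_row_per_user": n_rows == n_unique,
    }


# Toy data (hypothetical): user 2 appears twice, so the check fails.
toy = pd.DataFrame(
    {"user_id": [1, 2, 2, 3], "tweets": ["a", "b", "c", "d"]}
)
summary = summarize_user_ids(toy)
print(summary)

# For a real file, something like the following should work
# (reading feather files requires pyarrow to be installed):
# df = pd.read_feather("data/gun/tweets.feather")
# print(summarize_user_ids(df))
```

If `one_row_per_user` is False, the dataframe has several rows for at least one user, which is the situation I have with my data.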

I'm asking this question because after trying your code, I always get two communities: one very small one with, say, fewer than 50 nodes and another with more than 30,000 nodes. I tried different time intervals for the retweet (RT) networks, but the results are always the same: one small community and one large one. I just wanted to make sure that the structure of my dataset doesn't affect the results.

faalatawi commented 1 year ago

Which code? Are you talking about community detection?

adafnos commented 1 year ago

Hi, everything works fine now. I hadn't realised that I needed to group together all the tweets that belong to the same user, which is why the results did not make sense.
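
For anyone who runs into the same problem, the grouping step can be sketched roughly as follows. This is a minimal example under assumptions, not the repository's own preprocessing: the function name `group_tweets_by_user` and the `drop_exact_duplicates` flag are hypothetical, the column names `user_id` and `tweets` follow the gun dataset, and collecting each user's tweets into a Python list is only one possible representation (the format the ECS code expects should be checked against the repository's data).

```python
import pandas as pd


def group_tweets_by_user(
    df: pd.DataFrame,
    id_col: str = "user_id",
    text_col: str = "tweets",
    drop_exact_duplicates: bool = False,
) -> pd.DataFrame:
    """Collapse a one-row-per-tweet frame into one row per user."""
    if drop_exact_duplicates:
        # Optional: remove rows where the same user posted the same text.
        df = df.drop_duplicates(subset=[id_col, text_col])
    # Collect every tweet of a user into a single list, keeping
    # users in order of first appearance.
    return (
        df.groupby(id_col, sort=False)[text_col]
        .agg(list)
        .reset_index()
    )


# Toy data (hypothetical): user 2 has two tweets.
toy = pd.DataFrame(
    {"user_id": [1, 2, 2, 3], "tweets": ["a", "b", "c", "d"]}
)
grouped = group_tweets_by_user(toy)
print(grouped)
```

After grouping, the number of rows equals the number of unique user_id values, which matches the structure of the example datasets.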