DaylightingSociety / SocMap

Social Mapping Framework for Twitter
https://socmap.daylightingsociety.org/
BSD 3-Clause "New" or "Revised" License

Add an option for getting most common edges instead of most recent #27

Open milo-trujillo opened 5 years ago

milo-trujillo commented 5 years ago

We currently have an option -M, --maxreferences that caps the number of edges leaving a node, to avoid the celebrity problem. However, we implement this by reading through a user's tweets in reverse-chronological order until we've found enough mentions and retweets. This means -M 30 will get the 30 most recent retweets or mentions for each user. This is not always desirable: what if we want the strongest links between users instead of the most recent?

We should provide an option like -C, --common that changes the behavior to read all mentions and retweets per user, sort them by number of occurrences, and use the X most frequent connections rather than the most recent activity.
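
As a rough illustration of how the proposed flag could sit next to the existing one, here is a minimal sketch using `argparse`. The parser layout, the `build_parser` helper, and the help strings are assumptions made for this example; SocMap's actual option handling may be structured differently.

```python
import argparse


def build_parser():
    """Hypothetical parser; only the -M and -C flag names come from this issue."""
    parser = argparse.ArgumentParser(
        description="Sketch of SocMap-style edge-limiting options"
    )
    # Existing option described above: cap the edges leaving each node.
    parser.add_argument(
        "-M", "--maxreferences", type=int, default=None,
        help="maximum number of retweets/mentions to keep per user",
    )
    # Proposed option: keep the most frequent connections, not the most recent.
    parser.add_argument(
        "-C", "--common", action="store_true",
        help="keep the most common references instead of the most recent",
    )
    return parser


# Parse a fixed argument list so the sketch runs without a real command line.
options = build_parser().parse_args(["-M", "30", "-C"])
defaults = build_parser().parse_args([])
print(options.maxreferences, options.common)
```

With `action="store_true"`, `options.common` defaults to `False`, so existing invocations that only pass -M would keep the current most-recent behavior.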

milo-trujillo commented 4 years ago

Typo in title. This should be a pretty simple change: we need to add a dictionary (or a Counter from collections) that tallies all the retweet usernames, then take the top X entries from it. It requires adding an extra field to the options object, and maybe passing an extra argument through the acquire code to the retweet collector.
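
The Counter approach described above could look roughly like the sketch below. The function name `select_references`, its parameters, and the sample usernames are hypothetical and not taken from the SocMap code. The most-recent branch is a simplification that assumes distinct users are collected newest first, which may not match the current collector exactly.

```python
from collections import Counter


def select_references(usernames, max_references, common=False):
    """Pick which referenced users become edges for one account.

    usernames: referenced usernames, ordered newest first (assumed input shape).
    max_references: the cap set by -M.
    common: True for the proposed -C behavior, False for most-recent.
    """
    if common:
        # Tally every reference, then keep the most frequent users.
        counts = Counter(usernames)
        return [name for name, _ in counts.most_common(max_references)]

    # Most-recent behavior (simplified): stop once enough distinct users are seen.
    recent = []
    for name in usernames:
        if len(recent) >= max_references:
            break
        if name not in recent:
            recent.append(name)
    return recent


# Made-up sample data, newest reference first.
references = ["alice", "bob", "carol", "bob", "dave", "bob", "carol"]
most_recent = select_references(references, 2)
most_common = select_references(references, 2, common=True)
print(most_recent)  # ['alice', 'bob']
print(most_common)  # ['bob', 'carol']
```

Note that the common mode has to read a user's full set of references before it can rank them, whereas the most-recent mode can stop early once the cap is reached.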