DaylightingSociety / SocMap

Social Mapping Framework for Twitter
https://socmap.daylightingsociety.org/
BSD 3-Clause "New" or "Revised" License

Add tools for pruning based on tweet and link counts #15

Closed milo-trujillo closed 5 years ago

milo-trujillo commented 6 years ago

Data sets get very large, very quickly. We can produce GML files containing millions of users, which are impossible to load in graph visualization tools like Gephi and Cytoscape.

Let's add tools that allow pruning users with fewer than a chosen number of tweets, or fewer than a chosen number of incoming or outgoing edges.

This should help researchers remove the less active accounts from their data set, and reduce the graph to a more manageable size for their analysis tools.
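As a rough illustration of the tweet-count idea, here is a minimal sketch in plain Python. It is not SocMap's code: the function name `prune_by_tweet_count`, the `tweet_counts` dictionary, and the edge-list representation are all assumptions made for the example. A real tool would read and write GML rather than work on in-memory lists.

```python
def prune_by_tweet_count(tweet_counts, edges, min_tweets):
    """Keep users with at least `min_tweets` tweets, plus the edges between them.

    tweet_counts: dict mapping user name -> number of tweets (hypothetical layout)
    edges: list of (source, target) pairs
    """
    # Users that meet the activity threshold.
    kept = {user for user, count in tweet_counts.items() if count >= min_tweets}
    # An edge survives only if both of its endpoints survive.
    kept_edges = [(src, dst) for src, dst in edges if src in kept and dst in kept]
    return kept, kept_edges


# Made-up example data, purely for illustration.
tweet_counts = {"alice": 120, "bob": 3, "carol": 45}
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]

users, remaining = prune_by_tweet_count(tweet_counts, edges, min_tweets=10)
```

With this sample data, `bob` falls below the threshold, so both edges touching `bob` are dropped along with the user.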

milo-trujillo commented 6 years ago

Partially implemented. We now have a tool for pruning based on in_degree, so we can remove users that have barely been mentioned or retweeted.
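For anyone following along, in-degree pruning can be sketched roughly like this. This is an assumption-laden illustration, not the actual tool: the name `prune_by_in_degree` and the edge-list input are hypothetical, and the sketch makes a single pass, so in-degrees are measured on the original graph and are not recomputed after users are removed.

```python
from collections import Counter


def prune_by_in_degree(edges, min_in_degree):
    """Keep users whose in-degree in the original graph is >= min_in_degree.

    edges: list of (source, target) pairs, one per distinct directed edge.
    Single pass only: degrees are not recomputed after pruning.
    """
    # Count incoming edges per user; users with none simply have count 0.
    in_degree = Counter(dst for _, dst in edges)
    # Every user that appears as either endpoint of some edge.
    all_users = {user for edge in edges for user in edge}
    kept = {user for user in all_users if in_degree[user] >= min_in_degree}
    # Drop edges that touch a removed user.
    kept_edges = [(src, dst) for src, dst in edges if src in kept and dst in kept]
    return kept, kept_edges


# Made-up example: c and d each have two incoming edges, a has one, b has none.
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d"), ("d", "a")]
users, remaining = prune_by_in_degree(edges, min_in_degree=2)
```

Because the pass is not repeated, a surviving user can end up with a lower in-degree in the pruned graph than the threshold. Repeating the pass until nothing changes would be a different, stricter technique.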

Still to do: Pruning based on number of tweets, or the weight of edges.

milo-trujillo commented 5 years ago

Blocked on #10 and #12, clearly

milo-trujillo commented 5 years ago

Task complete. Scripts added for pruning users by number of tweets and for removing edges that fall below a minimum number of retweets or mentions.
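A minimal sketch of edge filtering by weight, for reference. It assumes each edge carries a weight equal to the number of retweets or mentions it represents; the function name `prune_edges_by_weight` and the triple layout are hypothetical and not taken from the added scripts.

```python
def prune_edges_by_weight(weighted_edges, min_weight):
    """Keep only edges whose weight is at least `min_weight`.

    weighted_edges: list of (source, target, weight) triples, where weight is
    assumed to be the number of retweets or mentions behind that edge.
    """
    return [(src, dst, w) for src, dst, w in weighted_edges if w >= min_weight]


# Made-up example data, purely for illustration.
weighted_edges = [("alice", "bob", 1), ("alice", "carol", 7), ("carol", "bob", 3)]
strong_edges = prune_edges_by_weight(weighted_edges, min_weight=3)
```

This sketch leaves users in place even if all of their edges are removed; dropping such isolated users would be a separate step.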