Closed by milo-trujillo 5 years ago
Partially implemented. We now have a tool for pruning based on in_degree, so we can remove users that have barely been mentioned or retweeted.
Still to do: Pruning based on number of tweets, or the weight of edges.
Blocked on #10 and #12, clearly
Task complete. Scripts added for pruning users by number of tweets, and for removing edges that fall below a minimum number of retweets or mentions.
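For anyone reading along, here is a minimal sketch of what these two kinds of pruning amount to. It is not the code in the repository's scripts: it assumes the graph is held as a plain list of `(source, target, weight)` tuples, where `weight` is the number of retweets or mentions, and that tweet counts are available as a dict. The function names `prune_edges_by_weight` and `prune_users_by_tweets` are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (source user, target user, retweet/mention count)
Edge = Tuple[str, str, int]


def prune_edges_by_weight(edges: List[Edge], min_weight: int) -> List[Edge]:
    """Keep only edges whose weight is at least min_weight."""
    return [(s, t, w) for s, t, w in edges if w >= min_weight]


def prune_users_by_tweets(
    edges: List[Edge], tweet_counts: Dict[str, int], min_tweets: int
) -> List[Edge]:
    """Drop users with fewer than min_tweets tweets, and every edge touching them."""
    keep = {user for user, count in tweet_counts.items() if count >= min_tweets}
    return [(s, t, w) for s, t, w in edges if s in keep and t in keep]


if __name__ == "__main__":
    edges = [("a", "b", 5), ("b", "c", 1), ("c", "a", 3)]
    print(prune_edges_by_weight(edges, 3))
    print(prune_users_by_tweets(edges, {"a": 10, "b": 2, "c": 7}, 5))
```

Users missing from `tweet_counts` are treated as having too few tweets and are removed along with their edges, which may or may not be the behaviour you want.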
Data sets get very large, very quickly. We can produce GML files containing millions of users, which are too large to load in graph visualization tools like Gephi and Cytoscape.
Let's add tools that allow pruning users with fewer than an arbitrary number of tweets, or fewer than an arbitrary number of incoming or outgoing edges.
This should help researchers remove the less active accounts from their data set, and reduce the graph to a more manageable size for their analysis tools.
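As a rough illustration of the degree-based pruning proposed here, the sketch below counts incoming and outgoing edges per user and drops anyone under the thresholds. It assumes the graph is a plain list of `(source, target)` pairs rather than a GML file, and `prune_by_degree` is a hypothetical name, not one of the project's tools.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[str, str]  # (source user, target user)


def prune_by_degree(edges: List[Edge], min_in: int = 0, min_out: int = 0) -> List[Edge]:
    """Remove users below the in/out degree thresholds, plus their edges."""
    in_degree = Counter(target for _, target in edges)
    out_degree = Counter(source for source, _ in edges)
    users = set(in_degree) | set(out_degree)
    # Counter returns 0 for users that never appear on that side of an edge.
    keep = {
        user
        for user in users
        if in_degree[user] >= min_in and out_degree[user] >= min_out
    }
    return [(s, t) for s, t in edges if s in keep and t in keep]


if __name__ == "__main__":
    edges = [("a", "c"), ("b", "c"), ("c", "a")]
    print(prune_by_degree(edges, min_in=1))
```

This is a single pass: removing users can push their neighbours below the threshold, so a stricter prune would repeat the call until the edge list stops shrinking.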