DocNow / twarc-network

Generate network visualizations from Twitter data.
MIT License

Limit to edge type (reply, quote, retweet) #2

Closed: edsu closed this issue 2 years ago

edsu commented 3 years ago

At the moment the edges displayed when visualizing users and tweets are of type reply, quote and retweet. When the network is big it could be useful to see only the replies or quotes. Ideally I think users would be able to control which relations are included. Of course someone using Gephi, Cytoscape or some other network visualization tool could do this themselves. But filtering could be useful to limit the size of the network in memory for large graphs, and could also be useful for the HTML/D3 representation.
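As a rough illustration of the kind of filtering proposed here, the sketch below keeps only edges of selected types. It uses plain Python tuples rather than twarc-network's actual data structures, and the names `edges`, `filter_edges` and `nodes_of` are hypothetical, not part of the tool.

```python
# Hypothetical edge records: (source user, target user, edge type).
# This is illustrative data, not twarc-network's internal representation.
edges = [
    ("alice", "bob", "retweet"),
    ("alice", "carol", "reply"),
    ("bob", "carol", "quote"),
    ("carol", "alice", "retweet"),
]


def filter_edges(edges, keep_types):
    """Return only the edges whose type is in keep_types."""
    keep = set(keep_types)
    return [edge for edge in edges if edge[2] in keep]


def nodes_of(edges):
    """Collect the users that still appear in at least one edge."""
    return {user for source, target, _ in edges for user in (source, target)}


# Keep replies and quotes, dropping retweets before the graph is built.
replies_and_quotes = filter_edges(edges, ["reply", "quote"])
remaining_users = nodes_of(replies_and_quotes)
```

Filtering before the graph is built, rather than afterwards in a visualization tool, is what would keep the in-memory network small.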

numeroteca commented 3 years ago

That is precisely what I was looking for. I was going to ask how the network was built; now it's clear that it is currently built with "reply, quote and retweet" relationships.

Related to the type of connection: when I open the generated .gexf file in Gephi I see warnings like "Type 'retweet' of the edge 'xxx -> yyyy (id = nnn)' is not recognized. Set to default value.", classified as a severe issue. I also see these warnings for the 'reply' type.

This is what I see in Gephi while I import the network: (Screenshot from 2021-07-04 11-34-17)

I don't know if this is an important thing to consider or nothing to worry about. Once the file is opened in Gephi there is no option to see the type of relationship. I guess that by using "Sum" as the "edge merge strategy" I get the total number of "reply, quote and retweet" interactions between two users. Once this is clarified I could help complete the documentation or create a specific section on Gephi usage.
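To make the guess about the "Sum" strategy concrete, here is a minimal sketch of what summing parallel edges would produce, assuming each interaction is stored as its own edge with weight 1. That assumption, the sample data and the `merge_sum` helper are hypothetical and do not describe Gephi's or twarc-network's actual implementation.

```python
from collections import Counter

# Hypothetical interactions: one record per reply, quote or retweet.
edges = [
    ("alice", "bob", "retweet"),
    ("alice", "bob", "reply"),
    ("alice", "bob", "retweet"),
    ("bob", "carol", "quote"),
]


def merge_sum(edges):
    """Collapse parallel edges into one weighted edge per (source, target)."""
    weights = Counter()
    for source, target, _edge_type in edges:
        # Each interaction counts as weight 1; the edge type is discarded.
        weights[(source, target)] += 1
    return dict(weights)


merged = merge_sum(edges)
```

Under this assumption the merged weight is the total number of interactions between two users, but the breakdown by type is lost.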

igorbrigadir commented 3 years ago

Yes, the GEXF file format has different versions with different expectations, but it looks like networkx can only write older versions? Either way, the XML formatting is done by networkx.

edsu commented 3 years ago

It isn't clear to me if GEXF 1.3 is out yet, and if networkx supports it. But I did notice the same error when importing GEXF, which went away when importing GML instead:

twarc2 network --format gml tweets.jsonl tweets.gml

I noticed that the resulting edge data pane includes both a Type and a type column. Maybe that's what is causing the GEXF importer difficulty, because it wants to use type to indicate whether the edge is directed or not?

In v0.0.7 I've changed the property name to tweet_type which seems to make those errors with GEXF go away. I do notice that there probably needs to be some thought put into what tweet_type should be for nodes when using the different node types: tweets, hashtags and users. But that feels like a different issue?
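The rename described above can be pictured with a small sketch that moves an edge's "type" value to "tweet_type" before export. It operates on a plain attribute dictionary; the `rename_edge_key` helper is hypothetical and is not the code that went into v0.0.7.

```python
def rename_edge_key(edge_attrs, old="type", new="tweet_type"):
    """Return a copy of the attributes with the key `old` renamed to `new`."""
    renamed = dict(edge_attrs)  # copy so the caller's dict is not modified
    if old in renamed:
        renamed[new] = renamed.pop(old)
    return renamed


# Hypothetical edge attributes as they might look before export.
before = {"type": "retweet", "weight": 1}
after = rename_edge_key(before)
```

With the key renamed, the exported edge no longer carries a custom "type" value that an importer could mistake for the edge's direction setting.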

edsu commented 3 years ago

I meant to say @numeroteca that any improvements to the documentation would be welcome!

JoanMassachs commented 2 years ago

Done in #6