edsu closed this issue 2 years ago
That is precisely what I was looking for. I was going to ask how the network was built; now it's clear that it is currently built from "reply, quote and retweet" relationships.
Related to the type of connection: when I open the generated .gexf file in Gephi I see warnings like "Type 'retweet' of the edge 'xxx -> yyyy (id = nnn)' is not recognized. Set to default value.", classified as a severe issue. I also see these warnings for the 'reply' type.
This is what I see in Gephi while I import the network:
I don't know whether this is important or nothing to worry about. Once the file is opened in Gephi there is no option to see the type of relationship. I guess that by using "Sum" as the "edge merge strategy" I get the total number of "reply, quote and retweet" interactions between two users. Once this is clarified I could help complete the documentation or create a specific section on Gephi usage.
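To make the "Sum" guess above concrete, here is a minimal sketch of what summing parallel edges would produce. It assumes every interaction counts as an edge of weight 1 and that the merge simply adds the weights of edges sharing the same source and target. The `interactions` list and the `sum_merge` helper are hypothetical illustrations, not Gephi's or twarc's actual code.

```python
from collections import Counter

# Hypothetical interaction records: (source user, target user, relationship).
interactions = [
    ("alice", "bob", "retweet"),
    ("alice", "bob", "reply"),
    ("alice", "bob", "quote"),
    ("bob", "carol", "reply"),
]


def sum_merge(edges):
    """Collapse parallel directed edges into one weight per (source, target) pair.

    Assumption: each input edge has weight 1, so the merged weight is
    simply the number of interactions in that direction.
    """
    weights = Counter()
    for source, target, _kind in edges:
        weights[(source, target)] += 1
    return dict(weights)


merged = sum_merge(interactions)
```

Under these assumptions the merged weight between two users is the total count of replies, quotes and retweets in that direction, and the relationship type itself is lost in the merge.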
Yes, the GEXF file format has different versions with different expectations, but it looks like networkx can only write older versions? Either way, the XML formatting is done by networkx.
It isn't clear to me if GEXF 1.3 is out yet, and if networkx supports it. But I did notice the same error when importing GEXF, which went away when importing GML instead:
twarc2 network --format gml tweets.jsonl tweets.gml
I noticed that the resulting edge data pane includes both a Type and a type column. Maybe that's what is causing the GEXF importer difficulty, because it wants to use type to indicate whether the edge is directed or not?
In v0.0.7 I've changed the property name to tweet_type, which seems to make those errors with GEXF go away. I do notice that there probably needs to be some thought put into what tweet_type should be for nodes when using the different node types: tweets, hashtags and users. But that feels like a different issue?
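The rename described above can be sketched on plain edge records. This is only an illustration: it assumes edges are held as (source, target, attributes) tuples, and the `rename_edge_attribute` helper is hypothetical rather than the code used in the plugin.

```python
def rename_edge_attribute(edges, old="type", new="tweet_type"):
    """Return a copy of the edges with the attribute `old` renamed to `new`.

    Edges without the `old` attribute are passed through unchanged.
    """
    renamed = []
    for source, target, attrs in edges:
        attrs = dict(attrs)  # copy so the caller's data is not mutated
        if old in attrs:
            attrs[new] = attrs.pop(old)
        renamed.append((source, target, attrs))
    return renamed


# Hypothetical sample edges.
edges = [
    ("alice", "bob", {"type": "retweet"}),
    ("bob", "carol", {"weight": 2}),
]
renamed_edges = rename_edge_attribute(edges)
```

With the key renamed, the exported edge data no longer carries a custom attribute called type that the importer might confuse with the edge's directed/undirected type.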
I meant to say, @numeroteca, that any improvements to the documentation would be welcome!
Done in #6
At the moment the edges displayed when visualizing users and tweets are of type reply, quote and retweet. When the network is big it could be useful to see only the replies or only the quotes. Ideally I think users would be able to control which relations are included. Of course someone using Gephi, Cytoscape or some other network visualization tool could do this themselves. But it could be useful for limiting the size of the network in memory for large graphs, and could also be useful for the HTML/D3 representation.
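As a rough sketch of the filtering idea, the relation types to keep could be applied while the edge list is built, so that unwanted edges never reach memory-heavy later stages. The `filter_edges` helper and the sample data are hypothetical and do not reflect an existing twarc option.

```python
def filter_edges(edges, include):
    """Keep only edges whose relationship type is in `include`.

    `edges` is an iterable of (source, target, relationship) tuples.
    Returns the kept edges and the set of nodes they still connect.
    """
    allowed = set(include)
    kept = [edge for edge in edges if edge[2] in allowed]
    nodes = {user for source, target, _ in kept for user in (source, target)}
    return kept, nodes


# Hypothetical sample edges.
all_edges = [
    ("alice", "bob", "retweet"),
    ("alice", "carol", "reply"),
    ("dave", "alice", "quote"),
    ("erin", "bob", "retweet"),
]
kept_edges, kept_nodes = filter_edges(all_edges, include=["reply", "quote"])
```

Dropping retweets here also drops users who only appear in retweet edges, which is where most of the size reduction for large graphs would come from.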