havardl closed this issue 2 years ago
That's a good question - there wasn't documentation, but I have added some barebones elements in https://github.com/QUT-Digital-Observatory/coordination-network-toolkit/blob/main/docs/network_types.md
The node attributes are intended to provide a sketch of the account/profile - they aren't tied to or influenced by the edges calculated in the network, they're just the latest messages from that user in the dataset. And you're right that they're capped (10 by default, but can be changed with --n_messages), so we don't try to stuff the entire dataset into the node attributes.
@timothyjgraham - this prompts me to think that we could also add message edge attributes, which are sampled from the actual coordinated activity between those accounts.
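To check the node attribute cap described above against an exported file, a minimal sketch along these lines may help. It uses only the Python standard library and a tiny hand-written GraphML string as stand-in data. The attribute names `message_0`, `message_1` and `weight` follow this thread, but the exact layout of the toolkit's files is an assumption, and `read_graph` and `message_count` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hand-written sample standing in for a toolkit export (hypothetical data).
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="message_0" attr.type="string"/>
  <key id="d1" for="node" attr.name="message_1" attr.type="string"/>
  <key id="d2" for="edge" attr.name="weight" attr.type="double"/>
  <graph edgedefault="undirected">
    <node id="a"><data key="d0">latest</data><data key="d1">older</data></node>
    <node id="b"><data key="d0">only one</data></node>
    <edge source="a" target="b"><data key="d2">12</data></edge>
  </graph>
</graphml>"""


def read_graph(xml_text):
    """Return (nodes, edges) with attribute names resolved from <key> elements."""
    root = ET.fromstring(xml_text)
    # Map key ids such as "d0" to readable attribute names such as "message_0".
    names = {k.get("id"): k.get("attr.name") for k in root.findall("g:key", NS)}
    nodes = {}
    for node in root.iterfind(".//g:node", NS):
        nodes[node.get("id")] = {
            names[d.get("key")]: d.text for d in node.findall("g:data", NS)
        }
    edges = []
    for edge in root.iterfind(".//g:edge", NS):
        attrs = {names[d.get("key")]: d.text for d in edge.findall("g:data", NS)}
        # Assumes the edge attribute is called "weight", as in the question.
        weight = float(attrs.get("weight", "nan"))
        edges.append((edge.get("source"), edge.get("target"), weight))
    return nodes, edges


def message_count(attrs):
    """Count the message_N attributes stored on one node."""
    return sum(1 for name in attrs if name.startswith("message_"))


nodes, edges = read_graph(SAMPLE)
for node_id, attrs in nodes.items():
    print(node_id, message_count(attrs))
for source, target, weight in edges:
    print(source, target, weight)
```

In the sample, the edge weight (12) is larger than the number of messages stored on either node, which matches the point that the message attributes are a capped sketch of the account rather than a record of the coordinated activity. For a real export, the string could be replaced with the contents of the GraphML file, for example via `ET.parse(path).getroot()`.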
Thanks for the question, @havardl. We will be publishing a paper soon with this kind of detail, but for now hopefully @SamHames' answer is helpful for you.
@SamHames, I think adding message edge attributes is a very good idea. I'll add it as an enhancement issue.
Thank you for the feedback, this was very helpful. Looking forward to the paper!
I was just wondering if there is any documentation of the node and edge attributes that are written to the graphml files when computing co_retweets and co_link files?
For the nodes, I'm currently getting ten attributes called `message_0`..`message_9`. But when inspecting the edges, I can see that some edges have a weight of more than 10, which I assume means that they have retweeted more than 10 times. So my question is: which messages are included in the 10 message attributes for each node? And am I right in assuming that this attribute list is capped at 10 messages? Or should it include all messages retweeted by a profile in the database?