QUT-Digital-Observatory / coordination-network-toolkit

A small command line tool and set of functions for studying coordination networks in Twitter and other social media data.
MIT License
72 stars, 14 forks

Workflow to add node attributes to graphml file #37

Open timothyjgraham opened 2 years ago

timothyjgraham commented 2 years ago

Develop a workflow for common tasks that involve adding new node attributes to coordination networks, such as community clusters or compliance check status codes (e.g., flags for suspended/deleted accounts).
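As a rough illustration of what such a workflow has to do to the file itself, here is a minimal sketch that adds one node attribute to GraphML text using only the Python standard library. The function name `add_node_attribute` and the `d_<name>` key id convention are hypothetical and are not part of the toolkit. The `<key>` and `<data>` elements and the namespace URI come from the GraphML format.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
# Keep GraphML as the default namespace when serialising.
ET.register_namespace("", GRAPHML_NS)


def add_node_attribute(graphml_text, attr_name, values, attr_type="string"):
    """Return GraphML text with one extra node attribute (hypothetical helper).

    `values` maps node id -> attribute value. Nodes missing from the
    mapping are left without a value for the new attribute.
    """
    root = ET.fromstring(graphml_text)
    ns = {"g": GRAPHML_NS}

    # Assumed convention: derive the key id from the attribute name.
    key_id = f"d_{attr_name}"
    existing_ids = {key.get("id") for key in root.findall("g:key", ns)}
    if key_id in existing_ids:
        raise ValueError(f"attribute key {key_id!r} already exists")

    # Declare the attribute with a <key> element placed before <graph>.
    key = ET.Element(
        f"{{{GRAPHML_NS}}}key",
        {
            "id": key_id,
            "for": "node",
            "attr.name": attr_name,
            "attr.type": attr_type,
        },
    )
    root.insert(0, key)

    # Attach a <data> element to every node that has a value.
    for node in root.iterfind(".//g:node", ns):
        node_id = node.get("id")
        if node_id in values:
            data = ET.SubElement(node, f"{{{GRAPHML_NS}}}data", {"key": key_id})
            data.text = str(values[node_id])

    return ET.tostring(root, encoding="unicode")
```

For example, `add_node_attribute(text, "community", {"user_a": 3}, attr_type="int")` would tag only the node with id `user_a`. A graph library could do the same job; the standard library is used here only to keep the sketch dependency-free.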

SamHames commented 2 years ago

I was thinking about this further - I think there are a few elements to this:

  1. The low level representation of node attributes in the database, for inclusion in output graphml files, i.e., how do we store this information and place it in the graphml file?
  2. The interface for inserting this information into the database, which could be at two layers:
    1. One is the library level interface which might be something like compute_networks.process.add_node_metadata(db_path, node_id, node_attributes).
    2. The other layer might be a CLI component to bulk add metadata, for example from a CSV: compute_networks database.db add_node_metadata --format=csv metadata.csv
  3. Higher level convenience functions - if we have 1 and 2.1 above, we could in principle provide functionality in the toolkit that wraps around twarc as a library and provides a standardised approach to representing user metadata for the Twitter platform. For example compute_networks database.db augment twitter_user_profile.
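To make points 1, 2.1 and 2.2 more concrete, here is a minimal sketch of one way the storage and the two insertion layers could fit together. Everything in it is an assumption for discussion: the `node_metadata` table, its columns, the `node_id` CSV column and the helper `add_node_metadata_from_csv` are hypothetical, and the `add_node_metadata` signature simply follows the one floated above rather than any existing toolkit API.

```python
import csv
import sqlite3


def add_node_metadata(db_path, node_id, node_attributes):
    """Store or update attributes for a single node.

    Hypothetical schema: one row per (node_id, attribute) pair, with all
    values stored as text.
    """
    db = sqlite3.connect(db_path)
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                """
                create table if not exists node_metadata (
                    node_id text,
                    attribute text,
                    value text,
                    primary key (node_id, attribute)
                )
                """
            )
            # A later value for the same node and attribute replaces the old one.
            db.executemany(
                "insert or replace into node_metadata values (?, ?, ?)",
                [
                    (str(node_id), name, str(value))
                    for name, value in node_attributes.items()
                ],
            )
    finally:
        db.close()


def add_node_metadata_from_csv(db_path, csv_path, id_column="node_id"):
    """Bulk add metadata from a CSV file with a header row.

    Assumes one column holds the node id and every other column is an
    attribute to attach to that node.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            node_id = row.pop(id_column)
            add_node_metadata(db_path, node_id, row)
```

A CLI subcommand like the one in 2.2 could then be a thin wrapper around `add_node_metadata_from_csv`. The long (node, attribute, value) layout is only one option; it avoids schema changes when new attributes appear, at the cost of storing everything as text.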