Open colditzjb opened 4 years ago
@colditzjb Could you expand upon this in our meeting tomorrow? I'd like to start developing this for my final project in Justin Zhan's Data Mining class, which I am taking this semester.
@wbaker23 Sure thing! The most recent script that I had developed for this is: /home/jcolditz/twitter/RITHM/parser/network.py
It's not ready for deployment, but it is a good starting point.
Several scripts that format tweet data for social network analysis have been developed for project-specific use cases, but they are neither broadly generalizable nor consistent with newer RITHM conventions for handling input and output. It will be beneficial to develop a single script that implements basic network analysis consistently within the RITHM framework.
This will start with a procedure that (1) links retweet IDs to the original tweet IDs present in existing metadata. Continued work will include (2) linking tweet response IDs to original tweet IDs and (3) linking quoted tweet IDs to original tweet IDs. The process should be flexible enough that relationships (1)-(3) can each be handled separately and that output can be aggregated for user-level analysis. Output should include well-formed dictionary objects (pickle format) that can be updated as new data are added, as well as node and edge files (TSV format) that can be used in third-party analysis software.
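To make the proposal concrete, here is a rough sketch of how the pieces could fit together. It is not the existing `network.py`, and it does not follow any confirmed RITHM convention: the record fields (`tweet_id`, `user_id`, `retweet_of_id`, `reply_to_id`, `quote_of_id`), the function names, and the file layout are all hypothetical placeholders for illustration. It assumes tweets have already been parsed into a list of dictionaries.

```python
import csv
import pickle
from collections import defaultdict

# Hypothetical field names; the real RITHM metadata columns may differ.
RELATION_FIELDS = {
    "retweet": "retweet_of_id",  # relationship (1)
    "reply": "reply_to_id",      # relationship (2)
    "quote": "quote_of_id",      # relationship (3)
}


def build_edges(tweets, relation):
    """Return (tweet_id, original_tweet_id) pairs for one relationship type.

    Only originals that are present in the supplied metadata are linked.
    """
    field = RELATION_FIELDS[relation]
    known_ids = {t["tweet_id"] for t in tweets}
    edges = []
    for t in tweets:
        target = t.get(field)
        if target is not None and target in known_ids:
            edges.append((t["tweet_id"], target))
    return edges


def aggregate_to_users(tweets, edges):
    """Collapse tweet-level edges into weighted user-level edges."""
    author = {t["tweet_id"]: t["user_id"] for t in tweets}
    weights = defaultdict(int)
    for source, target in edges:
        weights[(author[source], author[target])] += 1
    return dict(weights)


def update_pickle(path, new_weights):
    """Merge new edge weights into a pickled dictionary, creating it if absent."""
    try:
        with open(path, "rb") as fh:
            stored = pickle.load(fh)
    except FileNotFoundError:
        stored = {}
    for pair, weight in new_weights.items():
        stored[pair] = stored.get(pair, 0) + weight
    with open(path, "wb") as fh:
        pickle.dump(stored, fh)
    return stored


def write_tsv(weights, node_path, edge_path):
    """Write node and edge files as tab-separated values."""
    nodes = sorted({user for pair in weights for user in pair})
    with open(node_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["id"])
        writer.writerows([n] for n in nodes)
    with open(edge_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["source", "target", "weight"])
        for (source, target), weight in sorted(weights.items()):
            writer.writerow([source, target, weight])
```

Because `build_edges` takes the relationship type as an argument, retweets, replies, and quotes can be processed separately and then merged, or not, before calling `aggregate_to_users`. One caveat with this sketch: `update_pickle` adds counts on every call, so re-running it over the same tweets would double-count edges; a real implementation would probably need to track which tweet IDs have already been processed.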