This improves on the old approach by better handling cases where, as an extreme example, one account has tweeted once and the other has tweeted hundreds of times. Because the measure is asymmetric, all networks are now directed.
This change also cleans up the computational pipeline and enables early pruning of nodes that can never have outbound edges meeting the min_edge_weight threshold. On larger datasets, or datasets with large numbers of low-activity accounts, this can save 30-40% of the computation time. The library also has additional (optional) entrypoints for managing these optimisations.
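As a rough illustration of the pruning idea (not the library's actual code), the sketch below assumes the weight of an edge from one account to another can never exceed the number of messages the source account posted. Under that assumption, any account with fewer than min_edge_weight messages can be dropped as a source before pairs are compared. The function name eligible_sources and the (account, message) input shape are hypothetical.

```python
from collections import Counter


def eligible_sources(posts, min_edge_weight):
    """Return accounts that could still have an outbound edge.

    ASSUMPTION (illustrative only): an edge from account A to account B
    cannot weigh more than the number of messages A posted, so accounts
    with fewer than min_edge_weight messages can never meet the threshold.
    """
    counts = Counter(account for account, _message in posts)
    return {account for account, n in counts.items() if n >= min_edge_weight}


# Hypothetical input: (account, message) pairs.
posts = [
    ("a", "x"),
    ("b", "x"), ("b", "y"), ("b", "z"),
    ("c", "x"), ("c", "y"),
]

sources = eligible_sources(posts, min_edge_weight=2)
accounts = {account for account, _message in posts}

# Only eligible accounts are considered as edge sources; every account
# can still be a target, because the network is directed.
candidate_pairs = [
    (source, target)
    for source in sorted(sources)
    for target in sorted(accounts)
    if source != target
]
```

Here account "a" posted only once, so it is skipped as a source but remains a possible target. The number of directed pairs to score drops from six to four, and the saving grows with the share of low-activity accounts.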