Closed: nithinmanoj10 closed this issue 1 year ago.
This list can be expanded to further improve our knowledge about dynamic graphs.
Create a separate comment for each topic, and we can add on to that.
After going through the datasets provided by PyTorch Geometric Temporal, the following dynamic datasets were found.
The above-mentioned datasets are for dynamic graphs with temporal signals, i.e., the structure and node/edge features of the graph change over time.
In the Introduction section of the Packed CSR paper, it is mentioned that real-life networks are highly dynamic in nature, with many additions, deletions, and updates happening every second.
Social networks such as Facebook and Twitter are highly dynamic graphs since new users and connections are added constantly. Twitter averages about 500 million tweets a day [Say] and Facebook has about 41,000 posts (2.5Mb of data) per second [WKF+15].
Internet graphs are also highly dynamic: large Internet Service Providers (ISPs) field around 109 packets/router/hour [GM12]. Dynamic graphs have wide applications from recommendation systems to cellular networks and require efficient updates to graph storage formats.
According to this paper, a power-law relationship is found in dynamic graphs such as internet networks.
Still not sure how this law will help us in implementing the dynamic graphs feature for Seastar.
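One place it could eventually matter: a quick sanity check of whether a graph snapshot's degree distribution is heavy-tailed. Below is a rough pure-Python sketch; the estimator choice and the toy degree sequence are mine for illustration, not from the paper.

```python
import math

def powerlaw_exponent(degrees):
    """Estimate the power-law exponent alpha of a degree sequence with the
    continuous maximum-likelihood estimator alpha = 1 + n / sum(ln(d / d_min)).
    A rough diagnostic only; a proper fit would follow Clauset et al."""
    d_min = min(degrees)
    n = len(degrees)
    denom = sum(math.log(d / d_min) for d in degrees)
    return 1.0 + n / denom

# heavy-tailed toy degree sequence: many low-degree nodes, a few hubs
degrees = [1] * 50 + [2] * 20 + [4] * 8 + [16] * 2 + [64]
print(round(powerlaw_exponent(degrees), 2))  # → 3.34
```

An exponent in the 2–3.5 range is what the internet-topology literature typically reports, so a check like this could tell us whether a given dynamic dataset actually exhibits the power law the paper describes.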
Many of the existing formats for sparse data representations on parallel architectures are restricted to static data problems, while those for dynamic data suffer from inefficiency both in terms of performance and memory footprint. This work presents Hornet, a novel data representation that targets dynamic data problems. Hornet is scalable with the input size, and does not require any data re-allocation or re-initialization during the data evolution. We show a Hornet implementation for GPU architectures and compare it to the most widely used static and dynamic data structures.
In this work, we presented Hornet, a new GPU data structure for representing dynamic sparse graphs and matrices. Hornet supports insertions, deletions, and value updates. Unlike past attempts at designing dynamic graph data structures, the proposed solution does not require restarting due to a large number of edge updates. We showed that Hornet outperforms state-of-the-art dynamic graph formats in terms of both performance and memory footprint.
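To make the Hornet idea above concrete, here is a toy pure-Python sketch of per-vertex edge blocks that grow in powers of two, so a single insertion only ever copies one vertex's block instead of rebuilding the whole structure. This is my simplification for intuition, not Hornet's actual GPU implementation (which manages block pools in device memory).

```python
class BlockedAdjacency:
    """Toy sketch of Hornet-style per-vertex blocks (illustrative only)."""

    def __init__(self, num_vertices):
        self.blocks = [[None] for _ in range(num_vertices)]  # capacity-1 blocks
        self.degree = [0] * num_vertices

    def insert(self, u, v):
        if self.degree[u] == len(self.blocks[u]):
            # grow to the next power-of-two block; only vertex u is copied
            new_block = [None] * (len(self.blocks[u]) * 2)
            new_block[:self.degree[u]] = self.blocks[u][:self.degree[u]]
            self.blocks[u] = new_block
        self.blocks[u][self.degree[u]] = v
        self.degree[u] += 1

    def delete(self, u, v):
        for i in range(self.degree[u]):
            if self.blocks[u][i] == v:
                # swap with the last edge: O(1) removal, order not preserved
                self.blocks[u][i] = self.blocks[u][self.degree[u] - 1]
                self.degree[u] -= 1
                return True
        return False

    def neighbors(self, u):
        return self.blocks[u][:self.degree[u]]
```

The key property mirrored here is the one the paper emphasizes: updates never require re-allocating or re-initializing the whole graph.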
We introduce a new dynamic sparse graph representation called Packed Compressed Sparse Row (PCSR), based on an array-based dynamic data structure called the Packed Memory Array. PCSR is similar to CSR, but leaves spaces between elements, allowing for asymptotically faster insertions and deletions in exchange for a constant factor slowdown in traversals and a constant factor increase in space overhead.
We have implemented PCSR, a dynamic graph storage format based on the packed memory array. We find that for slightly more storage and query time, we are able to achieve similar mutability speeds to the adjacency list. CSR was unable to handle many inserts in a reasonable amount of time. PCSR was orders of magnitude faster for inserts and updates than CSR and adjacency list while maintaining similar graph traversal times.
The growth of social networks and other changing graphs necessitates the need for efficient dynamic graph structures. PCSR is a basic dynamic graph storage format that can fit into existing graph processing frameworks and support fast insertions with comparable traversal times.
Future work includes implementing deletes, other graph algorithms such as shortest-paths, and PCSR in external memory. Additionally, we may be able to speed up PCSR with a parallel implementation.
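As a rough illustration of the PCSR idea quoted above (empty slots interleaved in the CSR edge array so that inserts only shift locally), here is a minimal pure-Python sketch. The slack constant and the rebuild-on-full policy are simplifications of mine; the actual paper uses Packed Memory Array rebalancing, which is subtler.

```python
class SimplePCSR:
    """Minimal sketch of the PCSR idea (not the paper's PMA algorithm):
    a CSR edge array with empty slots (None) per row so insertions usually
    fill a gap; the whole layout is rebuilt with fresh slack only when a
    row's region is full."""

    SLACK = 2  # empty slots kept per edge on rebuild (illustrative constant)

    def __init__(self, num_vertices):
        self.n = num_vertices
        self.rows = [[] for _ in range(num_vertices)]  # dense view for rebuilds
        self._rebuild()

    def _rebuild(self):
        # lay rows out contiguously, interleaving empty slots (None)
        self.edges, self.row_start = [], []
        for r in self.rows:
            self.row_start.append(len(self.edges))
            for v in r:
                self.edges.append(v)
                self.edges.extend([None] * (self.SLACK - 1))
            self.edges.extend([None] * self.SLACK)  # tail slack for the row
        self.row_start.append(len(self.edges))

    def insert(self, u, v):
        self.rows[u].append(v)
        # fill a free slot in u's region if one exists; otherwise rebuild
        for i in range(self.row_start[u], self.row_start[u + 1]):
            if self.edges[i] is None:
                self.edges[i] = v
                return
        self._rebuild()

    def neighbors(self, u):
        return [x for x in self.edges[self.row_start[u]:self.row_start[u + 1]]
                if x is not None]
```

Even this toy version shows the trade-off the abstract describes: traversal scans some empty slots (constant-factor slowdown and space overhead) in exchange for insertions that rarely touch the whole array.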
In the Hornet paper, they have implemented all the operations for a GPU architecture. It would be great if we could venture into GPU territory while implementing anything based on dynamic graphs since Seastar also uses GPU and CUDA programming.
Similarly, I found a CSR paper for GPUs here, and the code repository can be found here. We could look into this as well and see if it can be useful to us. Originally, Seastar used the CSR format, so going with GPMA is a possibility.
Either way, a GPU is utilized in both, and it is recommended that we try a GPU-based implementation for representing dynamic graphs for enhanced performance.
Support for dynamic graphs in Seastar could prove useful for the following reasons:
More and more real-life graphs are dynamic in nature: not just the temporal aspect, but the spatial aspect as well. This is prevalent in fields like social networks and internet topology, to name a few. Once implemented, Seastar can cover many more use cases.
Since our main project aim was to add support for temporal graphs in Seastar, we can extend it and generalize it to dynamic graphs as well (spatio + temporal).
Provides us with an opportunity to work with CUDA/GPU programming, since there are GPU implementations of the data structures needed for dynamic graphs. It would help us fully harness the power of the GPU, exactly like Seastar does when compiling Python to CUDA.
It presents us with an opportunity to write more code and make changes to the Seastar codebase, which can be a substantial piece of work apart from the temporal feature being introduced.
Once implemented, our work on dynamic graph support for a vertex-centric model would be worthy of a paper by the end of our project. We could explain how we integrated the data structures to make them work for Seastar, and mention any changes and optimizations we used that enable efficient dynamic graph support in Seastar.
If the temporal graph features (the tensor_map dictionary) are stored in a stack for each timestamp, something similar can be done for dynamic graphs, where the graph representation is stored in the stack for each timestamp. We just need to figure out a way to efficiently store and work with dynamic graph operations. Once we figure that out and implement it, it is worth mentioning in the paper we are planning to write.
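A toy sketch of what that stack-of-snapshots idea could look like; the class name and the CSR layout per timestamp are illustrative assumptions of mine, not Seastar's actual internals.

```python
class DynamicGraphStore:
    """Illustrative sketch: keep one CSR snapshot per timestamp on a stack,
    mirroring how the tensor_map dictionary is stacked for temporal features.
    (Names and layout are assumptions, not Seastar's real implementation.)"""

    def __init__(self, num_vertices):
        self.n = num_vertices
        self.snapshots = []  # list used as a stack of (row_ptr, col_idx)

    def push_timestamp(self, edges):
        # build a CSR pair (row_ptr, col_idx) for this timestamp's edge list
        counts = [0] * self.n
        for u, _ in edges:
            counts[u] += 1
        row_ptr = [0]
        for c in counts:
            row_ptr.append(row_ptr[-1] + c)
        col_idx = [0] * len(edges)
        cursor = list(row_ptr[:-1])  # next free slot per row
        for u, v in edges:
            col_idx[cursor[u]] = v
            cursor[u] += 1
        self.snapshots.append((row_ptr, col_idx))

    def at(self, t):
        return self.snapshots[t]
```

Storing a full snapshot per timestamp is memory-hungry, which is exactly why the delta-friendly structures above (Hornet, GPMA, PCSR) are worth evaluating instead.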
Similar to how DGL has a graph.to() function, which creates a CSR representation of a graph and stores it on the GPU, we could create our own version of it tailored for use by Seastar, thereby eliminating any need for DGL. It is also possible to implement our own dataset loader, inspired by PyTorch Geometric Temporal. Since these two operations/functions are bottlenecks in the series of operations done by Seastar, they can be implemented in Rust, C, or C++ to speed up the process. Again, this is worthy of a mention in our paper.
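For the dataset-loader idea, here is a hypothetical pure-Python sketch. The JSON layout and the field names ("timestamps", "edges", "features") are assumptions for illustration, not an existing PyTorch Geometric Temporal format; a production version could move this hot path into Rust/C/C++ as suggested above.

```python
import json

class DynamicGraphDatasetLoader:
    """Hypothetical dataset loader sketch, loosely inspired by PyTorch
    Geometric Temporal's loaders. File format and field names are
    illustrative assumptions, not a real on-disk format."""

    def __init__(self, path):
        with open(path) as f:
            self._data = json.load(f)

    def __iter__(self):
        # yield one (edge_list, node_features) pair per timestamp
        for snap in self._data["timestamps"]:
            yield snap["edges"], snap.get("features")
```

A loader with this shape would let training loops iterate timestamp by timestamp, the same access pattern our temporal models already use.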
It would be nice to dive into this new feature, as we'll be exploring new data structures and algorithms, CUDA programming and GPU optimizations, and figuring out whether Seastar can support dynamic graphs, which brings it closer to being a novel framework for working with graph neural networks.
Possible change in the code generation phase, as we will no longer be working with regular CSR.
Whichever method we use to implement the dynamic graph representation (Hornet, GPMA, PPCSR, etc.) can be used for static graphs as well, and the GPU method used to create the graph representation can be quicker than building the CSR as Seastar originally does.
Seastar only works with static graphs so far. While static graphs can provide useful information, they ignore an important dimension: time. The introduction of the temporal aspect of graphs, where node and edge attributes evolve over time, was the main goal of our project. After successfully implementing and training various temporal graph models like T-GCN, we decided to try our hand at dynamic graphs. Graphs like social networks and internet topology are rarely static in the real world. Dynamic graphs have a lot of untapped potential and are more relevant today. Allowing Seastar to work with dynamic graphs will prove to be the next big step in the graph neural networks domain.
With the increasing popularity and relevance of dynamic graphs such as social networks, we believe extending Seastar to work with such graphs would be a great addition to our final year project, and the work done would be worthy of a paper. So far, we have implemented the temporal graph feature in Seastar, where node and edge features change over time. We can build on the temporal theme of our project by working on support for dynamic graphs as well. We have done quite a bit of research on the topic, and here we present why implementing dynamic graph support would be a great feature in Seastar.
Here we specify each reason why we should take on this new addition to Seastar and elaborate on it.
All of the interesting things happen in dynamic graphs. They have a lot of potential. So far, we have only worked on the temporal changes in a graph. Once we implement this new feature, we can work with graphs where the spatial aspect also changes with time. The list of potential applications is mind-boggling, from tracking changes in the meanings of words, to spotting cryptocurrency fraud, monitoring toxic online communities, and understanding urban transport patterns. This gives Seastar the potential to work with even broader use cases instead of being confined to static graphs.
Initially, we decided to extend Seastar to support temporal graphs. Generally speaking, these are also dynamic graphs, where node and edge features change with time. We could build on this and stick to the theme of our project by providing support for dynamic graphs where the spatial aspect changes with time, i.e., nodes and edges get added or deleted over time. This would generalize Seastar's support for dynamic graphs, since it would no longer be confined to static graphs.
The heart of Seastar is centered around CUDA/GPU programming. We are introduced to CUDA programming in the code generation phase, where the logic for the forward propagation of the model is converted from Python to CUDA code, and again during code generation for the backward propagation logic. It wouldn't stop there if we decide to go ahead with implementing the dynamic graph feature for Seastar, since most of the data structures and algorithms that are needed fully harness the power of the GPU. The GPU can store, retrieve, and work with dynamic graphs far faster than the regular graph implementations currently provided by Seastar.
We would have the opportunity to write new code and add it to the Seastar codebase. So far, we have written comparatively little code while achieving temporal graph support. If we decide to introduce the dynamic graph feature, we would have more code written for our project. We shall follow the PEP 8 coding conventions and document the entire new module, so it will be easier not just for us, but for others who want to learn from it or extend it in the future.
As of now, we have only extended Seastar to support temporal graphs. If we could extend Seastar to cover dynamic graphs (spatio + temporal), it would be worth mentioning in the final year project paper that we could publish. The following topics can be mentioned and elaborated on in our paper:
We will discover more ideas and topics to be included in our paper as we start implementing the new feature.
The following new features can be introduced to Seastar to implement the dynamic graph feature:
More features and changes will be encountered as we make progress in implementing this feature. The above-mentioned points are just rough ideas for now.
Seastar would be making a giant leap in the right direction if we were to go ahead and introduce support for dynamic graphs in deep learning. We will be exploring new data structures and algorithms, getting our hands dirty with CUDA programming and GPU optimizations, and concluding whether Seastar can support dynamic graphs, which brings it closer to being a novel framework for working with graph neural networks. The changes being introduced can benefit static graphs as well, in terms of graph storage and retrieval, making Seastar faster and more optimized than it was before.
Run the following after cloning the GPMA GitHub repository:

```shell
python3 preparation.py
```

to download the pokec dataset. This will take some time. It will also install CUB. Then:

```shell
cd cpu_baseline
```

Save the downloaded dataset as pokec.txt inside the cpu_baseline directory, and run:

```shell
./pma_bfs_demo pokec.txt 0
```
Running the BFS algorithm took some time. This is the output I got; it need not be the same for everyone, because the graph edges are randomised.

```
node_num: 1632803, edge_num: 30622564
Graph file is loaded.
start from node 0, number of reachable nodes: 1333991
Graph file is updated.
start from node 0, number of reachable nodes: 1333850
```
Seastar now has support for dynamic graphs through issues #18 and #29.
Possibly add a feature for Seastar to work with dynamic graphs. Research and more learning to be done before deciding if it's feasible to go ahead with it.