Closed: nithinmanoj10 closed this issue 1 year ago.
This list can be expanded to further improve our knowledge about dynamic graphs.
Create a separate comment for each topic, and we can add on to that.
After going through the datasets provided by PyTorch Geometric Temporal, the following dynamic datasets were found.
The above-mentioned datasets are for dynamic graphs with temporal signals, i.e., the structure and node/edge features of the graph change over time.
In the Introduction section of the Packed CSR paper, it is mentioned that real-life networks are highly dynamic in nature, with many additions, deletions, and updates happening every second.
Social networks such as Facebook and Twitter are highly dynamic graphs since new users and connections are added constantly. Twitter averages about 500 million tweets a day [Say] and Facebook has about 41,000 posts (2.5Mb of data) per second [WKF+15].
Internet graphs are also highly dynamic: large Internet Service Providers (ISPs) field around 109 packets/router/hour [GM12]. Dynamic graphs have wide applications from recommendation systems to cellular networks and require efficient updates to graph storage formats.
According to this paper, a power-law relationship is found in dynamic graphs such as internet networks.
Still not sure how this law will help us in implementing the dynamic graphs feature for Seastar.
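One place it could eventually matter: a quick sanity check of whether a graph snapshot's degree distribution is heavy-tailed. Below is a rough pure-Python sketch; the estimator choice and the toy degree sequence are mine for illustration, not from the paper.

```python
import math

def powerlaw_exponent(degrees):
    """Estimate the power-law exponent alpha of a degree sequence with the
    continuous maximum-likelihood estimator alpha = 1 + n / sum(ln(d / d_min)).
    A rough diagnostic only; a proper fit would follow Clauset et al."""
    d_min = min(degrees)
    n = len(degrees)
    denom = sum(math.log(d / d_min) for d in degrees)
    return 1.0 + n / denom

# heavy-tailed toy degree sequence: many low-degree nodes, a few hubs
degrees = [1] * 50 + [2] * 20 + [4] * 8 + [16] * 2 + [64]
print(round(powerlaw_exponent(degrees), 2))  # → 3.34
```

An exponent in the 2–3.5 range is what the internet-topology literature typically reports, so a check like this could tell us whether a given dynamic dataset actually exhibits the power law the paper describes.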
Many of the existing formats for sparse data representations on parallel architectures are restricted to static data problems, while those for dynamic data suffer from inefficiency both in terms of performance and memory footprint. This work presents Hornet, a novel data representation that targets dynamic data problems. Hornet is scalable with the input size, and does not require any data re-allocation or re-initialization during the data evolution. We show a Hornet implementation for GPU architectures and compare it to the most widely used static and dynamic data structures.
In this work, we presented Hornet, a new GPU data structure for representing dynamic sparse graphs and matrices. Hornet supports insertions, deletions, and value updates. Unlike past attempts at designing dynamic graph data structures, the proposed solution does not require restarting due to a large number of edge updates. We showed that Hornet outperforms state-of-the-art dynamic graph formats in terms of both performance and memory footprint.
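To make the Hornet idea above concrete, here is a toy pure-Python sketch of per-vertex edge blocks that grow in powers of two, so a single insertion only ever copies one vertex's block instead of rebuilding the whole structure. This is my simplification for intuition, not Hornet's actual GPU implementation (which manages block pools in device memory).

```python
class BlockedAdjacency:
    """Toy sketch of Hornet-style per-vertex blocks (illustrative only)."""

    def __init__(self, num_vertices):
        self.blocks = [[None] for _ in range(num_vertices)]  # capacity-1 blocks
        self.degree = [0] * num_vertices

    def insert(self, u, v):
        if self.degree[u] == len(self.blocks[u]):
            # grow to the next power-of-two block; only vertex u is copied
            new_block = [None] * (len(self.blocks[u]) * 2)
            new_block[:self.degree[u]] = self.blocks[u][:self.degree[u]]
            self.blocks[u] = new_block
        self.blocks[u][self.degree[u]] = v
        self.degree[u] += 1

    def delete(self, u, v):
        for i in range(self.degree[u]):
            if self.blocks[u][i] == v:
                # swap with the last edge: O(1) removal, order not preserved
                self.blocks[u][i] = self.blocks[u][self.degree[u] - 1]
                self.degree[u] -= 1
                return True
        return False

    def neighbors(self, u):
        return self.blocks[u][:self.degree[u]]
```

The key property mirrored here is the one the paper emphasizes: updates never require re-allocating or re-initializing the whole graph.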
We introduce a new dynamic sparse graph representation called Packed Compressed Sparse Row (PCSR), based on an array-based dynamic data structure called the Packed Memory Array. PCSR is similar to CSR, but leaves spaces between elements, allowing for asymptotically faster insertions and deletions in exchange for a constant factor slowdown in traversals and a constant factor increase in space overhead.
We have implemented PCSR, a dynamic graph storage format based on the packed memory array. We find that for slightly more storage and query time, we are able to achieve similar mutability speeds to the adjacency list. CSR was unable to handle many inserts in a reasonable amount of time. PCSR was orders of magnitude faster for inserts and updates than CSR and adjacency list while maintaining similar graph traversal times.
The growth of social networks and other changing graphs necessitates the need for efficient dynamic graph structures. PCSR is a basic dynamic graph storage format that can fit into existing graph processing frameworks and support fast insertions with comparable traversal times.
Future work includes implementing deletes, other graph algorithms such as shortest-paths, and PCSR in external memory. Additionally, we may be able to speed up PCSR with a parallel implementation.
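As a rough illustration of the PCSR idea quoted above (empty slots interleaved in the CSR edge array so that inserts only shift locally), here is a minimal pure-Python sketch. The slack constant and the rebuild-on-full policy are simplifications of mine; the actual paper uses Packed Memory Array rebalancing, which is subtler.

```python
class SimplePCSR:
    """Minimal sketch of the PCSR idea (not the paper's PMA algorithm):
    a CSR edge array with empty slots (None) per row so insertions usually
    fill a gap; the whole layout is rebuilt with fresh slack only when a
    row's region is full."""

    SLACK = 2  # empty slots kept per edge on rebuild (illustrative constant)

    def __init__(self, num_vertices):
        self.n = num_vertices
        self.rows = [[] for _ in range(num_vertices)]  # dense view for rebuilds
        self._rebuild()

    def _rebuild(self):
        # lay rows out contiguously, interleaving empty slots (None)
        self.edges, self.row_start = [], []
        for r in self.rows:
            self.row_start.append(len(self.edges))
            for v in r:
                self.edges.append(v)
                self.edges.extend([None] * (self.SLACK - 1))
            self.edges.extend([None] * self.SLACK)  # tail slack for the row
        self.row_start.append(len(self.edges))

    def insert(self, u, v):
        self.rows[u].append(v)
        # fill a free slot in u's region if one exists; otherwise rebuild
        for i in range(self.row_start[u], self.row_start[u + 1]):
            if self.edges[i] is None:
                self.edges[i] = v
                return
        self._rebuild()

    def neighbors(self, u):
        return [x for x in self.edges[self.row_start[u]:self.row_start[u + 1]]
                if x is not None]
```

Even this toy version shows the trade-off the abstract describes: traversal scans some empty slots (constant-factor slowdown and space overhead) in exchange for insertions that rarely touch the whole array.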
In the Hornet paper, they have implemented all the operations for a GPU architecture. It would be great if we could venture into GPU territory while implementing anything based on dynamic graphs since Seastar also uses GPU and CUDA programming.
Similarly, I found a CSR paper for GPUs here, and the code repository can be found here. We could look into this as well and see if it can be useful to us. Originally, Seastar used the CSR format, so going with GPMA is a possibility.
Either way, a GPU is utilized in both, and it is recommended that we try a GPU-based implementation for representing dynamic graphs for enhanced performance.
Support for dynamic graphs in Seastar could prove useful for the following reasons:
More and more real-life graphs are dynamic in nature: not just the temporal aspect, but the spatial aspect as well. This is prevalent in fields like social networks and internet topology, to name a few. Once implemented, Seastar can cover many more use cases.
Since our main project aim was to add support for temporal graphs in Seastar, we can extend it and generalize it to dynamic graphs as well (spatio + temporal).
Provides us with an opportunity to work with CUDA/GPU programming, since there are GPU implementations of the data structures needed for dynamic graphs. It would help us fully harness the power of the GPU, exactly like Seastar does when compiling Python to CUDA.
It presents us with an opportunity to write more code and make changes to the Seastar codebase, which can be a substantial piece of work apart from the temporal feature being introduced.
Once implemented, our work on dynamic graph support for a vertex-centric model would be worthy of a paper by the end of our project. We could explain how we integrated the data structures to make them work for Seastar, and mention any changes and optimizations we used that enable efficient dynamic graph support in Seastar.
If the temporal graph features (the tensor_map dictionary) are stored in a stack for each timestamp, something similar can be done for dynamic graphs, where the graph representation is stored in the stack for each timestamp. We just need to figure out a way to efficiently store and work with dynamic graph operations. Once we figure that out and implement it, it is worth mentioning in the paper we are planning to write.
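A toy sketch of what that stack-of-snapshots idea could look like; the class name and the CSR layout per timestamp are illustrative assumptions of mine, not Seastar's actual internals.

```python
class DynamicGraphStore:
    """Illustrative sketch: keep one CSR snapshot per timestamp on a stack,
    mirroring how the tensor_map dictionary is stacked for temporal features.
    (Names and layout are assumptions, not Seastar's real implementation.)"""

    def __init__(self, num_vertices):
        self.n = num_vertices
        self.snapshots = []  # list used as a stack of (row_ptr, col_idx)

    def push_timestamp(self, edges):
        # build a CSR pair (row_ptr, col_idx) for this timestamp's edge list
        counts = [0] * self.n
        for u, _ in edges:
            counts[u] += 1
        row_ptr = [0]
        for c in counts:
            row_ptr.append(row_ptr[-1] + c)
        col_idx = [0] * len(edges)
        cursor = list(row_ptr[:-1])  # next free slot per row
        for u, v in edges:
            col_idx[cursor[u]] = v
            cursor[u] += 1
        self.snapshots.append((row_ptr, col_idx))

    def at(self, t):
        return self.snapshots[t]
```

Storing a full snapshot per timestamp is memory-hungry, which is exactly why the delta-friendly structures above (Hornet, GPMA, PCSR) are worth evaluating instead.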
Similar to how DGL has a graph.to() function, which creates a CSR representation of a graph and stores it on the GPU, we could create our own version of it tailored for use by Seastar, thereby eliminating any need for DGL. It is also possible to implement our own dataset loader, inspired by PyTorch Geometric Temporal. Since these two operations/functions are bottlenecks in the series of operations done by Seastar, they can be implemented in Rust, C, or C++ to speed up the process. Again, this is worthy of a mention in our paper.
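For the dataset-loader idea, here is a hypothetical pure-Python sketch. The JSON layout and the field names ("timestamps", "edges", "features") are assumptions for illustration, not an existing PyTorch Geometric Temporal format; a production version could move this hot path into Rust/C/C++ as suggested above.

```python
import json

class DynamicGraphDatasetLoader:
    """Hypothetical dataset loader sketch, loosely inspired by PyTorch
    Geometric Temporal's loaders. File format and field names are
    illustrative assumptions, not a real on-disk format."""

    def __init__(self, path):
        with open(path) as f:
            self._data = json.load(f)

    def __iter__(self):
        # yield one (edge_list, node_features) pair per timestamp
        for snap in self._data["timestamps"]:
            yield snap["edges"], snap.get("features")
```

A loader with this shape would let training loops iterate timestamp by timestamp, the same access pattern our temporal models already use.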
It would be nice to dive into this new feature, as we'll be exploring new data structures and algorithms, CUDA programming and GPU optimizations, and figuring out whether Seastar can support dynamic graphs, which brings it closer to being a novel framework for working with graph neural networks.
Possible change in the code generation phase, as we will no longer be working with regular CSR.
Whichever method we use to implement the dynamic graph representation (Hornet, GPMA, PPCSR, etc.) can be used for static graphs as well, and the GPU method used to create the graph representation can be quicker than building the CSR as Seastar originally does.
Seastar only works with static graphs so far. While static graphs can provide useful information, they ignore an important dimension: time. The introduction of the temporal aspect of graphs, where node and edge attributes evolve over time, was the main goal of our project. After successfully implementing and training various temporal graph models like T-GCN, we decided to try our hand at dynamic graphs. Graphs like social networks and internet topology are rarely static in the real world. Dynamic graphs have a lot of untapped potential and are more relevant today. Allowing Seastar to work with dynamic graphs will prove to be the next big step in the graph neural networks domain.
With the increasing popularity and relevance of dynamic graphs such as social networks, we believe extending Seastar to work with such graphs would be a great addition to our final year project, and the work done would be worthy of a paper. So far, we have implemented the temporal graph feature in Seastar, where node and edge features change over time. We can build on the temporal theme of our project by working on support for dynamic graphs as well. We have done quite a bit of research on the topic, and here we present why implementing dynamic graph support would be a great feature in Seastar.
Here we specify each reason why we should take on this new addition to Seastar and elaborate on it.
All of the interesting things happen in dynamic graphs. They have a lot of potential. So far, we have only worked on the temporal changes in a graph. Once we implement this new feature, we can work with graphs where the spatial aspect also changes with time. The list of potential applications is mind-boggling, from tracking changes in the meanings of words, to spotting cryptocurrency fraud, monitoring toxic online communities, and understanding urban transport patterns. This gives Seastar the potential to work with even broader use cases instead of being confined to static graphs.
Initially, we decided to extend Seastar to support temporal graphs. Generally speaking, these are also dynamic graphs, where node and edge features change with time. We could build on this and stick to the theme of our project by providing support for dynamic graphs where the spatial aspect changes with time, i.e., nodes and edges get added or deleted over time. This would generalize Seastar's support for dynamic graphs, since it would no longer be confined to static graphs.
The heart of Seastar is centered around CUDA/GPU programming. We are introduced to CUDA programming in the code generation phase, where the logic for the forward propagation of the model is converted from Python to CUDA code, and again during code generation for the backward propagation logic. It wouldn't stop there if we decide to go ahead with implementing the dynamic graph feature for Seastar, since most of the data structures and algorithms that are needed fully harness the power of the GPU. The GPU can store, retrieve, and work with dynamic graphs far faster than the regular graph implementations currently provided by Seastar.
We would have the opportunity to write new code and add it to the Seastar codebase. So far, we have written comparatively little code while achieving temporal graph support. If we decide to introduce the dynamic graph feature, we would have more code written for our project. We shall follow the PEP 8 coding conventions and document the entire new module, so it will be easier not just for us, but for others who want to learn from it or extend it in the future.
As of now, we have only extended Seastar to support temporal graphs. If we could extend Seastar to cover dynamic graphs (spatio + temporal), it would be worth mentioning in the final year project paper that we could publish. The following topics can be mentioned and elaborated on in our paper:
We will discover more ideas and topics to be included in our paper as we start implementing the new feature.
The following new features can be introduced to Seastar to implement the dynamic graph feature:
More features and changes will be encountered as we make progress in implementing this feature. The above-mentioned points are just rough ideas for now.
Seastar would be making a giant leap in the right direction if we were to go ahead and introduce support for dynamic graphs in deep learning. We will be exploring new data structures and algorithms, getting our hands dirty with CUDA programming and GPU optimizations, and concluding whether Seastar can support dynamic graphs, which brings it closer to being a novel framework for working with graph neural networks. The changes being introduced can benefit static graphs as well, in terms of graph storage and retrieval, making Seastar faster and more optimized than it was before.
Run the following after cloning the GPMA GitHub repository:

```shell
python3 preparation.py
```

to download the pokec dataset. This will take some time. It will also install CUB. Then:

```shell
cd cpu_baseline
```

Save the downloaded dataset as pokec.txt inside the cpu_baseline directory, and run:

```shell
./pma_bfs_demo pokec.txt 0
```
Running the BFS algorithm took some time. This is the output I got; it need not be the same for everyone, because the graph edges are randomised.

```
node_num: 1632803, edge_num: 30622564
Graph file is loaded.
start from node 0, number of reachable nodes: 1333991
Graph file is updated.
start from node 0, number of reachable nodes: 1333850
```
Seastar now has support for dynamic graphs through issues #18 and #29.
Possibly add a feature for Seastar to work with dynamic graphs. Research and more learning to be done before deciding if it's feasible to go ahead with it.