alibaba / GraphScope

🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba
https://graphscope.io
Apache License 2.0
3.17k stars, 424 forks

PyG Remote Backend Based on GraphScope #3739

Open LiSu opened 2 months ago

LiSu commented 2 months ago

GraphScope uses the distributed GNN training framework graphlearn-for-pytorch (GLTorch) to support large-scale distributed GNN training. GLTorch is compatible with PyG at the model layer, so PyG-based GNN training can be extended to large distributed graphs.

To train GNNs on graphs that exceed the available memory of a single machine, PyG introduced a pluggable Remote Backend mechanism. Through abstractions such as FeatureStore and GraphStore, it supports integration with third-party graph storage engines: the FeatureStore lets users consume node/edge features stored remotely, while the GraphStore gives access to graph structure information held remotely. This project aims to implement a PyG Remote Backend based on GraphScope, simplifying the integration between GraphScope and PyG and giving PyG users a friendly experience for distributed GNN training on GraphScope.
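To make the split between the two abstractions concrete, here is a minimal, dependency-free sketch of the idea. It is not PyG's actual API and not GraphScope's: `FakeRemoteGraph`, `SketchFeatureStore`, `SketchGraphStore`, and `sample_one_hop` are hypothetical names invented for illustration, and the "remote" engine is just an in-memory dict. The point is only that the graph store answers structure queries, the feature store answers feature queries, and the training side fetches features for just the nodes it sampled.

```python
from typing import Dict, List, Tuple

EdgeType = Tuple[str, str, str]  # (source node type, relation, destination node type)


class FakeRemoteGraph:
    """Hypothetical stand-in for a remote graph engine; data lives in plain dicts."""

    def __init__(self) -> None:
        self.features: Dict[Tuple[str, str], List[List[float]]] = {}
        self.edges: Dict[EdgeType, Tuple[List[int], List[int]]] = {}


class SketchFeatureStore:
    """Plays the FeatureStore role: fetch feature rows by (node type, attribute, ids)."""

    def __init__(self, remote: FakeRemoteGraph) -> None:
        self._remote = remote

    def get_tensor(self, group: str, attr: str, index: List[int]) -> List[List[float]]:
        table = self._remote.features[(group, attr)]
        return [table[i] for i in index]  # only the requested rows are returned


class SketchGraphStore:
    """Plays the GraphStore role: fetch the (src, dst) edge lists of one edge type."""

    def __init__(self, remote: FakeRemoteGraph) -> None:
        self._remote = remote

    def get_edge_index(self, edge_type: EdgeType) -> Tuple[List[int], List[int]]:
        return self._remote.edges[edge_type]


def sample_one_hop(graph_store: SketchGraphStore, feature_store: SketchFeatureStore,
                   edge_type: EdgeType, seeds: List[int]):
    """Collect in-neighbors of the seed nodes, then fetch features for those nodes only.

    Assumes source and destination share one node type, as in the usage below.
    """
    src, dst = graph_store.get_edge_index(edge_type)
    seed_set = set(seeds)
    neighbors = {s for s, d in zip(src, dst) if d in seed_set}
    nodes = sorted(seed_set | neighbors)
    feats = feature_store.get_tensor(edge_type[0], "x", nodes)
    return nodes, feats


# Usage: a 4-node citation cycle 0 -> 1 -> 2 -> 3 -> 0 with one feature per node.
remote = FakeRemoteGraph()
remote.features[("paper", "x")] = [[0.0], [1.0], [2.0], [3.0]]
remote.edges[("paper", "cites", "paper")] = ([0, 1, 2, 3], [1, 2, 3, 0])

nodes, feats = sample_one_hop(
    SketchGraphStore(remote), SketchFeatureStore(remote),
    ("paper", "cites", "paper"), seeds=[2],
)
print(nodes, feats)  # [1, 2] [[1.0], [2.0]]
```

In a real backend the two store classes would presumably subclass PyG's FeatureStore and GraphStore and forward these lookups to GraphScope rather than to a local dict; that wiring is what this project is meant to deliver and is not shown here.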

Deliverables:

LiSu commented 2 months ago

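GraphScope supports large-scale distributed GNN training through the distributed GNN training framework graphlearn-for-pytorch (GLTorch). GLTorch is compatible with PyG at the model layer and supports extending PyG GNN training to large distributed graphs. To support training GNNs on graphs larger than a machine's available memory, PyG introduced a pluggable Remote Backend mechanism: through abstractions such as FeatureStore and GraphStore, it supports connecting third-party graph storage engines to PyG. FeatureStore lets users use node/edge features stored remotely, and GraphStore lets users use graph structure information stored remotely; together they support scaling GNN training on top of remote storage. By implementing a PyG Remote Backend based on GraphScope, this project aims to further simplify the integration between GraphScope and PyG and to give PyG users a friendly product experience for distributed GNN training on GraphScope.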

Deliverables:

Difficulty: Beginner. Technical requirements: proficient in Python, familiar with C++.