alibaba / graph-learn

An Industrial Graph Neural Network Framework
Apache License 2.0

Want to study the source code #193

Closed nanyoullm closed 2 years ago

nanyoullm commented 2 years ago

I want to study the source code; where should I start? 1. I have been trying to match the storage, sampling, and operator parts of the paper to the code under core, but I cannot connect them. 2. I also do not know how to read the distributed parts, for example how storage keeps graph data in a distributed environment. The official documentation feels hard to relate to the paper, and the only distributed material I found is a k8s training example. Could someone offer some guidance on studying the source code?

Seventeen17 commented 2 years ago

Hi, Thanks for your interest in GraphLearn!

Here is an overview of GraphLearn's system modules.

[Diagram: GraphLearn system modules]

The folders and their corresponding modules and functions are as follows.

Storage: graphlearn/core/graph is GraphLearn's distributed in-memory graph storage. The storage subdirectory holds the local in-memory stores, NodeStorage and EdgeStorage, where EdgeStorage keeps edges as an adjacency table. Graph and Noder encapsulate the graph storage, and GraphStore is the access portal to all Graphs and Noders of the whole graph.
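As a mental model, the adjacency-table layout of EdgeStorage can be sketched in a few lines. This is a hypothetical illustration only (the real implementation is C++ under graphlearn/core/graph); the class and method names here are assumptions, not GraphLearn's API.

```python
from collections import defaultdict

class AdjacencyEdgeStore:
    """Toy adjacency-table edge store, analogous in spirit to EdgeStorage."""

    def __init__(self):
        # src_id -> list of (dst_id, edge_weight) out-edges
        self._adj = defaultdict(list)

    def add_edge(self, src, dst, weight=1.0):
        self._adj[src].append((dst, weight))

    def neighbors(self, src):
        # Empty list if src has no out-edges.
        return self._adj.get(src, [])

store = AdjacencyEdgeStore()
store.add_edge(0, 1)
store.add_edge(0, 2, weight=0.5)
print(store.neighbors(0))  # [(1, 1.0), (2, 0.5)]
```

An adjacency table makes per-vertex neighbor lookup O(1) plus the cost of reading the list, which is exactly the access pattern neighbor sampling needs.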

Operator: graphlearn/core/operator contains GraphLearn's graph operator implementations, including graph traversal, sampling, negative sampling, graph loading, etc. Operator instances are created and managed through OpFactory. Operators access local and remote graph data through GraphStore's interface; input arrives as an OpRequest and the execution result is encapsulated in an OpResponse.
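The request/response operator pattern described above can be sketched as follows. All names (RandomSampler, OP_FACTORY, the dict parameters) are illustrative assumptions, not GraphLearn's real classes.

```python
class OpRequest:
    """Carries an operator's input parameters."""
    def __init__(self, params):
        self.params = params

class OpResponse:
    """Wraps an operator's execution result."""
    def __init__(self, values):
        self.values = values

class Operator:
    def process(self, req):
        raise NotImplementedError

class RandomSampler(Operator):
    # For the sketch, "sampling" just takes the first k neighbors.
    def process(self, req):
        neighbors = req.params["neighbors"]
        k = req.params["count"]
        return OpResponse(neighbors[:k])

# Stand-in for OpFactory: operators are looked up by name and instantiated.
OP_FACTORY = {"RandomSampler": RandomSampler}

op = OP_FACTORY["RandomSampler"]()
resp = op.process(OpRequest({"neighbors": [3, 7, 9], "count": 2}))
print(resp.values)  # [3, 7]
```

Keeping every operator behind the same request/response interface is what lets the runtime treat local and remote execution uniformly.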

Runner: graphlearn/core/runner is GraphLearn's distributed execution runtime. DagScheduler schedules the operators of a query Dag in topological order, running operators concurrently with one another and across multiple sampling iterations. DagNodeRunner constructs the Operator, OpRequest, and OpResponse, and invokes the operator's execution.
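Topological scheduling of a query Dag can be illustrated with Python's standard library; the node names here are made up, and the real DagScheduler additionally overlaps independent operators and iterations.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key depends on the nodes in its set.
dag = {
    "seed": set(),
    "sample_hop1": {"seed"},
    "sample_hop2": {"sample_hop1"},
}

# A valid execution order respects all input-output dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['seed', 'sample_hop1', 'sample_hop2']
```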

Dag: graphlearn/core/dag. GraphLearn's graph sampling interface is expressed through the Graph Sampling Language (GSL); a GSL Query contains multiple sampling operators, and the Dag is the logical execution plan of that Query. A DagNode is a sampling operator, and a DagEdge is the input-output relationship between operators.
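To make the Query-to-Dag lowering concrete, here is a toy fluent query builder that records each step as a DagNode and links consecutive steps with a downstream edge. This mirrors the idea, not GraphLearn's actual GSL implementation; all names are assumptions.

```python
class DagNode:
    """One sampling operator in the plan."""
    def __init__(self, op_name):
        self.op_name = op_name
        self.downstream = []  # DagEdges: this node's output feeds these nodes

class Query:
    """Toy fluent builder: each call appends a DagNode to the plan."""
    def __init__(self):
        self.nodes = []

    def _append(self, op_name):
        node = DagNode(op_name)
        if self.nodes:
            # DagEdge: previous operator's output is this operator's input.
            self.nodes[-1].downstream.append(node)
        self.nodes.append(node)
        return self

    def V(self, node_type):
        return self._append(f"GetNodes({node_type})")

    def outV(self, edge_type):
        return self._append(f"SampleNeighbors({edge_type})")

q = Query().V("user").outV("buy").outV("buy")
print([n.op_name for n in q.nodes])
# ['GetNodes(user)', 'SampleNeighbors(buy)', 'SampleNeighbors(buy)']
```

The chain reads like a 2-hop traversal from user nodes, and the resulting node list is exactly the logical plan the scheduler would execute in topological order.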

Partitioner: graphlearn/core/partition is GraphLearn's distributed graph partitioning module. GraphLearn divides graph data across multiple GraphLearn Servers according to the Partitioner's strategy. When the data needed by an OpRequest is spread across several Servers, the request is split by the Partitioner and executed by sending the pieces to those Servers; the per-partition results are then merged by Stitcher.
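The split-execute-stitch flow can be sketched with a simple hash partitioner. The modulo strategy and all names below are assumptions for illustration; the key idea is that each id's original position is kept so Stitcher-style merging can restore request order.

```python
NUM_SERVERS = 2

def partition(ids):
    """Split a request's ids by server, remembering each id's position."""
    parts = {s: [] for s in range(NUM_SERVERS)}
    for pos, i in enumerate(ids):
        parts[i % NUM_SERVERS].append((pos, i))
    return parts

def stitch(partial_results, total):
    """Merge per-server results back into the original request order."""
    merged = [None] * total
    for part in partial_results:
        for pos, value in part:
            merged[pos] = value
    return merged

ids = [10, 3, 7, 8]
parts = partition(ids)
# Pretend each server "processes" its ids by doubling them.
results = [[(pos, i * 2) for pos, i in part] for part in parts.values()]
print(stitch(results, len(ids)))  # [20, 6, 14, 16]
```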

Service: graphlearn/service implements rpc access between GraphLearn Servers. Service is the rpc base class of the Protobuf framework and ServiceImpl is its subclass, containing a number of concrete Handler implementations; the server side is started through grpc. Executor is the body that actually executes a request, and the Service framework and the functional modules are isolated from each other through it; ChannelManager is responsible for creating and managing Channels. Client is the initiator of a request: InMemoryClient initiates local requests and RpcClient initiates remote requests. Tensor is GraphLearn's data structure for storage and transmission; its underlying implementation is protobuf.
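The local/remote client split described above amounts to one interface with two backends. A minimal sketch, with all names assumed rather than taken from GraphLearn:

```python
class Client:
    """Common interface: callers do not care where the request runs."""
    def run(self, request):
        raise NotImplementedError

class InMemoryClient(Client):
    # Executes the request directly against a local executor, no rpc.
    def __init__(self, executor):
        self.executor = executor

    def run(self, request):
        return self.executor(request)

class RpcClient(Client):
    # Would serialize the request (e.g. to protobuf) and send it over a
    # Channel to a remote Server; here the round trip is just simulated.
    def __init__(self, send_fn):
        self.send_fn = send_fn

    def run(self, request):
        return self.send_fn(request)

local = InMemoryClient(lambda req: req.upper())
print(local.run("sample"))  # SAMPLE
```

Because the operator layer only sees the Client interface, the same sampling code works whether the target partition is on this server or a remote one.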

YijianLiu commented 1 year ago

I want to study the source code; where should I start? 1. I have been trying to match the storage, sampling, and operator parts of the paper to the code under core, but I cannot connect them. 2. I also do not know how to read the distributed parts, for example how storage keeps graph data in a distributed environment. The official documentation feels hard to relate to the paper, and the only distributed material I found is a k8s training example. Could someone offer some guidance on studying the source code?

Happy to discuss this together~