alibaba / graph-learn

An Industrial Graph Neural Network Framework
Apache License 2.0

Does the current version support caching neighbors of important vertices? #237

Closed YijianLiu closed 1 year ago

YijianLiu commented 1 year ago

Hello, a year ago I saw that caching neighbors was not supported. Does the current version support this feature? If not, what are the main difficulties? I urgently need your answer, thanks a lot!

Seventeen17 commented 1 year ago

We implement a batch cache in this version by prefetching neighbors and features for several batches and caching them in a TapeStore. For computation, the sampling and feature-extraction operators are organized as a DAG, and execution is parallelized both across and within DAGs, which substantially improves sampling performance.

The batch cache achieves the desired performance in most of our sampling scenarios. If cache capacity is sufficient but sampling is still a performance bottleneck, adding a hotspot cache can orthogonally improve sampling performance as well.
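
As a rough illustration of this idea, here is a minimal Python sketch (not graph-learn's actual code; `TapeStoreSketch`, `sample_batch`, and the batch layout are hypothetical names): a producer thread keeps a bounded store filled with sampled batches, so sampling for batch i+1 overlaps with training on batch i.

```python
import queue
import threading

class TapeStoreSketch:
    """Hypothetical stand-in for the TapeStore: a bounded buffer of
    prefetched batches (sampled neighbors plus extracted features)."""

    def __init__(self, sample_batch, capacity=4):
        self._sample_batch = sample_batch   # callable: batch index -> batch
        self._store = queue.Queue(maxsize=capacity)

    def start(self, num_batches):
        def producer():
            for i in range(num_batches):
                # Sampling and feature extraction run here, off the
                # training thread; put() blocks when the store is full.
                self._store.put(self._sample_batch(i))
            self._store.put(None)           # sentinel: no more batches
        threading.Thread(target=producer, daemon=True).start()

    def next_batch(self):
        # Consumer side: the trainer pops a ready batch, usually
        # without waiting because the producer runs ahead.
        return self._store.get()

store = TapeStoreSketch(sample_batch=lambda i: {"batch": i}, capacity=4)
store.start(num_batches=100)
while (batch := store.next_batch()) is not None:
    pass  # train on `batch` while the producer prefetches the next ones
```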

YijianLiu commented 1 year ago

> We implement a batch cache in this version by prefetching neighbors and features for several batches and caching them in a TapeStore. For computation, the sampling and feature-extraction operators are organized as a DAG, and execution is parallelized both across and within DAGs, which substantially improves sampling performance.
>
> The batch cache achieves the desired performance in most of our sampling scenarios. If cache capacity is sufficient but sampling is still a performance bottleneck, adding a hotspot cache can orthogonally improve sampling performance as well.

According to my understanding, this means that sampling and training run in parallel? While the first batch is training, it begins to prefetch the next batch of neighbors and features?

Seventeen17 commented 1 year ago

Yeah

YijianLiu commented 1 year ago

> Yeah

Hello, I have a question: why not use PyTorch DataLoader workers or TensorFlow's tf.data.Dataset.prefetch? I think both of them can also prefetch batches, so why reimplement this in C++?
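
For reference, these are the two framework-level prefetch mechanisms referred to here, in minimal runnable form (dummy random data; batch sizes and shapes are arbitrary):

```python
import tensorflow as tf
import torch
from torch.utils.data import DataLoader, TensorDataset

# PyTorch: worker processes prepare batches ahead of the training loop.
dataset = TensorDataset(torch.randn(10_000, 16))
loader = DataLoader(dataset, batch_size=512, num_workers=4, prefetch_factor=2)

# TensorFlow: the runtime keeps a buffer of ready batches.
ds = (tf.data.Dataset.from_tensor_slices(tf.random.normal([10_000, 16]))
      .batch(512)
      .prefetch(tf.data.AUTOTUNE))
```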

LiSu commented 1 year ago

Since graphlearn uses PyTorch only for training, it needs its own dataloader to transform the sampling results into the formats that the trainer requires.
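
As a rough sketch of what such a dataloader has to do (the function name and batch layout below are hypothetical, not graph-learn's actual classes): take the raw arrays produced by the C++ sampling side and assemble them into the tensors the PyTorch trainer consumes.

```python
import numpy as np
import torch

def to_training_batch(sampled):
    """Hypothetical adapter: convert raw sampling output (numpy arrays
    handed over from the C++ side) into PyTorch tensors.

    `sampled` is assumed to be a dict with:
      'src_ids' : int64 seed node ids, shape [batch]
      'nbr_ids' : int64 sampled neighbor ids, shape [batch, fanout]
      'feats'   : float32 node features, shape [batch, dim]
    """
    return {k: torch.from_numpy(np.ascontiguousarray(v))
            for k, v in sampled.items()}

batch = to_training_batch({
    'src_ids': np.arange(4, dtype=np.int64),
    'nbr_ids': np.zeros((4, 10), dtype=np.int64),
    'feats': np.random.rand(4, 32).astype(np.float32),
})
```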