alibaba / graph-learn

An Industrial Graph Neural Network Framework

Request for clarification about two optimization strategies #20

Closed · backyes closed this 4 years ago

backyes commented 4 years ago

If this is not resolved, the GPU will not be fully utilized in some situations.

I would appreciate a clearer explanation of these issues, thanks a lot.

baoleai commented 4 years ago

Good questions.

  1. We are trying to parallelize sampling and make it asynchronous with the training process to improve GPU utilization; a minimal sketch of this producer-consumer pattern follows this list. Reducing sampling time through message fusion can also improve GPU utilization in distributed mode.

  2. Aggregator in core/operator/aggregator is a work-in-progress feature that will be used to optimize aggregation in distributed training through message fusion; see https://github.com/alibaba/graph-learn/issues/15.
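
For readers following along, here is a minimal sketch of the producer-consumer decoupling described in point 1, in plain Python. Everything in it is a hypothetical stand-in, not graph-learn's API: `sample_batch` plays the role of the CPU-side sampler and `train_step` the role of a GPU training step.

```python
import queue
import threading
import time

import numpy as np

# Hypothetical stand-ins: sample_batch() plays the role of graph-learn's
# CPU-side sampler, train_step() the role of a GPU training step.
def sample_batch(rng):
    time.sleep(0.01)                         # simulate sampling latency
    return rng.standard_normal((512, 128))   # fake node-feature batch

def train_step(batch):
    return float(np.square(batch).mean())    # stand-in for a GPU step

NUM_BATCHES, BUF_SIZE = 100, 8
buf = queue.Queue(maxsize=BUF_SIZE)          # bounded buffer: backpressure

def producer():
    rng = np.random.default_rng(0)
    for _ in range(NUM_BATCHES):
        buf.put(sample_batch(rng))           # blocks if training falls behind
    buf.put(None)                            # sentinel: no more batches

threading.Thread(target=producer, daemon=True).start()

# Training consumes pre-sampled batches; the GPU only waits for
# sampling when the buffer runs dry.
while (batch := buf.get()) is not None:
    train_step(batch)
```

The bounded queue is the key design choice: it lets sampling run ahead of training without unbounded memory growth, and training only stalls when the buffer is empty.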

Seventeen17 commented 4 years ago

I have opened a PR about the Aggregator, FYI @backyes.
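
For context on what the Aggregator targets: the actual operator lives in core/operator/aggregator (C++), and its message fusion aims to cut the number of network messages in distributed mode, per issue #15. The compute-side effect of batching aggregation can be loosely illustrated with TensorFlow segment ops; this sketch is illustrative only, not the PR's code.

```python
import tensorflow as tf

# Illustrative sketch only; not graph-learn's Aggregator.
# neighbor_feats[i] is the message carried by the i-th sampled edge,
# segment_ids[i] is the destination node it belongs to
# (sorted, as tf.math.segment_sum requires).
neighbor_feats = tf.constant([[1., 2.], [3., 4.], [5., 6.]])
segment_ids = tf.constant([0, 0, 1])

# A single fused reduction aggregates every node's neighborhood at
# once, instead of looping over destination nodes in Python.
summed = tf.math.segment_sum(neighbor_feats, segment_ids)
counts = tf.math.segment_sum(tf.ones_like(neighbor_feats), segment_ids)
mean_agg = summed / counts  # [[2., 3.], [5., 6.]]
```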

lorinlee commented 4 years ago

@baoleai Hi, is 'parallelizing sampling and making it asynchronous with training' already done, or is it still a work in progress? Thanks~ Also, I'm confused about why tf.data.Dataset.prefetch isn't used to do the sampling. I'm a beginner in TensorFlow, so maybe I have misunderstood this method.
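
For what it's worth, a minimal sketch of the approach lorinlee suggests, assuming a Python-callable sampler: the `sample_batches` generator below is a hypothetical stand-in for graph-learn's sampling API, and the shapes are made up. `tf.data.Dataset.from_generator` wraps the sampler, and `.prefetch` overlaps sampling with the training step.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sampler: yields (node_features, labels) batches; a real
# version would call graph-learn's sampling API instead.
def sample_batches():
    rng = np.random.default_rng(0)
    for _ in range(100):
        feats = rng.standard_normal((512, 128)).astype(np.float32)
        labels = rng.integers(0, 2, size=(512,)).astype(np.int32)
        yield feats, labels

ds = (tf.data.Dataset.from_generator(
          sample_batches,
          output_signature=(
              tf.TensorSpec(shape=(512, 128), dtype=tf.float32),
              tf.TensorSpec(shape=(512,), dtype=tf.int32)))
      # AUTOTUNE keeps a few batches sampled ahead of the consumer,
      # overlapping CPU-side sampling with the training step.
      .prefetch(tf.data.AUTOTUNE))

for feats, labels in ds:
    pass  # the training step would consume (feats, labels) here
```

One caveat: from_generator calls back into Python, so the sampler must be thread-safe and the GIL can limit how much overlap is achieved, which may be one reason a framework-level asynchronous sampler is preferred.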

YijianLiu commented 1 year ago

> Is 'parallelizing sampling and making it asynchronous with training' already done, or is it still a work in progress? And I'm confused about why tf.data.Dataset.prefetch isn't used to do the sampling.

Have you solved this problem? I am trying to use this method to do the sampling.