Good questions.
We are trying to parallelize sampling and make it asynchronous with the training process to improve GPU utilization. Reducing sampling time through message fusion can also improve GPU utilization in distributed mode.
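For illustration, here is a minimal sketch of what decoupling sampling from training could look like: a background thread keeps a small bounded buffer of sampled batches filled while the main thread trains. The functions `sample_subgraph` and `train_step` are hypothetical placeholders, not graph-learn APIs.

```python
import queue
import threading
import time

def sample_subgraph():
    # Placeholder for a graph sampling call; pretend it takes a few ms.
    time.sleep(0.005)
    return {"src_ids": [0, 1, 2], "neighbors": [[3, 4], [5], [6, 7]]}

def train_step(batch):
    # Placeholder for one training iteration on the GPU.
    time.sleep(0.002)

buffer = queue.Queue(maxsize=4)  # bounded, so sampling cannot run far ahead

def sampler_loop():
    while True:
        buffer.put(sample_subgraph())  # blocks when the buffer is full

threading.Thread(target=sampler_loop, daemon=True).start()

for _ in range(100):
    train_step(buffer.get())  # a batch is usually ready, so the GPU rarely waits
```

The bounded queue is what makes this "asynchronous": the sampler runs ahead of training by up to `maxsize` batches, so short sampling stalls no longer show up as GPU idle time.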
Aggregator in core/operator/aggregator is a work-in-progress feature that will be used to optimize aggregation in distributed training through message fusion; see https://github.com/alibaba/graph-learn/issues/15.
I have raised a PR about the Aggregator, FYI @backyes.
@baoleai Hi, is 'parallelizing sampling and making it asynchronous with training' already done, or is it still a work in progress? Thanks! I'm also confused about why tf.data.Dataset.prefetch isn't used to do the sampling. I'm a beginner in TensorFlow, so maybe I have misunderstood this method.
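Regarding tf.data.Dataset.prefetch: it can overlap sampling with training when the sampler is exposed as a generator. A minimal sketch, assuming a TF 1.x-style pipeline; the generator below is a dummy stand-in, not graph-learn's sampling API:

```python
import numpy as np
import tensorflow as tf

# Dummy sampler standing in for a graph sampling call; each yield is one
# mini-batch of node ids and node features.
def sample_batches():
    while True:
        ids = np.random.randint(0, 10000, size=(512,)).astype(np.int64)
        feats = np.random.rand(512, 64).astype(np.float32)
        yield ids, feats

dataset = tf.data.Dataset.from_generator(
    sample_batches,
    output_types=(tf.int64, tf.float32),
    output_shapes=((512,), (512, 64)))

# prefetch(1) keeps the next sampled batch ready while the current one is
# being trained on, overlapping sampling with training.
dataset = dataset.prefetch(1)

# TF 1.x-style consumption; in TF 2.x one would simply iterate the dataset.
iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
node_ids, node_feats = iterator.get_next()
```

The caveat is that from_generator runs the Python generator in a single thread, so prefetch only hides the latency when one sampling call is no slower than one training step; if sampling itself is the bottleneck, the GPU still stalls, which is part of why parallelizing the sampler itself (as discussed above) still matters.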
Have you solved this problem? I am trying to use this method to do sampling.
If this is not resolved, the GPU will not be fully utilized in some situations. I would appreciate better clarification of these issues, thanks a lot.