alibaba / euler

A distributed graph deep learning framework.
Apache License 2.0
2.89k stars, 559 forks

Any performance numbers? #2

Closed. intoraw closed this issue 5 years ago

intoraw commented 5 years ago

Great work! It would be great if some performance numbers (for Node2Vec and LINE) were provided.

renyi533 commented 5 years ago

Do you mean model performance or training QPS? For model performance, you can check at https://github.com/alibaba/euler/wiki/效果测试.

Our framework is optimized for large heterogeneous graphs and distributed training, and we have internal performance numbers. We can publish them once they have gone through our internal information-sharing process.

For small public graphs, our training speed should be comparable to typical single-node solutions.

intoraw commented 5 years ago

@renyi533 Thanks for your reply. I mean training QPS. Looking forward to seeing some training numbers on graphs with billions or trillions of edges!

intoraw commented 5 years ago

@renyi533 As far as I can tell, Euler assumes that the embeddings of all vertices fit in CPU memory (https://github.com/alibaba/euler/blob/c45225119c5b991ca953174f06c2f223562f34c9/tf_euler/python/layers.py#L131).

For large graphs, the embeddings may exceed the CPU memory limit. Is there any way for Euler to handle this case?
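For a rough sense of scale, here is a back-of-the-envelope estimate of a dense float32 embedding table. The node count and embedding dimension are made-up illustrative values, not figures from Euler or from any real graph, and `embedding_table_bytes` is a hypothetical helper.

```python
def embedding_table_bytes(num_nodes: int, dim: int, bytes_per_value: int = 4) -> int:
    """Size of a dense embedding table: one `dim`-vector per node (float32 by default)."""
    return num_nodes * dim * bytes_per_value

# Hypothetical graph: 1 billion nodes, 128-dimensional embeddings.
size = embedding_table_bytes(1_000_000_000, 128)
print(f"{size / 1e9:.0f} GB")  # 512 GB
```

Optimizer state (for example the two moment estimates kept by Adam) would add to this, so a single machine runs out of memory well before the graph reaches that size.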

yangsiran commented 5 years ago

@pgplus1628 This can probably be handled by using a partitioned variable.
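To illustrate the idea only (this is not Euler's or TensorFlow's actual implementation): a partitioned variable splits one large table into shards, so that each shard can be placed on a different parameter server. A toy sketch of contiguous row sharding, with hypothetical helper names `shard_rows` and `locate`:

```python
def shard_rows(num_rows, num_shards):
    """Split row ids [0, num_rows) into contiguous ranges, one per shard."""
    base, extra = divmod(num_rows, num_shards)
    ranges, start = [], 0
    for s in range(num_shards):
        size = base + (1 if s < extra else 0)  # first `extra` shards get one more row
        ranges.append((start, start + size))
        start += size
    return ranges

def locate(row_id, ranges):
    """Return (shard index, offset within that shard) for a global row id."""
    for shard, (lo, hi) in enumerate(ranges):
        if lo <= row_id < hi:
            return shard, row_id - lo
    raise IndexError(row_id)

ranges = shard_rows(10, 3)
print(ranges)             # [(0, 4), (4, 7), (7, 10)]
print(locate(5, ranges))  # (1, 1)
```

No single machine has to hold all rows; a lookup only touches the shard that owns the requested id.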

renyi533 commented 5 years ago

@pgplus1628 @yangsiran Please refer to https://stackoverflow.com/questions/47170879/what-is-partitioner-parameter-in-tensorflow-variable-scope-used-for.

You can set a partitioner per variable with tf.get_variable, but that requires passing the partition count down to every inner object, which is not very elegant. That is why we do not do it this way.

Instead, you can define a default partitioner with tf.variable_scope. We recommend setting a default partitioner in your main script and building the layers under that scope.

Our ppi_main/reddit_main scripts are simple examples. We can write a better example to illustrate this.
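A minimal sketch of the default-partitioner approach, assuming the TF1-style graph API (reached here through `tf.compat.v1` so it also runs on TF2). The shard count, table size, and scope and variable names are hypothetical, and the `get_variable` call is only a stand-in for an embedding layer, not Euler's actual layer code:

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph API, also available in TF2

NUM_SHARDS = 4          # hypothetical; e.g. the number of parameter servers
MAX_ID, DIM = 1000, 16  # hypothetical table size

with tf.Graph().as_default():
    # Default partitioner set once in the main script. Variables created with
    # get_variable inside this scope are split along axis 0.
    partitioner = tf1.fixed_size_partitioner(NUM_SHARDS, axis=0)
    with tf1.variable_scope("model", partitioner=partitioner):
        # Stand-in for an embedding layer; it never sees NUM_SHARDS.
        embeddings = tf1.get_variable(
            "embeddings", shape=[MAX_ID + 1, DIM],
            initializer=tf1.random_uniform_initializer(-0.1, 0.1))

    # Iterating a partitioned variable yields the underlying shard variables.
    shard_shapes = [v.shape.as_list() for v in embeddings]
    looked_up = tf1.nn.embedding_lookup(embeddings, tf.constant([0, 500, 1000]))

    with tf1.Session() as sess:
        sess.run(tf1.global_variables_initializer())
        result = sess.run(looked_up)

print(shard_shapes)   # one [rows, DIM] shape per shard
print(result.shape)   # (3, 16)
```

The point of the design is that the shard count lives in one place: code that creates variables inside the scope picks up the partitioner without any extra argument.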

renyi533 commented 5 years ago

@pgplus1628 https://github.com/alibaba/euler/wiki/性能测试. Performance test results have been added here.