haaappy opened 1 year ago
This is because the coordinator crashes and then tries to restart, but on restart it shouldn't try to pull the resources again; doing so causes the `already exists` error.

The failure of `add_edges` is most likely due to insufficient memory. What's your startup configuration for the session? And what's the resource spec of your k8s cluster?
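For reference, a minimal sketch of a k8s session configuration that pins the memory-related parameters (the exact parameter names, such as `k8s_engine_mem`, `k8s_vineyard_mem` and `vineyard_shared_mem`, are assumptions here and may differ between GraphScope versions):

```python
import graphscope

# Minimal sketch, assuming a reachable k8s cluster; verify the parameter
# names against the GraphScope version in use.
sess = graphscope.session(
    num_workers=2,
    k8s_engine_cpu=4,
    k8s_engine_mem="16Gi",       # memory request for each engine pod
    k8s_vineyard_mem="16Gi",     # memory request for the vineyard container
    vineyard_shared_mem="16Gi",  # shared memory used to hold the graph data
)
print(sess)
```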
The edge data is almost 1G, and we set all the session memory params to more than 16G. The k8s cluster has 250G of memory on each node.
It's weird if the memory is sufficient. I would like to reproduce that. Could you provide the Python script that reproduces this error? I can find the dataset myself.
I just create the session and process the dataset as dataframes without attributes: `v_df` has a `vid` column, and `e_df` has `from_id` and `to_id` columns.
G = session.g()
graph = G.add_vertices(v_df).add_edges(e_df)
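Expanded into a self-contained sketch of that loading path (the pandas construction, the labels and the tiny sample data below are illustrative assumptions; the real `e_df` holds roughly 1G of edges):

```python
import pandas as pd
import graphscope

# Illustrative stand-ins for the real dataframes described above.
v_df = pd.DataFrame({"vid": [0, 1, 2]})
e_df = pd.DataFrame({"from_id": [0, 1], "to_id": [1, 2]})

sess = graphscope.session()  # k8s mode in the original report
g = sess.g()
graph = (
    g.add_vertices(v_df, label="v")
     .add_edges(e_df, label="e", src_label="v", dst_label="v")
)
print(graph)
```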
I got it: you load the graph from dataframes rather than reading from files. Reading from dataframes is provided as a convenient way to load small chunks; it doesn't perform well for loading large data.
Could you load from files instead? I believe that would solve the problem. You could bind a volume so that a host path is mounted into the pods.
Reference: https://graphscope.io/docs/deployment/deploy_graphscope_on_self_managed_k8s#mount-volumes
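A rough sketch of that approach, following the linked docs (the `k8s_volumes` field names, the mount path and the `Loader` arguments are assumptions to verify against your GraphScope version):

```python
import graphscope
from graphscope.framework.loader import Loader

# Mount a host directory containing the CSV files into the engine pods.
sess = graphscope.session(
    num_workers=2,
    k8s_volumes={
        "data": {
            "type": "hostPath",
            "field": {"path": "/data/graph", "type": "Directory"},
            "mounts": {"mountPath": "/data/graph"},
        }
    },
)

g = sess.g()
graph = (
    g.add_vertices(Loader("/data/graph/vertices.csv", delimiter=","), label="v")
     .add_edges(Loader("/data/graph/edges.csv", delimiter=","),
                label="e", src_label="v", dst_label="v")
)
```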
Thank you, I will try to load the graph from files. By the way, I tried to test the session param `num_workers`: I ran the PageRank algorithm on the twitter.e dataset with `num_workers` set to 1, 2 and 4. However, the more workers, the slower the program runs. It seems the algorithm can't run in parallel.
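A benchmark along those lines could look like the sketch below (the dataset path, the delimiter and the `pagerank` call are assumptions; depending on the GraphScope version the property graph may need to be projected to a simple graph before running the app):

```python
import time
import graphscope
from graphscope.framework.loader import Loader

for n in (1, 2, 4):
    sess = graphscope.session(num_workers=n)
    g = sess.g(directed=True)
    # "/data/twitter.e" is a placeholder path; delimiter/header_row assume a
    # plain space-separated edge list with no header.
    g = g.add_edges(Loader("/data/twitter.e", delimiter=" ", header_row=False),
                    label="e")

    start = time.time()
    result = graphscope.pagerank(g)  # built-in PageRank app
    print(f"num_workers={n}: {time.time() - start:.3f}s")

    sess.close()
```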
On hosts or k8s mode?
If in hosts mode, grape_engine will try to utilize `std::hardware_concurrency()` threads per worker, and with 4 workers there is also some overhead from communication between the processes. So it's possible that in hosts mode, on a single machine, more workers make it slower.
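As a quick sanity check on that explanation, you can compare the machine's hardware concurrency with `num_workers × threads` (a trivial sketch; `os.cpu_count()` is the Python analogue of `std::hardware_concurrency()`):

```python
import os

cores = os.cpu_count()  # analogue of std::hardware_concurrency()
for num_workers in (1, 2, 4):
    # If every worker spawns `cores` threads, a single machine is already
    # oversubscribed for num_workers > 1, and inter-process communication
    # adds further cost on top of that.
    print(f"{num_workers} workers x {cores} threads = "
          f"{num_workers * cores} threads on {cores} cores")
```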
I tested the workers on both hosts and k8s mode and got the same result, so I'm puzzled. I tested k8s mode with the official Docker image. If I want it to run in parallel, do I need to set something else in addition to this param?
I'll try to reproduce that in k8s mode. 😂
Ok, thank you🤝
Could you please give me your testing script if convenient?
Related to #2898
Confirmed the performance degradation on a dataset with 12,983,637 edges. Simple test results:

| local workers | time | k8s workers | time |
|---|---|---|---|
| 1 | 0.271543 | 1 | 0.246835 |
| 2 | 0.556023 | 2 | 0.693073 |
| 4 | 0.992 | 4 | 0.946844 |
Describe the bug We tested page_rank on the datagen-7_5-fb.e dataset using graphscope:0.21.0 in k8s mode, but it failed in add_edges.
To Reproduce Steps to reproduce the behavior:
Expected behavior Graph creation succeeds.
Screenshots We found the logs in the coordinator pod in k8s. During add_edges, the logs show: create engine headless services .. ... ... kubernetes.client.exceptions.ApiException: (409) Reason: Conflict ..... services 'gs-engine-onmitz-headless' already exists.
In the end, the coordinator pod stopped working.
Environment (please complete the following information):
Additional context We tested the dataset twitter.e in k8s mode and everything was OK. We tested the dataset datagen-7_5-fb.e in hosts mode and everything was OK. Maybe a big dataset does not work in k8s mode? (graphscope 0.21.0)