awslabs / dgl-ke

High performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings.
https://dglke.dgl.ai/doc/
Apache License 2.0

dglke_dist_train OSError: [Errno 99] Cannot assign requested address #221

Closed: AdrianKs closed this issue 3 years ago

AdrianKs commented 3 years ago

I am trying to run the basic distributed training example. Unfortunately, I always encounter the following error:

  File "miniconda3/envs/dgl/bin/dglke_dist_train", line 8, in <module>
    sys.exit(main())
  File "miniconda3/envs/dgl/lib/python3.7/site-packages/dglke/dist_train.py", line 202, in main
    launch(args)
  File "miniconda3/envs/dgl/lib/python3.7/site-packages/dglke/dist_train.py", line 174, in launch
    if is_local(ip) == False:
  File "miniconda3/envs/dgl/lib/python3.7/site-packages/dglke/dist_train.py", line 71, in is_local
    if ip_addr in local_ip4_addr_list():
  File "miniconda3/envs/dgl/lib/python3.7/site-packages/dglke/dist_train.py", line 63, in local_ip4_addr_list
    struct.pack('256s', name[:15].encode("UTF-8")))[20:24])
OSError: [Errno 99] Cannot assign requested address

This happens even if I only use a local IP:

127.0.0.1 30050 8
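The line above is an ip_config entry. Below we read it as `IP base_port server_count`, which is our assumption based on the example format in the dgl-ke distributed-training docs. The traceback explains why a loopback IP does not help. `is_local(ip)` tests `ip_addr in local_ip4_addr_list()`, so every local network interface is queried before the configured IP is even compared. The failure therefore does not depend on which IP is listed. The following minimal sketch parses such a file and flags loopback entries. The names `ServerEntry` and `parse_ip_config` are hypothetical and are not dgl-ke's API.

```python
import ipaddress
from typing import List, NamedTuple


class ServerEntry(NamedTuple):
    """One ip_config line. Field meanings are an assumption: IP, base port, server count."""
    ip: str
    base_port: int
    num_servers: int


def parse_ip_config(text: str) -> List[ServerEntry]:
    """Parse whitespace-separated 'ip port count' lines, skipping blank lines."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 'ip port count', got {raw!r}")
        ip, port, count = parts
        ipaddress.IPv4Address(ip)  # raises ValueError for a malformed IPv4 address
        entries.append(ServerEntry(ip, int(port), int(count)))
    return entries


entries = parse_ip_config("127.0.0.1 30050 8\n")
for entry in entries:
    # A loopback IP is trivially local, but in the 0.1.2 traceback the launcher
    # still enumerates every interface first, so the OSError can occur anyway.
    print(entry, "loopback:", ipaddress.IPv4Address(entry.ip).is_loopback)
```

Validating the IP while parsing surfaces a typo in ip_config as a clear `ValueError` rather than a confusing failure later in the launcher.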

I am using dglke version 0.1.2.

Do you have an idea how I could resolve this issue?
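The last frame of the traceback shows where errno 99 comes from. The interface name is packed into a 256-byte buffer, the layout of Linux's `struct ifreq`, and passed to an ioctl. The result is then sliced at bytes 20:24, the IPv4 address inside `ifr_addr`. This suggests the request is `SIOCGIFADDR`, although we infer this from the traceback alone, since the full dgl-ke source is not quoted here. On Linux, that ioctl fails with errno 99 (`EADDRNOTAVAIL`, "Cannot assign requested address") when an interface exists but has no IPv4 address assigned, for example an IPv6-only or unconfigured interface. A single such interface therefore aborts the whole scan. The following minimal sketch shows a tolerant variant that skips interfaces whose address cannot be read. It assumes Linux, and it is not necessarily how dgl-ke master fixed the problem. `query_ipv4`, `collect_ipv4`, and `local_ipv4_addrs` are hypothetical names.

```python
import fcntl
import socket
import struct
from typing import Callable, Iterable, List

SIOCGIFADDR = 0x8915  # Linux ioctl request: get an interface's IPv4 address


def query_ipv4(ifname: str) -> str:
    """Return the IPv4 address of `ifname` via ioctl (Linux only).

    Raises OSError, e.g. errno 99 (EADDRNOTAVAIL), if the interface has no IPv4 address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # struct ifreq: 16-byte name, then a sockaddr_in whose address sits at bytes 20..24
        packed = fcntl.ioctl(s.fileno(), SIOCGIFADDR,
                             struct.pack("256s", ifname[:15].encode("utf-8")))
    return socket.inet_ntoa(packed[20:24])


def collect_ipv4(names: Iterable[str], lookup: Callable[[str], str]) -> List[str]:
    """Look up each interface, skipping those whose lookup raises OSError."""
    addrs = []
    for name in names:
        try:
            addrs.append(lookup(name))
        except OSError:
            # No IPv4 address (errno 99), interface gone, etc.: skip instead of crashing
            continue
    return addrs


def local_ipv4_addrs() -> List[str]:
    """IPv4 addresses of all local interfaces that have one."""
    names = [name for _, name in socket.if_nameindex()]
    return collect_ipv4(names, query_ipv4)
```

Keeping the skip logic in `collect_ipv4`, separate from the ioctl, allows it to be checked with a fake lookup. Calling `local_ipv4_addrs()` on the affected machine should then list addresses instead of raising.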

classicsong commented 3 years ago

Can you try installing from source?

AdrianKs commented 3 years ago

When I install from source, it installs version 0.1.0.dev, and I run into the following error: 'Namespace' object has no attribute 'has_edge_importance'

classicsong commented 3 years ago

Can you try master?

AdrianKs commented 3 years ago

This is on the master branch.

classicsong commented 3 years ago

I see, you are running distributed training. Unfortunately, edge importance is not supported in distributed training.

AdrianKs commented 3 years ago

Yes, I am running distributed training, but I am not using edge importance. The problem seems to be that kvserver defines its own argparser, which lacks the attribute has_edge_importance. The arguments from this parser are then used to create the model here: https://github.com/awslabs/dgl-ke/blob/master/python/dglke/kvserver.py#L120

The model then tries to access args.has_edge_importance, which the parser does not define. The same thing happens for loss_genre and neg_adversarial_sampling.
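As described above, the mismatch is a shared model constructor that reads attributes one parser never defines. It can be closed in two ways: register the missing options in the kvserver parser, or fill in defaults before the model is built. The following minimal sketch shows the second approach with plain `argparse`. The stand-in parser, `MODEL_ARG_DEFAULTS`, and `fill_missing_args` are hypothetical. The default values are assumptions and should be checked against dgl-ke's training parser.

```python
import argparse

# Hypothetical defaults for attributes the model reads but the kvserver parser
# does not define. These values are assumptions; check dglke's training parser.
MODEL_ARG_DEFAULTS = {
    "has_edge_importance": False,
    "loss_genre": "Logsigmoid",
    "neg_adversarial_sampling": False,
}


def fill_missing_args(args: argparse.Namespace, defaults=MODEL_ARG_DEFAULTS) -> argparse.Namespace:
    """Add each default that `args` lacks; never overwrite a value that was parsed."""
    for name, value in defaults.items():
        if not hasattr(args, name):
            setattr(args, name, value)
    return args


# Stand-in for a server-side parser that only knows a few options.
server_parser = argparse.ArgumentParser()
server_parser.add_argument("--model_name", default="TransE_l2")
server_args = server_parser.parse_args([])

try:
    getattr(server_args, "has_edge_importance")
    missing_error = ""
except AttributeError as err:
    missing_error = str(err)  # same kind of error as reported above
print(missing_error)

fill_missing_args(server_args)
print(server_args.has_edge_importance, server_args.loss_genre)
```

Registering the options with `add_argument` in the server parser is the more explicit fix, because the defaults then live in one place. The fill-in approach is a stopgap when the parser cannot easily be changed.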

AdrianKs commented 3 years ago

Additionally, while the original OSError: [Errno 99] Cannot assign requested address seems to be fixed on the dgl-ke master branch, the same problem (in the same function) remains in DGL itself: https://github.com/dmlc/dgl/blob/master/python/dgl/contrib/dis_kvstore.py#L1258