awslabs / dgl-ke

High performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings.
https://dglke.dgl.ai/doc/
Apache License 2.0

dist_train looks for data_files but it is None #187

Closed: shriphani closed this issue 3 years ago

shriphani commented 3 years ago

Here is the command I am using:

$ dglke_dist_train --path ~/wikidata --ip_config ~/wikidata/ip_config.txt --num_client_proc 3 --model_name TransE_l2 --dataset Wikidata --data_path ~/wikidata --data_files train.txt valid.txt test.txt --format raw_udd_{hrt} --hidden_dim 400 --gamma 19.9 --lr 0.25 --batch_size 1000 --neg_sample_size 200 --max_step 100 --log_interval 100 --batch_size_eval 16 --test -adv --regularization_coef 1.00E-09 --num_thread 1 --force_sync_interval

I am clearly passing a --data_files argument, yet this is the error I get:

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dglke/models/pytorch/tensor_models.py", line 77, in decorated_function
    raise exception.__class__(trace)
TypeError: Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/dglke/models/pytorch/tensor_models.py", line 65, in _queue_result
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dglke/train_pytorch.py", line 259, in dist_train_test
    dataset_full = dataset = get_dataset(args.data_path, args.dataset, args.format, args.data_files)
  File "/usr/local/lib/python3.8/dist-packages/dglke/dataloader/KGDataset.py", line 518, in get_dataset
    dataset = KGDatasetUDDRaw(data_path, data_name, files, format)
  File "/usr/local/lib/python3.8/dist-packages/dglke/dataloader/KGDataset.py", line 374, in __init__
    for f in files:
TypeError: 'NoneType' object is not iterable

I am using version 0.1.0.

Everything works fine when I use one of the built-in datasets.
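
For reference, a minimal sketch that reproduces the same TypeError when the --data_files value never reaches the dataset loader. The flag definition below is hypothetical (it is not copied from dgl-ke's parser); it only illustrates that a default of None plus a dropped value ends in exactly this 'NoneType' object is not iterable error:

import argparse

# Hypothetical flag definition, only to illustrate the failure mode;
# dgl-ke's real argument parser may declare --data_files differently.
parser = argparse.ArgumentParser()
parser.add_argument('--data_files', type=str, nargs='+', default=None)

# Simulate the value being dropped before the dataset is built.
args = parser.parse_args([])   # --data_files never makes it here
files = args.data_files        # -> None

for f in files:                # TypeError: 'NoneType' object is not iterable
    print(f)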

shriphani commented 3 years ago

The argument order seems to be wrong as well:

  File "/usr/local/lib/python3.8/dist-packages/dglke/train_pytorch.py", line 259, in dist_train_test
    dataset_full = dataset = get_dataset(args.data_path, args.dataset, args.format, args.data_files)
  File "/usr/local/lib/python3.8/dist-packages/dglke/dataloader/KGDataset.py", line 518, in get_dataset
    dataset = KGDatasetUDDRaw(data_path, data_name, files, format)
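
To illustrate why a positional mismatch like this matters, here is a self-contained sketch. The signature below is hypothetical, not dgl-ke's real get_dataset, but it shows how passing arguments positionally in the wrong order silently misassigns values, and how keyword arguments avoid that:

# Hypothetical signature, for illustration only (not dgl-ke's actual API):
# suppose the callee expects `files` in the third slot ...
def get_dataset(data_path, data_name, files=None, format_str='built_in'):
    print('files =', files, '| format_str =', format_str)

# ... while the caller passes the format before the file list, positionally:
get_dataset('~/wikidata', 'Wikidata', 'raw_udd_hrt',
            ['train.txt', 'valid.txt', 'test.txt'])
# -> files = 'raw_udd_hrt', format_str = ['train.txt', ...]  (both wrong)

# Passing by keyword makes this kind of mismatch fail loudly or not at all:
get_dataset('~/wikidata', 'Wikidata',
            files=['train.txt', 'valid.txt', 'test.txt'],
            format_str='raw_udd_hrt')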
classicsong commented 3 years ago

For distributed training, we use METIS partitioning to split the input data. It cannot work with UDDRaw.
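
For context: raw_udd input is a set of string triples, while graph partitioning (METIS) operates on an integer-ID graph. Below is a minimal sketch of the kind of ID-mapping preprocessing raw_udd data would need before it could be partitioned, assuming tab-separated head/rel/tail lines per raw_udd_{hrt}. It is illustrative only and is not dgl-ke's actual pipeline:

# Illustrative only: map raw_udd string triples to integer IDs, which is
# what a partitioner such as METIS works on. Not dgl-ke's preprocessing code.
entity2id, relation2id, triples = {}, {}, []

with open('train.txt') as f:
    for line in f:
        h, r, t = line.strip().split('\t')
        hid = entity2id.setdefault(h, len(entity2id))
        rid = relation2id.setdefault(r, len(relation2id))
        tid = entity2id.setdefault(t, len(entity2id))
        triples.append((hid, rid, tid))

# `triples` is now an integer edge list that could be split across machines;
# dgl-ke's distributed training expects its own partitioned, preprocessed
# format rather than raw_udd files.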