awslabs / dgl-ke

High performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings.
https://dglke.dgl.ai/doc/
Apache License 2.0

Add documents for command line arguments. #83

Closed zheng-da closed 4 years ago

zheng-da commented 4 years ago

We need to explain the arguments of commands.

AlexMRuch commented 4 years ago

I'd really appreciate this. For example, on https://aws-dglke.readthedocs.io/en/latest/train_user_data.html it's not entirely clear what should go in --data_path and --data_files.

For example, --data_path says "to specify the path to the knowledge graph dataset"; however, I presume this means "to specify the path to the folder containing the knowledge graph dataset".
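If that reading is right, here's a quick sanity check of how I understand the two options to combine (this is my own sketch using the folder and file names from my command below, not anything from dgl-ke itself):

```python
import os

# My own sketch (not part of dgl-ke): under the reading above,
# --data_path is the folder and --data_files are bare file names
# inside it, so training would look for these paths:
data_path = "results_SXSW_2018"
data_files = ["entities.tsv", "relations.tsv", "train.tsv", "valid.tsv", "test.tsv"]
expected_paths = [os.path.join(data_path, f) for f in data_files]
for p in expected_paths:
    print(p)
```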

Also, --data_files says "to specify the triplets of a knowledge graph as well as node/relation ID mapping"; however, the expected order of these files isn't immediately clear. For example, I would presume it follows the order of the files listed under udd_[h|r|t]:

DGLBACKEND=pytorch dglke_train \
--data_path results_SXSW_2018 \
--data_files entities.tsv relations.tsv train.tsv valid.tsv test.tsv \
--format udd_hrt \
--model_name ComplEx \
--max_step 12000 --batch_size 1000 --neg_sample_size 200 --batch_size_eval 16 \
--hidden_dim 400 --gamma 19.9 --lr 0.25 --regularization_coef=1e-9 -adv \
--gpu 0 1 --async_update --force_sync_interval 1000 --log_interval 1000 \
--test

^^^ But the order isn't clear. It seems like entities.tsv and relations.tsv should go at the end, since if someone uses the raw_udd_[h|r|t] option that would keep the first three elements consistent across the training, validation, and testing files.

Perhaps there should be --data_tuple_files and --data_mapping_files options?
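To illustrate what I mean (the option names --data_mapping_files and --data_tuple_files are just my suggestion in this thread, not actual dglke_train flags), a split like this would make the grouping and ordering unambiguous:

```python
import argparse

# Hypothetical sketch of the proposed split; these flags do not
# exist in dglke_train -- they are the names suggested above.
parser = argparse.ArgumentParser()
parser.add_argument("--data_mapping_files", nargs=2,
                    metavar=("ENTITIES", "RELATIONS"),
                    help="entity and relation ID mapping files")
parser.add_argument("--data_tuple_files", nargs=3,
                    metavar=("TRAIN", "VALID", "TEST"),
                    help="triple files, always in train/valid/test order")

args = parser.parse_args([
    "--data_mapping_files", "entities.tsv", "relations.tsv",
    "--data_tuple_files", "train.tsv", "valid.tsv", "test.tsv",
])
print(args.data_mapping_files)
print(args.data_tuple_files)
```

With fixed nargs counts, argparse itself would reject a wrong number of files, and the train/valid/test triple order would stay the same whether or not mapping files are supplied (e.g. for raw_udd_[h|r|t]).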

UPDATE: When I ran the code above, it gave me this output with FB15k in the checkpoint path, which doesn't seem right...

(dglke) amruch@wit:~/graphika/kg$ DGLBACKEND=pytorch dglke_train --data_path results_SXSW_2018 --data_files entities.tsv relations.tsv train.tsv valid.tsv test.tsv --format udd_hrt --model_name ComplEx --max_step 12000 --batch_size 1000 --neg_sample_size 200 --batch_size_eval 16 --hidden_dim 400 --gamma 19.9 --lr 0.25 --regularization_coef=1e-9 -adv --gpu 0 1 --async_update --force_sync_interval 1000 --log_interval 1000 --test
Using backend: pytorch
Logs are being recorded at: ckpts/ComplEx_FB15k_0/train.log
Reading train triples....
zheng-da commented 4 years ago

Thank you very much for your feedback. We'll prioritize it and provide documentation of the argument options.

If you find any of the explanations from --help unclear, please post them here and we'll improve them. Thanks a lot for your help.

AlexMRuch commented 4 years ago

This is great! I didn't know that was an option. I tried man dglke_train and didn't see anything, but the output I'm seeing from --help looks great!

zheng-da commented 4 years ago

We need to clarify our documentation to address all of the questions in this issue: https://github.com/awslabs/dgl-ke/issues/84

classicsong commented 4 years ago

The docs for command line arguments were updated along with the 0.1.1 release.

AlexMRuch commented 4 years ago

Thanks for the heads up!

