awslabs / dgl-ke

High performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings.
https://dglke.dgl.ai/doc/
Apache License 2.0
1.28k stars 196 forks

GPU 0 uses a lot of memory #167

Open walton-wang929 opened 4 years ago

walton-wang929 commented 4 years ago

When I use multi-GPU training, I find that GPU 0 uses a lot more memory than the other GPUs. Why? Because of this, I cannot use a bigger batch size or a larger hidden size without hitting a GPU OOM error.

```
DGLBACKEND=pytorch dglke_train --model_name TransE_l2 \
  --data_path ./data/360KG_V3/ --format udd_hrt --dataset 360KG \
  --data_files entity2id.txt relation2id.txt train.txt valid.txt test.txt \
  --save_path ./run-exp/360KG --max_step 32000 --batch_size 1000 \
  --batch_size_eval 16 --neg_sample_size 200 --neg_sample_size_eval 10000 \
  --log_interval 100 --hidden_dim 256 --gamma 19.9 --lr 0.05 \
  --regularization_coef 1e-9 --test -adv --mix_cpu_gpu \
  --num_proc 20 --num_thread 10 --rel_part --force_sync_interval 1000 \
  --gpu 0 1 2 3 --no_eval_filter --async_update
```
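As a first diagnostic (this is a general CUDA troubleshooting step, not a dgl-ke-specific fix): `nvidia-smi` can list which processes hold memory on each GPU. A common cause of this pattern in multi-process training is that worker processes create a CUDA context on device 0 by default in addition to their assigned device; the query below helps confirm whether the extra memory on GPU 0 belongs to workers that are supposed to run on GPUs 1-3.

```shell
# List every process holding GPU memory, with its PID and usage.
# If GPU 0 shows one entry per training worker while GPUs 1-3 show
# only one each, the extra memory on GPU 0 is likely stray contexts
# created on the default device rather than the embedding buffers.
nvidia-smi --query-compute-apps=gpu_uuid,pid,used_memory --format=csv
```

If that is the cause, a common workaround is to launch each worker with `CUDA_VISIBLE_DEVICES` restricted to its own GPU so no process can touch device 0 unintentionally; whether that is practical here depends on how `dglke_train` spawns its `--num_proc` workers.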

*(attached screenshot of GPU memory usage not reproduced here)*