RUCDM / KB4Rec

This is the data for KB4Rec

out of memory #6

Closed A787 closed 3 years ago

A787 commented 3 years ago

When I run main_pretain for UPGAN on lastfm, loss.backward() fails with an out of memory error. Does it require a particular hardware configuration?

RichardHGL commented 3 years ago

Several hyper-parameters affect GPU memory use and can trigger the out of memory error: gat_split, rs_sample, kg_sample, embedding_size, and batch_size. To reduce GPU memory consumption, try increasing gat_split, or decreasing rs_sample and kg_sample. I ran our model on a server with an 18GB GPU. If that does not solve the problem, I suggest using a smaller embedding_size and batch_size. Good luck! Below is the command I used to run UPGAN on the music dataset; pretraining shares most of these hyper-parameters.

CUDA_VISIBLE_DEVICES=0 python main_upgan.py \
  --gat_split 10 --model_name UGAT_mlp --G_name generator_concat \
  --data_folder /gaolehe/data/kb_final/ --dataset music \
  --batch_size 4096 --embedding_size 100 --n_epochs 200 \
  --lr 1e-4 --lr_g 1e-4 --decay_rate 0.0 \
  --checkpoint_dir /checkpoint/upgan \
  --n_sample 1024 --n_sample_gen 200 --l2_lambda 1e-5 \
  --rs_sample 50 --rs_sample_flag --kg_sample 10 --kg_sample_flag \
  --eval_every 5 --experiment_name music-upgan-200sample --query_weight \
  --load_ckpt_file good_pretrain/music-ugat-mlp-200sample-norm-emb.ckpt \
  --load_ckpt_G init/music-DistMult-bce-decay.ckpt \
  --norm_emb --reward_type baseline-softmax --lambda_smooth 0.01 --sigma 1.0
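
For readers who want to try the advice step by step, here is a minimal sketch of the tuning order described in this reply: first raise gat_split and lower rs_sample and kg_sample, and only then shrink embedding_size and batch_size. The starting values are copied from the command above. The helper names (`lower_memory_settings`, `to_flags`) and the factor of 2 are assumptions made for illustration; they are not part of the KB4Rec code, and the right values depend on your GPU.

```python
def lower_memory_settings(settings, stage):
    """Return a copy of `settings` adjusted in the order suggested in the thread.

    stage 1: raise gat_split, lower rs_sample and kg_sample.
    stage 2: additionally lower embedding_size and batch_size.
    The factor of 2 is an arbitrary illustrative choice, not taken from the repo.
    """
    adjusted = dict(settings)
    if stage >= 1:
        adjusted["gat_split"] = settings["gat_split"] * 2
        adjusted["rs_sample"] = max(1, settings["rs_sample"] // 2)
        adjusted["kg_sample"] = max(1, settings["kg_sample"] // 2)
    if stage >= 2:
        adjusted["embedding_size"] = max(1, settings["embedding_size"] // 2)
        adjusted["batch_size"] = max(1, settings["batch_size"] // 2)
    return adjusted


def to_flags(settings):
    """Render a settings dict as command-line flags, e.g. '--gat_split 20'."""
    return " ".join(f"--{name} {value}" for name, value in settings.items())


# Memory-related values from the command posted above.
posted = {
    "gat_split": 10,
    "rs_sample": 50,
    "kg_sample": 10,
    "embedding_size": 100,
    "batch_size": 4096,
}

first_try = lower_memory_settings(posted, stage=1)
second_try = lower_memory_settings(posted, stage=2)

print(to_flags(first_try))
print(to_flags(second_try))
```

The printed flags can be substituted for the corresponding ones in the command; all other flags stay as posted.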

A787 commented 3 years ago

thanks