OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Apache License 2.0

Question: how to use top-p sampling? #358

Open zwkkk opened 1 year ago

zwkkk commented 1 year ago

When running the gigaword task, I get the error below:

UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  unfin_idx = bbsz_idx // beam_size
../aten/src/ATen/native/cuda/MultinomialKernel.cu:214: sampleMultinomialOnce: block: [4,0,0], thread: [0,0,0] Assertion sum > accZero failed.

My code:

python3 -m torch.distributed.launch --nproc_per_node=${GPUS_PER_NODE} --master_port=${MASTER_PORT} ../../evaluate.py \
    ${data} \
    --path=${path} \
    --user-dir=${user_dir} \
    --bpe=bert \
    --task=gigaword \
    --batch-size=16 \
    --log-format=simple --log-interval=10 \
    --seed=7 \
    --gen-subset=${split} \
    --results-path=${result_path} \
    --sampling \
    --sampling-topk 10 \
    --sampling-topp 0.7 \
    --beam=6 \
    --lenpen=0.7 \
    --max-len-b=32 \
    --no-repeat-ngram-size=3 \
    --fp16 \
    --num-workers=0 \
    --model-overrides="{\"data\":\"${data}\",\"bpe_dir\":\"${bpe_dir}\",\"selected_cols\":\"${selected_cols}\"}"
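A note on the log: the UserWarning about floordiv is separate from the failed assertion. The plain-Python sketch below illustrates the two rounding modes the warning names. The helper names `trunc_div` and `floor_div` are hypothetical stand-ins for `torch.div(a, b, rounding_mode='trunc')` and `torch.div(a, b, rounding_mode='floor')`. Assuming `bbsz_idx` holds non-negative indices (not confirmed in this thread), both modes give the same result, so the warning alone would not explain the crash.

```python
import math


def trunc_div(a: int, b: int) -> int:
    """Round toward zero, like rounding_mode='trunc' in the warning."""
    return math.trunc(a / b)


def floor_div(a: int, b: int) -> int:
    """Round toward negative infinity, like rounding_mode='floor'."""
    return a // b


beam_size = 6  # value of --beam in the command above

# For non-negative indices the two modes agree.
assert all(trunc_div(i, beam_size) == floor_div(i, beam_size) for i in range(96))

# They differ only for negative values, which is what the warning is about.
print(trunc_div(-7, beam_size), floor_div(-7, beam_size))  # prints "-1 -2"
```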

JustinLin610 commented 1 year ago

Why are you considering top-p sampling? We do not have relevant experience with it in this repo. It is probably still better to use beam search, following our practice, to get good results.
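For readers who still want to experiment with sampling, here is a minimal pure-Python sketch of top-k followed by top-p (nucleus) filtering. It is not the code path used by this repo or by fairseq. The function names are hypothetical, and applying top-k before top-p is an assumption that may not match how the toolkit combines `--sampling-topk` and `--sampling-topp`. The explicit checks mirror what the failed assertion in the log (`sum > accZero` in `sampleMultinomialOnce`) appears to require: a distribution with a positive sum. One possible cause, unconfirmed here, is NaN or inf probabilities under `--fp16`.

```python
import math
import random


def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Return {token_index: renormalized_prob} after top-k, then top-p filtering."""
    # Reject inputs that a multinomial sampler cannot handle.
    if any(math.isnan(p) or math.isinf(p) or p < 0 for p in probs):
        raise ValueError("probabilities contain NaN, inf, or negative values")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")

    # Rank tokens from most to least likely.
    ranked = sorted(enumerate(probs), key=lambda item: item[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    # Keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p / total
        if cumulative >= top_p:
            break

    kept_total = sum(p for _, p in kept)
    return {index: p / kept_total for index, p in kept}


def sample(probs, top_k=0, top_p=1.0, rng=random):
    """Draw one token index from the filtered distribution."""
    filtered = filter_top_k_top_p(probs, top_k, top_p)
    indices = list(filtered)
    weights = [filtered[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical next-token distribution over four tokens.
    print(filter_top_k_top_p([0.5, 0.3, 0.1, 0.1], top_k=10, top_p=0.7))
```

With `top_p=0.7`, only tokens 0 and 1 survive in this toy distribution, and their probabilities are renormalized before sampling. Running a check like this on the model's real probabilities (for example, with `--fp16` removed) is one way to see whether invalid values are what trips the assertion.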