Closed zhaoguangxiang closed 3 years ago
Hi, I have updated the baselines to include the scripts for the doc Transformer. Please check them; I hope they help.
python train.py $bin_path --save-dir $cp_path --tensorboard-logdir $cp_path --seed 555 --fp16 --num-workers 4 \
    --task translation_doc --source-lang $slang --target-lang $tlang --langs $doc_langs \
    --arch transformer_doc_base --doc-mode full --share-all-embeddings \
    --optimizer adam --adam-betas "(0.9, 0.98)" --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --no-epoch-checkpoints \
    --max-tokens 4096 --update-freq 1 --validate-interval 1 --patience 10 \
    > $run_path/train.$data.$slang-$tlang.log 2>&1
Is it also a doc-to-doc Transformer? What is the difference between this script and the script in /baselines?
Basically, the sent Transformer used for fine-tuning G-Transformer is the same as the one in the baselines. However, for fine-tuning we need to keep the Dictionary consistent with G-Transformer, so we train the sent Transformer using G-Transformer's Dictionary, which includes the special tokens `<s>` and `</s>`.
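To make the consistency requirement concrete, here is a minimal sketch of a check that two dictionary files agree and that the sentence-level one contains the special tokens. It assumes a plain-text dictionary with one whitespace-separated `symbol count` entry per line, and that `<s>` and `</s>` appear as ordinary entries in that file; neither assumption is confirmed in this thread. The function names and file names are hypothetical and are not part of the G-Transformer code.

```python
from pathlib import Path
import tempfile


def load_symbols(dict_path):
    """Read symbols from a dictionary file (assumed format: '<symbol> <count>' per line)."""
    symbols = []
    for line in Path(dict_path).read_text(encoding="utf-8").splitlines():
        fields = line.split()
        if fields:
            # The symbol is assumed to be the first whitespace-separated field.
            symbols.append(fields[0])
    return symbols


def check_dictionary(sent_dict, doc_dict, required=("<s>", "</s>")):
    """Return (same, missing).

    same    -- True if both files list identical symbols in identical order,
               so every token maps to the same index in both models.
    missing -- required special tokens absent from the sentence-level file.
    """
    sent_symbols = load_symbols(sent_dict)
    doc_symbols = load_symbols(doc_dict)
    missing = [tok for tok in required if tok not in sent_symbols]
    return sent_symbols == doc_symbols, missing


# Toy demonstration with made-up dictionary contents.
with tempfile.TemporaryDirectory() as tmp:
    doc_path = Path(tmp) / "doc.dict.txt"
    good_path = Path(tmp) / "sent_good.dict.txt"
    bad_path = Path(tmp) / "sent_bad.dict.txt"
    doc_path.write_text("<s> 100\n</s> 100\nhello 7\nworld 5\n", encoding="utf-8")
    good_path.write_text("<s> 100\n</s> 100\nhello 7\nworld 5\n", encoding="utf-8")
    bad_path.write_text("hello 7\nworld 5\n", encoding="utf-8")

    good_result = check_dictionary(good_path, doc_path)
    bad_result = check_dictionary(bad_path, doc_path)
    print("shared dictionary:", good_result)
    print("separately built dictionary:", bad_result)
```

A sentence-level dictionary built on its own would fail this check, because its token indices would not line up with the embedding rows that the fine-tuned model expects.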
Does "--doc-mode full" mean that the model treats the full document as a single sequence? What exactly is the doc Transformer, and how does it differ from the sent Transformer above? I thought the sent Transformer translates a document one sentence at a time.
I appreciate your effort to provide scripts for the sent Transformer baseline (https://github.com/baoguangsheng/g-transformer/issues/2#issue-928802403). The doc Transformer is important too.