facebookresearch / GENRE

Autoregressive Entity Retrieval

Fine-tune with Hugging Face Trainer #97

Open SangRyul opened 1 year ago

SangRyul commented 1 year ago


First, thank you for your great work on this task; I gained many insights from this project. I'm just wondering:

  1. Is the genre-kilt model on Hugging Face different from the model in this repository? If so, how do they differ?

  2. I have a custom document retrieval dataset in KILT format. How can I fine-tune the Hugging Face model on it? I'd like to port the training to the Hugging Face Trainer API. Could you give me a guide?

  3. I also tried fine-tuning with this repository's training script. My version is below:


```bash
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

# Paths and run settings
DATASET=/userhomes/sangryul/project/contrastive-retrieval/GENRE/data_fair
BASED_MODEL=/userhomes/sangryul/project/contrastive-retrieval/GENRE/models/fairseq_wikipage_retrieval
NAME=nq_100_finetune
STEP=10000

# Fine-tune from $BASED_MODEL/model.pt on the binarized data in $DATASET/bin/
fairseq-train "$DATASET/bin/" \
    --wandb-project multiperspective \
    --no-epoch-checkpoints \
    --keep-best-checkpoints 1 \
    --save-dir "/userhomes/sangryul/project/contrastive-retrieval/GENRE/models/$NAME" \
    --restore-file "$BASED_MODEL/model.pt" \
    --arch bart_large \
    --task translation \
    --criterion label_smoothed_cross_entropy \
    --source-lang source \
    --target-lang target \
    --truncate-source \
    --label-smoothing 0.1 \
    --max-tokens 1024 \
    --update-freq 1 \
    --max-update $STEP \
    --required-batch-size-multiple 1 \
    --dropout 0.1 \
    --attention-dropout 0.1 \
    --relu-dropout 0.0 \
    --weight-decay 0.01 \
    --optimizer adam \
    --adam-betas "(0.9, 0.999)" \
    --adam-eps 1e-08 \
    --clip-norm 0.1 \
    --lr-scheduler polynomial_decay \
    --lr 3e-05 \
    --total-num-update $STEP \
    --warmup-updates 500 \
    --num-workers 20 \
    --share-all-embeddings \
    --layernorm-embedding \
    --share-decoder-input-output-embed \
    --skip-invalid-size-inputs-valid-test \
    --log-format json \
    --log-interval 10 \
    --patience 200
```
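The script above reads binarized data from `$DATASET/bin/` with `--source-lang source --target-lang target`, which implies line-aligned raw files such as `train.source` / `train.target` before binarization. Below is a minimal sketch of turning KILT-style JSONL records into such pairs, assuming the standard KILT fields (`input`, `output[].provenance[].title`) and that the target is the Wikipedia page title, as GENRE generates titles. Emitting one pair per distinct provenance title is an assumption, and the helper names (`kilt_to_pairs`, `write_parallel_files`, `read_jsonl`) are hypothetical. BPE encoding and `fairseq-preprocess` are not shown; the same pairs could also back a dataset for a Hugging Face seq2seq trainer.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Tuple


def kilt_to_pairs(record: Dict) -> List[Tuple[str, str]]:
    """Turn one KILT-style record into (source, target) pairs.

    Assumption: the source is the query in record["input"], and each
    distinct title under output[*].provenance[*].title becomes one target,
    kept in first-seen order.
    """
    source = " ".join(record["input"].split())  # collapse newlines and extra spaces
    titles: List[str] = []
    for output in record.get("output", []):
        for prov in output.get("provenance", []) or []:
            title = prov.get("title")
            if title and title not in titles:
                titles.append(title)
    return [(source, title) for title in titles]


def write_parallel_files(records: Iterable[Dict], out_dir: str, split: str) -> int:
    """Write {split}.source / {split}.target line-aligned files; return the pair count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(out / f"{split}.source", "w", encoding="utf-8") as src, \
         open(out / f"{split}.target", "w", encoding="utf-8") as tgt:
        for record in records:
            for source, target in kilt_to_pairs(record):
                src.write(source + "\n")
                tgt.write(target.replace("\n", " ") + "\n")
                count += 1
    return count


def read_jsonl(path: str) -> Iterator[Dict]:
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

For example, `write_parallel_files(read_jsonl("nq-train-kilt.jsonl"), "data_fair", "train")` (hypothetical paths) would produce `data_fair/train.source` and `data_fair/train.target`, which would still need BPE encoding and `fairseq-preprocess` before `fairseq-train` can read them.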

Training with the script above on the KILT Natural Questions train and dev sets, I found that the training loss keeps decreasing while the validation loss increases. Is this because of overfitting?
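Training loss falling while validation loss rises is the classic signature of overfitting. In fairseq, `--patience N` stops training only after N consecutive validation runs without improvement, and the best checkpoint is chosen by validation `loss` unless `--best-checkpoint-metric` says otherwise; with `--patience 200` in a 10,000-update run, early stopping is unlikely to ever trigger. The sketch below shows patience-based early stopping over a sequence of validation losses. It illustrates the general idea, not fairseq's implementation, and the names `early_stopping`, `EarlyStopResult`, and `min_delta` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EarlyStopResult:
    best_index: int            # validation run with the lowest loss
    best_loss: float
    stop_index: Optional[int]  # run at which training would stop, or None


def early_stopping(valid_losses: Sequence[float], patience: int,
                   min_delta: float = 0.0) -> EarlyStopResult:
    """Patience-based early stopping on validation losses.

    A run counts as an improvement only if it beats the best loss so far
    by more than min_delta; training stops once `patience` consecutive
    runs fail to improve.
    """
    if not valid_losses:
        raise ValueError("need at least one validation loss")
    best_index, best_loss = 0, valid_losses[0]
    runs_without_improvement = 0
    for i in range(1, len(valid_losses)):
        if valid_losses[i] < best_loss - min_delta:
            best_index, best_loss = i, valid_losses[i]
            runs_without_improvement = 0
        else:
            runs_without_improvement += 1
            if runs_without_improvement >= patience:
                return EarlyStopResult(best_index, best_loss, i)
    return EarlyStopResult(best_index, best_loss, None)
```

On a curve like `[2.0, 1.5, 1.4, 1.45, 1.5, 1.6]`, a patience of 2 stops at run 4 and keeps run 2 as the best, while a patience of 200 never stops; either way, the checkpoint worth evaluating is the one with the lowest validation loss.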

Thank you again for your effort on this project!