facebookresearch / GENRE

Autoregressive Entity Retrieval

Fine-tune with Hugging Face Trainer #97

Open SangRyul opened 1 year ago

SangRyul commented 1 year ago

Hello.

First, thank you for your great work on this task. I have gained many insights from this project. I'm just wondering:

  1. Does the genre-kilt model on Hugging Face differ from the model in this repository? If so, how are they different?

  2. I have a custom document retrieval dataset in KILT style. How can I fine-tune the Hugging Face model on it? I would like to port the training to the Hugging Face Trainer API. Can you give me a guide? (A rough sketch of what I have in mind is after the script below.)

  3. I also tried fine-tuning with the fairseq training script; my version is below:

    
    #!/bin/bash
    # Copyright (c) Facebook, Inc. and its affiliates.
    # All rights reserved.
    #
    # This source code is licensed under the license found in the
    # LICENSE file in the root directory of this source tree.

    DATASET=$1
    NAME=$2

    DATASET=/userhomes/sangryul/project/contrastive-retrieval/GENRE/data_fair
    BASED_MODEL=/userhomes/sangryul/project/contrastive-retrieval/GENRE/models/fairseq_wikipage_retrieval
    NAME=nq_100_finetune
    STEP=10000

    fairseq-train $DATASET/bin/ \
        --wandb-project multiperspective \
        --no-epoch-checkpoints \
        --keep-best-checkpoints 1 \
        --save-dir /userhomes/sangryul/project/contrastive-retrieval/GENRE/models/$NAME \
        --restore-file $BASED_MODEL/model.pt \
        --arch bart_large \
        --task translation \
        --criterion label_smoothed_cross_entropy \
        --source-lang source \
        --target-lang target \
        --truncate-source \
        --label-smoothing 0.1 \
        --max-tokens 1024 \
        --update-freq 1 \
        --max-update $STEP \
        --required-batch-size-multiple 1 \
        --dropout 0.1 \
        --attention-dropout 0.1 \
        --relu-dropout 0.0 \
        --weight-decay 0.01 \
        --optimizer adam \
        --adam-betas "(0.9, 0.999)" \
        --adam-eps 1e-08 \
        --clip-norm 0.1 \
        --lr-scheduler polynomial_decay \
        --lr 3e-05 \
        --total-num-update $STEP \
        --warmup-updates 500 \
        --num-workers 20 \
        --share-all-embeddings \
        --layernorm-embedding \
        --share-decoder-input-output-embed \
        --skip-invalid-size-inputs-valid-test \
        --log-format json \
        --log-interval 10 \
        --patience 200


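To make question 2 more concrete, here is a rough, untested sketch of the kind of Hugging Face Trainer loop I have in mind. The jsonl file names, the source/target field names, the batch size, and the output directory are placeholders I made up; the other hyperparameters are copied from my fairseq script above. I am not sure this is the right way to fine-tune the facebook/genre-kilt checkpoint, so please correct me if it is not.

    # Rough sketch (not tested): fine-tune the HF genre-kilt checkpoint with
    # Seq2SeqTrainer on (query -> Wikipedia page title) pairs.
    # "nq-train-pairs.jsonl" / "nq-dev-pairs.jsonl" are hypothetical files I
    # would build from the KILT data, one {"source": ..., "target": ...}
    # object per line.
    from datasets import load_dataset
    from transformers import (
        AutoTokenizer,
        AutoModelForSeq2SeqLM,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    model_name = "facebook/genre-kilt"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    raw = load_dataset(
        "json",
        data_files={"train": "nq-train-pairs.jsonl", "dev": "nq-dev-pairs.jsonl"},
    )

    def preprocess(example):
        # "source" is the input query/passage, "target" is the gold page title
        model_inputs = tokenizer(example["source"], truncation=True, max_length=1024)
        labels = tokenizer(example["target"], truncation=True, max_length=64)
        model_inputs["labels"] = labels["input_ids"]
        return model_inputs

    tokenized = raw.map(preprocess, remove_columns=raw["train"].column_names)

    # Hyperparameters roughly mirroring the fairseq run above
    args = Seq2SeqTrainingArguments(
        output_dir="models/nq_100_finetune_hf",
        learning_rate=3e-5,
        warmup_steps=500,
        max_steps=10000,
        label_smoothing_factor=0.1,
        weight_decay=0.01,
        per_device_train_batch_size=8,
        evaluation_strategy="steps",
        eval_steps=500,
        save_steps=500,
        predict_with_generate=True,
    )

    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["dev"],
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
        tokenizer=tokenizer,
    )
    trainer.train()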

However, I found that the training loss keeps decreasing while the evaluation loss keeps increasing. I used the Natural Questions KILT train and dev datasets. Is this because of overfitting?

Thank you again for your effort on this project.

Thank you very much