yes, you can do exactly as you've described. it remains to be seen how effective this would be, depending on the model used
Thanks for the response!
Which model would be suitable for this? I am currently using BB2_400M.
any conversational model would be a reasonable choice, bb2_400m is good
Thanks!
When I train the BB2 model on my custom dataset, it learns the custom dialogues perfectly. However, training on the custom dataset affects the BB2 model's general responses.
For example: given an open-domain dialogue that is not present in my dataset, the stock BB2 model can generate a well-generalized response and also makes use of the prepended persona information. The BB2 model fine-tuned on the custom dataset, however, does not generalize well and can no longer make use of the prepended persona information.
Here is an example of the dataset format I use for training the BB2 model:
{"dialog": [[{"id": "partner1", "text": "your persona: I am John\nyour persona: I live in Ohio.\ntell me a joke."}, {"id": "partner2", "text": "One time, I put strawberry jam on my burger. I thought it was ketchup!"}]]}
It seems like the model overfits too heavily on the current task after fine-tuning and forgets the default BB2 blended_skill_talk capabilities.
Following is the command I use for fine-tuning the model. Do I need to update some parameters so that the model can learn from the custom dataset while preserving the capabilities of the pre-trained BB2 model (perhaps by skipping the generation or retrieval part, e.g. --mutators skip_retrieval; see the sketch after the command)?
parlai train_model -dp data \
--model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent \
--task jsonfile --jsonfile-datapath dialogue_dialog_json_task.json \
--num-epochs 16 --batchsize 1 \
--memory-decoder-model-file "" --memory-key full_text \
--search-query-generator-model-file zoo:blenderbot2/query_generator/model --search-query-generator-beam-min-length 2 \
--save-every-n-secs 600 --validation-every-n-secs 600 --log-every-n-secs 60 \
--init-model zoo:blenderbot2/blenderbot2_400M/model --dict-file zoo:blenderbot2/blenderbot2_400M/model.dict \
--datatype train:stream \
--embeddings-scale True --variant prelayernorm --split-lines True --learn-positional-embeddings True \
--n-layers 12 --embedding-size 1024 --ffn-size 4096 --n-heads 16 --n-decoder-layers 12 \
--dict-tokenizer gpt2 --generation-model bart \
--query-model bert_from_parlai_rag \
--rag-model-type token --rag-retriever-type dpr \
--dpr-model-file zoo:hallucination/bart_rag_token/model \
--gold-document-titles-key select-docs-titles --insert-gold-docs True \
--beam-min-length 5 --beam-context-block-ngram 3 --beam-block-ngram 3 --beam-block-full-context False --beam-size 3 \
--inference beam --optimizer mem_eff_adam --learningrate 1e-05 --lr-scheduler-patience 1 --model-parallel True \
--knowledge-access-method memory_only \
--truncate 512 --text-truncate 512 --label-truncate 128 \
--dropout 0.0 --attention-dropout 0.0 \
--min-doc-token-length 64 --max-doc-token-length 256 \
--fp16 True --fp16-impl mem_efficient --force-fp16-tokens True \
--model-file model/odkg_model
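(For reference, a sketch of the retrieval-skipping variant mentioned above, assuming the skip_retrieval mutator from the BlenderBot2 project also applies to the jsonfile task, would change the task line to:)
--task jsonfile --jsonfile-datapath dialogue_dialog_json_task.json --mutators skip_retrieval \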
Thanks!
fine-tuning on new data while preserving knowledge of older capabilities is an open problem in language modeling. One approach would be to mix some of the original training data into fine-tuning to mitigate this
Would it be helpful to add the blended_skill_talk task alongside the new task (--task jsonfile --jsonfile-datapath dialogue_dialog_json_task.json) when training the model, to preserve that knowledge?
possibly!
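For reference, a minimal sketch of the multitask setup (this assumes ParlAI's comma-separated --task syntax and the --multitask-weights flag; the weights are illustrative, not tuned) would replace the task line in the command above with:
--task jsonfile,blended_skill_talk --jsonfile-datapath dialogue_dialog_json_task.json \
--multitask-weights 1,1 \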
Hi @klshuster ,
Thanks for always helping with your feedback!
I have another question, related to the search server (--search_server 0.0.0.0:8080): is it possible to search from documents instead of a search engine?
Yes, we offer several ways of doing this via the --rag-retriever-type flag, such as using neural retrieval over document embeddings, or TFIDF over document sets. Please see the relevant README for more details
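As a rough sketch of those two options (flag names taken from the ParlAI RAG setup; the paths are placeholders, so verify the exact arguments against the README):
# TFIDF retrieval over a fixed set of documents
--rag-retriever-type tfidf
# dense (DPR + FAISS) retrieval over pre-built document embeddings
--rag-retriever-type dpr \
--path-to-index <your_faiss_index> \
--path-to-dpr-passages <your_passages.tsv>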
Hi @klshuster
Are there any length constraints on the input when adding personas and context as shown below?
input = "your persona:{persona}\n..\n{user_input} {bot_output}\n...\n{actual input text}"
Only to the extent the bot can fit the tokenized input into its context; that is determined by the --truncate / --text-truncate flags
if you're curious about bb2, that is 128 tokens (for the 3B model) and 1024 tokens (for the 400m model)
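For example, with the 400M model one could raise the truncation flags up to that limit (a sketch; 1024 is just the position-embedding bound noted above):
--truncate 1024 --text-truncate 1024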
How come the 3B model gets 128 tokens while the 400M gets 1024? I would have expected the opposite.
The real factor is the number of position embeddings the pre-trained models used; the 3B model just so happened to use 128 positions while the 400M used 1024 (the 3B model is based on BlenderBot 1, the 400M on BART)
Hi @klshuster ,
I understand how the token length is defined for the mentioned models.
I am curious: a few days back you told me that there are no length restrictions in terms of defining personas (mentioned here).
Ahh, when I said there were no length restrictions I meant that ParlAI itself could handle any length; the models are bound by their truncation length
Thank you so much for the clarification.
Hi,
Can we prepend a dialogue script between the user and the bot to the text as seed conversation context? How effective would this be at generating relevant responses?
Input pattern:
input = "your persona:{persona}\n..\n{user_input} {bot_output}\n...\n{actual input text}"
Thanks!