Closed: fabrahman closed this issue 2 years ago.
Hi there,
The experiments in the linked paper can be replicated somewhat with the BlenderBot 2 model architecture. Specifically, if we take the command for training FiD RAG and do the following, you should be able to use gold retrieved passages:

- Substitute `--model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent` for `--model fid`
- Set `--insert-gold-docs True`
- Set `--knowledge-access-method search_only`
- Set `--splitted-chunk-length` to be something larger than the size of the gold passages (in terms of # words)
- Set `--retriever-debug-index compressed`; this will simply speed up the initial loading (and reduce RAM usage)
- Emit `Message`s with 3 fields containing the gold documents, gold sentences, and gold document titles, and point the model at those keys with `--gold-document-key`, `--gold-sentence-key`, and `--gold-document-titles-key`
- Set `--n-docs` to be the number of gold documents you're providing for each example.

So, if you emit `Message`s from your dataset with the following setup:

```
{'text': <text_input_to_model>, 'labels': [<desired_model_output>], 'gold_docs': [<list_of_gold_documents>], 'gold_sentences': [<list_of_gold_sentences>], 'gold_doc_titles': [<list_of_gold_document_titles>]}
```
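For concreteness, a single example in this format might look like the following (the content here is purely illustrative):

```python
{
    'text': 'Who wrote The Hobbit?',
    'labels': ['J. R. R. Tolkien wrote The Hobbit; it was published in 1937.'],
    'gold_docs': ['The Hobbit is a fantasy novel by J. R. R. Tolkien, published in 1937.'],
    'gold_sentences': ['The Hobbit is a fantasy novel by J. R. R. Tolkien.'],
    'gold_doc_titles': ['The Hobbit'],
}
```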
You would add the following flags to the FiD RAG command:

```
--gold-document-key gold_docs --gold-sentence-key gold_sentences --gold-document-titles-key gold_doc_titles
```
Your final command would look like this (the last six lines are the new args):

```
parlai train_model \
--rag-retriever-type dpr --query-model bert_from_parlai_rag \
--dpr-model-file zoo:hallucination/bart_rag_token/model \
--generation-model bart --init-opt arch/bart_large \
--batchsize 16 --fp16 True --gradient-clip 0.1 --label-truncate 128 \
--log-every-n-secs 30 --lr-scheduler reduceonplateau --lr-scheduler-patience 1 \
--model-parallel True --optimizer adam --text-truncate 512 --truncate 512 \
--learningrate 1e-05 --validation-metric-mode min --validation-every-n-epochs 0.25 \
--validation-max-exs 1000 --validation-metric ppl --validation-patience 5 \
--gold-document-key gold_docs --gold-sentence-key gold_sentences --gold-document-titles-key gold_doc_titles \
--model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent \
--task my_custom_task \
--insert-gold-docs true \
--splitted-chunk-length 1000 \
--retriever-debug-index compressed --knowledge-access-method search_only --n-docs 3
```
@klshuster thank you so much for your prompt response! :) In this regard, I have the following questions (appreciate it):

1. Can I leave `gold_sentences` and `gold_doc_titles` in my data as empty lists? (since they're not applicable to my use case)
2. `--retriever-debug-index compressed` does not mean that it's doing any retrieval from a compressed dense index, right?

Again, thanks!

Also, should I create a new task, or am I fine with using `--fromfile_datapath` and adding new datasets in the following format:

```
{'text': <text_input_to_model>, 'labels': [<desired_model_output>], 'gold_docs': [<list_of_gold_documents>], 'gold_sentences': [<list_of_gold_sentences>], 'gold_doc_titles': [<list_of_gold_document_titles>]}
```
(`--generation-model bart` in the command above.) If you set `--n-docs` appropriately, it'll "retrieve" but then swap out all the retrieved passages for the gold passages; it's a bit hacky at the moment, as it allows one to intersperse retrieved docs and gold docs.

`--fromfile` works totally fine if your data is set up appropriately.
@klshuster Sorry, I ended up creating my own task, since I was not sure how I should format my data (which has a list of gold documents rather than a single string) in the ParlAI Dialog Format to be able to use `--fromfile`.

Below is my script for `agents.py`. Since I want to load my data locally from disk, I skipped the `build` step and also commented it out in `__init__`. I defined additional fields other than `text` and `labels` in `setup_data`. But how do I make sure they will be used by the training code? Is adding the following flags to our command enough?

```
--gold-document-key gold_docs --gold-sentence-key gold_sentences --gold-document-titles-key gold_doc_titles
```

agents.py:
```python
import json
import os

from parlai.core.teachers import DialogTeacher
from parlai.utils.io import PathManager


class SquadTeacher(DialogTeacher):
    def __init__(self, opt, shared=None):
        self.datatype = opt['datatype']
        # build(opt)  # NOTE: commented out, as I want to load from local disk
        suffix = 'train' if opt['datatype'].startswith('train') else 'dev'
        # whatever is placed into datafile will be passed as the argument to
        # setup_data in the next section.
        opt['datafile'] = os.path.join(opt['datapath'], 'custom_data', suffix + '.jsonl')
        self.id = 'custom_task'
        super().__init__(opt, shared)

    def setup_data(self, path):
        # note that path is the value provided by opt['datafile']
        print('loading: ' + path)
        self.data = [json.loads(item) for item in PathManager.open(path)]
        for example in self.data:
            content = example["text"]
            target = example["labels"]
            grounds = example["gold_docs"]
            gold_sent = example["gold_sentences"]
            gold_doc_title = example["gold_doc_titles"]
            yield {
                "text": content,
                "labels": target,
                "gold_docs": grounds,
                "gold_sentences": gold_sent,
                "gold_doc_titles": gold_doc_title,
            }, True
```
yes, that's exactly it - setting those flags will allow the training code to access those fields in the examples you are emitting
@klshuster Thanks.

Using the command, I am facing some errors. Also, `requirements.txt` did not include `transformers`, so I installed version 4.6.1.

Error after the bart_large model is downloaded:
```
Traceback (most recent call last):
File "/home/t-fbrahman/ParlAI/parlai/core/build_data.py", line 490, in modelzoo_path
my_module = importlib.import_module(module_name)
File "/anaconda/envs/parl/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'parlai.zoo.bart.bart_large'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/t-fbrahman/ParlAI/parlai/agents/bart/convert_fairseq_to_parlai.py", line 250, in _load_single_fairseq_checkpoint
state = torch.load(
File "/home/t-fbrahman/.local/lib/python3.8/site-packages/torch/serialization.py", line 608, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/t-fbrahman/.local/lib/python3.8/site-packages/torch/serialization.py", line 787, in _legacy_load
result = unpickler.load()
ModuleNotFoundError: No module named 'fairseq'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/t-fbrahman/ParlAI/parlai/core/build_data.py", line 498, in modelzoo_path
my_module.download(datapath)
File "/home/t-fbrahman/ParlAI/parlai/zoo/bart/build.py", line 69, in download
ConversionScript.main(**args)
File "/home/t-fbrahman/ParlAI/parlai/core/script.py", line 127, in main
return cls._run_kwargs(kwargs)
File "/home/t-fbrahman/ParlAI/parlai/core/script.py", line 92, in _run_kwargs
return cls._run_from_parser_and_opt(opt, parser)
File "/home/t-fbrahman/ParlAI/parlai/core/script.py", line 108, in _run_from_parser_and_opt
return script.run()
File "/home/t-fbrahman/ParlAI/parlai/agents/bart/convert_fairseq_to_parlai.py", line 128, in run
self.state = self.load_fairseq_checkpoint()
File "/home/t-fbrahman/ParlAI/parlai/agents/bart/convert_fairseq_to_parlai.py", line 270, in load_fairseq_checkpoint
return self._load_single_fairseq_checkpoint(paths[0])
File "/home/t-fbrahman/ParlAI/parlai/agents/bart/convert_fairseq_to_parlai.py", line 254, in _load_single_fairseq_checkpoint
raise ModuleNotFoundError(
ModuleNotFoundError: Please install fairseq: https://github.com/pytorch/fairseq#requirements-and-installation
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/anaconda/envs/parl/bin/parlai", line 33, in <module>
sys.exit(load_entry_point('parlai', 'console_scripts', 'parlai')())
File "/home/t-fbrahman/ParlAI/parlai/__main__.py", line 14, in main
superscript_main()
File "/home/t-fbrahman/ParlAI/parlai/core/script.py", line 316, in superscript_main
opt = parser.parse_args(args)
File "/home/t-fbrahman/ParlAI/parlai/core/params.py", line 1166, in parse_args
self._process_args_to_opts(args)
File "/home/t-fbrahman/ParlAI/parlai/core/params.py", line 1126, in _process_args_to_opts
self.opt[each_key] = modelzoo_path(
File "/home/t-fbrahman/ParlAI/parlai/core/build_data.py", line 501, in modelzoo_path
raise ImportError(
ImportError: Could not find pretrained model in parlai.zoo.bart.bart_large or parlai.zoo.bart.build. Please check your spelling and make sure you've pulled from master.
```
Other libraries seem to be needed, but I am not sure which versions of them I should install, e.g. `fairseq`.
Installing the recent fairseq library fixed the issue. Thanks.
@klshuster Hi, I realized I am not able to find where the model is being saved. It is still training, but I don't find any checkpoint in the `/data/<my_custom_task>/` folder either.

Also, I would really appreciate your help with the generation command as well; I could not find any example on the project page. Not interactive, but bulk generation from a file.

Thanks again.
hi there - could you share your command and the beginning of your training logs?

For generation, you can try running `parlai eval_model --skip-generation false -o gen/blenderbot`, which will generate responses according to the parameters defined here. Setting `--world-logs output_file` will save the generations to a .jsonl file, and `--report-filename report_file` will save the final generation statistics.
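For example, a full bulk-generation run over the validation split might look something like this (a sketch; the model file and task here are placeholders for your own):

```
parlai eval_model \
--model-file /path/to/your/model \
--task Task01 --datatype valid \
--skip-generation false -o gen/blenderbot \
--world-logs my_generations --report-filename my_report.json
```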
Thanks @klshuster. I used the same command you shared with me (I just changed `validation-every-n-epochs` and `n_docs`):
```
parlai train_model \
--rag-retriever-type dpr --query-model bert_from_parlai_rag \
--dpr-model-file zoo:hallucination/bart_rag_token/model \
--generation-model bart --init-opt arch/bart_large \
--batchsize 16 --fp16 True --gradient-clip 0.1 --label-truncate 128 \
--log-every-n-secs 30 --lr-scheduler reduceonplateau --lr-scheduler-patience 1 \
--model-parallel True --optimizer adam --text-truncate 512 --truncate 512 \
--learningrate 1e-05 --validation-metric-mode min --validation-every-n-epochs 0.5 \
--validation-max-exs 1000 --validation-metric ppl --validation-patience 5 \
--gold-document-key gold_docs --gold-sentence-key gold_sentences --gold-document-titles-key gold_doc_titles \
--model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent \
--task Task01 \
--insert-gold-docs true \
--splitted-chunk-length 1000 \
--retriever-debug-index compressed --knowledge-access-method search_only --n-docs 5
```
So here is the train log; I could not find `--model-file` there.
```
2021-07-29 23:42:01 2021-07-29 23:41:59,942 INFO | building dictionary first...
2021-07-29 23:42:01 2021-07-29 23:42:00,555 WARNING | your model is being loaded with opts that do not exist in the model you are initializing the weights with: download_path: None,verbose: False,datapath: /amltdc91a6f0837d50a3bdf062e980f3fe7c/ParlAI/data,evaltask: None,eval_batchsize: None,eval_dynamic_batching: None,num_workers: 0,display_examples: False,num_epochs: -1,max_train_time: -1,max_train_steps: -1,log_every_n_steps: 50,validation_every_n_secs: -1,validation_every_n_steps: -1,save_every_n_secs: -1,save_after_valid: False,validation_every_n_epochs: 0.5,validation_max_exs: 1000,short_final_eval: False,validation_patience: 5,validation_metric: ppl,validation_metric_mode: min,validation_cutoff: 1.0,load_from_checkpoint: True,validation_share_agent: False,metrics: default,aggregate_micro: False,tensorboard_log: False,tensorboard_logdir: None,wandb_log: False,wandb_name: None,wandb_project: None,wandb_entity: None,dict_maxexs: -1,dict_include_valid: False,dict_include_test: False,log_every_n_secs: 30.0,mutators: None,candidates: inline,eval_candidates: inline,interactive_candidates: fixed,repeat_blocking_heuristic: True,fixed_candidates_path: None,fixed_candidate_vecs: reuse,encode_candidate_vecs: True,encode_candidate_vecs_batchsize: 256,train_predict: False,cap_num_predictions: 100,ignore_bad_candidates: False,rank_top_k: -1,return_cand_scores: False,use_memories: False,wrap_memory_encoder: False,memory_attention: sqrt,normalize_sent_emb: False,share_encoders: True,learn_embeddings: True,data_parallel: False,reduction_type: mean,polyencoder_type: codes,poly_n_codes: 64,poly_attention_type: basic,poly_attention_num_heads: 4,codes_attention_type: basic,codes_attention_num_heads: 4,generation_model: bart,query_model: bert_from_parlai_rag,rag_model_type: token,thorough: False,n_extra_positions: 0,gold_knowledge_passage_key: checked_sentence,gold_knowledge_title_key: title,rag_retriever_query: full_history,rag_retriever_type: dpr,retriever_debug_index: compressed,n_docs: 5,min_doc_token_length: 64,max_doc_token_length: 256,rag_query_truncate: 512,print_docs: False,path_to_index: zoo:hallucination/wiki_index_compressed/compressed_pq,path_to_dense_embeddings: None,dpr_model_file: zoo:hallucination/bart_rag_token/model,path_to_dpr_passages: zoo:hallucination/wiki_passages/psgs_w100.tsv,retriever_embedding_size: 768,tfidf_max_doc_paragraphs: -1,tfidf_model_path: zoo:wikipedia_full/tfidf_retriever/model,dpr_num_docs: 25,poly_score_initial_lambda: 0.5,polyencoder_init_model: wikito,poly_faiss_model_file: None,regret: False,regret_intermediate_maxlen: 32,regret_model_file: None,indexer_type: compressed,indexer_buffer_size: 65536,compressed_indexer_factory: IVF4096_HNSW128,PQ128,compressed_indexer_gpu_train: False,compressed_indexer_nprobe: 64,hnsw_indexer_store_n: 128,hnsw_ef_search: 128,hnsw_ef_construction: 200,rag_turn_n_turns: 2,rag_turn_marginalize: doc_then_turn,rag_turn_discount_factor: 1.0,interactive_mode: False,t5_model_arch: t5-base,t5_model_parallel: False,t5_dropout: 0.0,t5_generation_config: None,search_query_generator_model_file: None,search_query_generator_inference: greedy,search_query_generator_beam_min_length: 1,search_query_generator_beam_size: 1,search_query_generator_text_truncate: 512,splitted_chunk_length: 1000,doc_chunk_split_mode: word,n_ranked_doc_chunks: 1,doc_chunks_ranker: head,search_server: None,knowledge_access_method: search_only,memory_key: full_text,query_generator_key: full_text,gold_document_key: gold_docs,gold_sentence_key: 
gold_sentences,gold_document_titles_key: gold_doc_titles,insert_gold_docs: True,memory_extractor_phrase: persona:,retriever_ignore_phrase: persona:,query_generator_ignore_phrase: persona:,query_generator_model_file: zoo:blenderbot2/query_generator/model,query_generator_delimiter:
2021-07-29 23:42:01 ,query_generator_inference: beam,query_generator_beam_size: 1,query_generator_beam_min_length: 2,query_generator_truncate: -1,memory_retriever_truncate: -1,retriever_delimiter:
2021-07-29 23:42:01 ,share_search_and_memory_query_encoder: False,memory_reader_model: None,memory_doc_title_delimiter: / ,memory_writer_model: bert,memory_writer_model_file: zoo:hallucination/multiset_dpr/hf_bert_base.cp,memory_decoder_key: full_text,memory_decoder_ignore_phrase: persona:,memory_decoder_model_file: zoo:blenderbot2/memory_decoder/model,memory_decoder_delimiter:
2021-07-29 23:42:01 ,memory_decoder_beam_size: 3,memory_decoder_beam_min_length: 10,memory_decoder_truncate: -1,memory_decoder_one_line_memories: False
2021-07-29 23:42:01 2021-07-29 23:42:00,555 WARNING | your model is being loaded with opts that differ from the model you are initializing the weights with. Add the following args to your run command to change this:
2021-07-29 23:42:01 --init-opt None --task None --batchsize 1 --attention-dropout 0.1 --model-parallel False --optimizer sgd --learningrate 1 --truncate -1 --text-truncate None --label-truncate None --lr-scheduler-patience 3 --model bart --parlai-home /home/t-fbrahman/ParlAI --dict-loaded False
2021-07-29 23:42:01 2021-07-29 23:42:00,696 WARNING | your model is being loaded with opts that do not exist in the model you are initializing the weights with: download_path: None,verbose: False,datapath: /amltdc91a6f0837d50a3bdf062e980f3fe7c/ParlAI/data,evaltask: None,eval_batchsize: None,eval_dynamic_batching: None,num_workers: 0,display_examples: False,num_epochs: -1,max_train_time: -1,max_train_steps: -1,log_every_n_steps: 50,validation_every_n_secs: -1,validation_every_n_steps: -1,save_every_n_secs: -1,save_after_valid: False,validation_every_n_epochs: 0.5,validation_max_exs: 1000,short_final_eval: False,validation_patience: 5,validation_metric: ppl,validation_metric_mode: min,validation_cutoff: 1.0,load_from_checkpoint: True,validation_share_agent: False,metrics: default,aggregate_micro: False,tensorboard_log: False,tensorboard_logdir: None,wandb_log: False,wandb_name: None,wandb_project: None,wandb_entity: None,dict_maxexs: -1,dict_include_valid: False,dict_include_test: False,log_every_n_secs: 30.0,mutators: None,candidates: inline,eval_candidates: inline,interactive_candidates: fixed,repeat_blocking_heuristic: True,fixed_candidates_path: None,fixed_candidate_vecs: reuse,encode_candidate_vecs: True,encode_candidate_vecs_batchsize: 256,train_predict: False,cap_num_predictions: 100,ignore_bad_candidates: False,rank_top_k: -1,return_cand_scores: False,use_memories: False,wrap_memory_encoder: False,memory_attention: sqrt,normalize_sent_emb: False,share_encoders: True,learn_embeddings: True,data_parallel: False,reduction_type: mean,polyencoder_type: codes,poly_n_codes: 64,poly_attention_type: basic,poly_attention_num_heads: 4,codes_attention_type: basic,codes_attention_num_heads: 4,generation_model: bart,query_model: bert_from_parlai_rag,rag_model_type: token,thorough: False,n_extra_positions: 0,gold_knowledge_passage_key: checked_sentence,gold_knowledge_title_key: title,rag_retriever_query: full_history,rag_retriever_type: dpr,retriever_debug_index: compressed,n_docs: 5,min_doc_token_length: 64,max_doc_token_length: 256,rag_query_truncate: 512,print_docs: False,path_to_index: zoo:hallucination/wiki_index_compressed/compressed_pq,path_to_dense_embeddings: None,dpr_model_file: zoo:hallucination/bart_rag_token/model,path_to_dpr_passages: zoo:hallucination/wiki_passages/psgs_w100.tsv,retriever_embedding_size: 768,tfidf_max_doc_paragraphs: -1,tfidf_model_path: zoo:wikipedia_full/tfidf_retriever/model,dpr_num_docs: 25,poly_score_initial_lambda: 0.5,polyencoder_init_model: wikito,poly_faiss_model_file: None,regret: False,regret_intermediate_maxlen: 32,regret_model_file: None,indexer_type: compressed,indexer_buffer_size: 65536,compressed_indexer_factory: IVF4096_HNSW128,PQ128,compressed_indexer_gpu_train: False,compressed_indexer_nprobe: 64,hnsw_indexer_store_n: 128,hnsw_ef_search: 128,hnsw_ef_construction: 200,rag_turn_n_turns: 2,rag_turn_marginalize: doc_then_turn,rag_turn_discount_factor: 1.0,interactive_mode: False,t5_model_arch: t5-base,t5_model_parallel: False,t5_dropout: 0.0,t5_generation_config: None,search_query_generator_model_file: None,search_query_generator_inference: greedy,search_query_generator_beam_min_length: 1,search_query_generator_beam_size: 1,search_query_generator_text_truncate: 512,splitted_chunk_length: 1000,doc_chunk_split_mode: word,n_ranked_doc_chunks: 1,doc_chunks_ranker: head,search_server: None,knowledge_access_method: search_only,memory_key: full_text,query_generator_key: full_text,gold_document_key: gold_docs,gold_sentence_key: 
gold_sentences,gold_document_titles_key: gold_doc_titles,insert_gold_docs: True,memory_extractor_phrase: persona:,retriever_ignore_phrase: persona:,query_generator_ignore_phrase: persona:,query_generator_model_file: zoo:blenderbot2/query_generator/model,query_generator_delimiter:
2021-07-29 23:42:01 ,query_generator_inference: beam,query_generator_beam_size: 1,query_generator_beam_min_length: 2,query_generator_truncate: -1,memory_retriever_truncate: -1,retriever_delimiter:
2021-07-29 23:42:01 --init-opt None --task None --batchsize 1 --attention-dropout 0.1 --model-parallel False --optimizer sgd --learningrate 1 --truncate -1 --text-truncate None --label-truncate None --lr-scheduler-patience 3 --model bart --parlai-home /home/t-fbrahman/ParlAI --dict-loaded False
2021-07-29 23:42:01 2021-07-29 23:42:00,696 INFO | Using CUDA
2021-07-29 23:42:01 2021-07-29 23:42:00,699 INFO | loading dictionary from /amltdc91a6f0837d50a3bdf062e980f3fe7c/ParlAI/data/models/bart/bart_large/model.dict
2021-07-29 23:42:04 2021-07-29 23:42:01,176 INFO | num words = 50264
2021-07-29 23:42:13
2021-07-29 23:42:13 Downloading: 0%| | 0.00/232k [00:00<?, ?B/s]
2021-07-29 23:42:13 Downloading: 78%|███████▊ | 181k/232k [00:00<00:00, 1.55MB/s]
2021-07-29 23:42:13 Downloading: 100%|██████████| 232k/232k [00:00<00:00, 1.91MB/s]
2021-07-29 23:42:13
2021-07-29 23:42:13 Downloading: 0%| | 0.00/466k [00:00<?, ?B/s]
2021-07-29 23:42:13 Downloading: 45%|████▍ | 209k/466k [00:00<00:00, 1.72MB/s]
2021-07-29 23:42:13 Downloading: 100%|██████████| 466k/466k [00:00<00:00, 3.48MB/s]
2021-07-29 23:42:16
2021-07-29 23:42:16 Downloading: 0%| | 0.00/28.0 [00:00<?, ?B/s]
2021-07-29 23:42:16 Downloading: 100%|██████████| 28.0/28.0 [00:00<00:00, 14.1kB/s]
2021-07-29 23:42:19 2021-07-29 23:42:16,742 WARNING | Creating Index from Index Factory: IVF4096_HNSW128,PQ128
2021-07-29 23:42:22 2021-07-29 23:42:21,841 INFO | Loading index from /amltdc91a6f0837d50a3bdf062e980f3fe7c/ParlAI/data/models/hallucination/wow_passages/compressed
2021-07-29 23:42:25 2021-07-29 23:42:23,110 INFO | Loaded index of type <faiss.swigfaiss_avx2.IndexIVFScalarQuantizer; proxy of <Swig Object of type 'faiss::IndexIVFScalarQuantizer *' at 0x7f5701685960> > and size 2862
2021-07-29 23:42:25 2021-07-29 23:42:23,270 INFO | Reading data from: /amltdc91a6f0837d50a3bdf062e980f3fe7c/ParlAI/data/models/hallucination/wow_passages/wow_articles.paragraphs.tsv
2021-07-29 23:43:04 Downloading: 100%|██████████| 440M/440M [00:07<00:00, 57.9MB/s]
2021-07-29 23:43:07 Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decode...
2021-07-29 23:44:10 2021-07-29 23:44:08,945 INFO | Building Memory Decoder from file: /amltdc91a6f0837d50a3bdf062e980f3fe7c/ParlAI/data/models/blenderbot2/memory_decoder/model
2021-07-29 23:44:37 2021-07-29 23:44:35,615 INFO | Total parameters: 732,961,280 (731,781,632 trainable)
2021-07-29 23:44:37 2021-07-29 23:44:35,615 INFO | Loading existing model params from /amltdc91a6f0837d50a3bdf062e980f3fe7c/ParlAI/data/models/bart/bart_large/model
2021-07-29 23:44:46 2021-07-29 23:44:45,588 WARNING | Detected a fine-tune run. Resetting the optimizer.
2021-07-29 23:44:46 2021-07-29 23:44:45,588 WARNING | Optimizer was reset. Also resetting LR scheduler.
2021-07-29 23:44:46 2021-07-29 23:44:45,589 INFO | Opt:
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | activation: gelu
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | adafactor_eps: '(1e-30, 0.001)'
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | adam_eps: 1e-08
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | add_p1_after_newln: False
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | aggregate_micro: False
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | allow_missing_init_opts: False
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | attention_dropout: 0.0
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | batchsize: 16
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | beam_block_full_context: True
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | beam_block_list_filename: None
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | beam_block_ngram: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | beam_context_block_ngram: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | beam_delay: 30
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | beam_length_penalty: 0.65
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | beam_min_length: 1
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | beam_size: 1
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | betas: '(0.9, 0.999)'
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | bpe_add_prefix_space: None
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | bpe_debug: False
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | bpe_dropout: None
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | bpe_merge: None
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | bpe_vocab: None
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | candidates: inline
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | cap_num_predictions: 100
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | checkpoint_activations: False
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | codes_attention_num_heads: 4
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | codes_attention_type: basic
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | compressed_indexer_factory: IVF4096_HNSW128,PQ128
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | compressed_indexer_gpu_train: False
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | compressed_indexer_nprobe: 64
2021-07-29 23:44:46 2021-07-29 23:44:45,590 INFO | compute_tokenized_bleu: False
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | converting: False
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | data_parallel: False
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | datapath: /amltdc91a6f0837d50a3bdf062e980f3fe7c/ParlAI/data
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | datatype: train
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | delimiter: '\n'
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_class: parlai.core.dict:DictionaryAgent
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_endtoken: __end__
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_file: /amltdc91a6f0837d50a3bdf062e980f3fe7c/ParlAI/data/models/bart/bart_large/model.dict
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_include_test: False
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_include_valid: False
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_initpath: None
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_language: english
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_loaded: True
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_lower: False
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_max_ngram_size: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_maxexs: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_maxtokens: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_minfreq: 0
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_nulltoken: __null__
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_starttoken: __start__
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_textfields: text,labels
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_tokenizer: gpt2
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dict_unktoken: __unk__
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | display_examples: False
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | doc_chunk_split_mode: word
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | doc_chunks_ranker: head
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | download_path: None
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dpr_model_file: zoo:hallucination/bart_rag_token/model
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dpr_num_docs: 25
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dropout: 0.1
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | dynamic_batching: None
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | embedding_projection: random
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | embedding_size: 1024
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | embedding_type: random
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | embeddings_scale: False
2021-07-29 23:44:46 2021-07-29 23:44:45,591 INFO | encode_candidate_vecs: True
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | encode_candidate_vecs_batchsize: 256
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | eval_batchsize: None
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | eval_candidates: inline
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | eval_dynamic_batching: None
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | evaltask: None
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | ffn_size: 4096
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | fixed_candidate_vecs: reuse
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | fixed_candidates_path: None
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | force_fp16_tokens: True
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | fp16: True
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | fp16_impl: safe
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | generation_model: bart
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | gold_document_key: gold_docs
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | gold_document_titles_key: gold_doc_titles
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | gold_knowledge_passage_key: checked_sentence
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | gold_knowledge_title_key: title
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | gold_sentence_key: gold_sentences
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | gpu: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | gradient_clip: 0.1
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | hide_labels: False
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | history_add_global_end_token: None
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | history_reversed: False
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | history_size: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | hnsw_ef_construction: 200
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | hnsw_ef_search: 128
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | hnsw_indexer_store_n: 128
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | ignore_bad_candidates: False
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | image_cropsize: 224
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | image_mode: raw
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | image_size: 256
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | indexer_buffer_size: 65536
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | indexer_type: compressed
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | inference: greedy
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | init_model: /amltdc91a6f0837d50a3bdf062e980f3fe7c/ParlAI/data/models/bart/bart_large/model
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | init_opt: arch/bart_large
2021-07-29 23:44:46 2021-07-29 23:44:45,592 INFO | insert_gold_docs: True
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | interactive_candidates: fixed
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | interactive_mode: False
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | invsqrt_lr_decay_gamma: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | is_debug: False
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | knowledge_access_method: search_only
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | label_truncate: 128
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | learn_embeddings: True
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | learn_positional_embeddings: True
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | learningrate: 1e-05
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | load_from_checkpoint: True
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | log_every_n_secs: 30.0
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | log_every_n_steps: 50
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | loglevel: info
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | lr_scheduler: reduceonplateau
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | lr_scheduler_decay: 0.5
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | lr_scheduler_patience: 1
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | max_doc_token_length: 256
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | max_train_steps: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | max_train_time: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_attention: sqrt
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_decoder_beam_min_length: 10
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_decoder_beam_size: 3
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_decoder_delimiter: '\n'
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_decoder_ignore_phrase: persona:
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_decoder_key: full_text
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_decoder_model_file: zoo:blenderbot2/memory_decoder/model
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_decoder_one_line_memories: False
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_decoder_truncate: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_doc_title_delimiter: ' / '
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_extractor_phrase: persona:
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_key: full_text
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_reader_model: None
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_retriever_truncate: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_writer_model: bert
2021-07-29 23:44:46 2021-07-29 23:44:45,593 INFO | memory_writer_model_file: zoo:hallucination/multiset_dpr/hf_bert_base.cp
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | metrics: default
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | min_doc_token_length: 64
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | model: projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | model_file: None
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | model_parallel: True
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | momentum: 0
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | multitask_weights: [1]
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | mutators: None
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | n_decoder_layers: 12
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | n_docs: 5
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | n_encoder_layers: 12
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | n_extra_positions: 0
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | n_heads: 16
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | n_layers: 2
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | n_positions: 1024
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | n_ranked_doc_chunks: 1
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | n_segments: 0
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | nesterov: True
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | no_cuda: False
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | normalize_sent_emb: False
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | num_epochs: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | num_workers: 0
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | nus: (0.7,)
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | optimizer: adam
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | output_scaling: 1.0
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | override: "{'rag_retriever_type': 'dpr', 'query_model': 'bert_from_parlai_rag', 'dpr_model_file': 'zoo:hallucination/bart_rag_token/model', 'generation_model': 'bart', 'init_opt': 'arch/bart_large', 'batchsize': 16, 'fp16': True, 'gradient_clip': 0.1, 'label_truncate': 128, 'log_every_n_secs': 30.0, 'lr_scheduler': 'reduceonplateau', 'lr_scheduler_patience': 1, 'model_parallel': True, 'optimizer': 'adam', 'text_truncate': 512, 'truncate': 512, 'learningrate': 1e-05, 'validation_metric_mode': 'min', 'validation_every_n_epochs': 0.5, 'validation_max_exs': 1000, 'validation_metric': 'ppl', 'validation_patience': 5, 'gold_document_key': 'gold_docs', 'gold_sentence_key': 'gold_sentences', 'gold_document_titles_key': 'gold_doc_titles', 'model': 'projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent', 'task': 'Task01', 'insert_gold_docs': True, 'splitted_chunk_length': 1000, 'retriever_debug_index': 'compressed', 'knowledge_access_method': 'search_only', 'n_docs': 5, 'activation': 'gelu', 'attention_dropout': 0.0, 'dict_file': '/amltdc91a6f0837d50a3bdf062e980f3fe7c/ParlAI/data/models/bart/bart_large/model.dict', 'dict_tokenizer': 'gpt2', 'dropout': 0.1, 'embedding_size': 1024, 'embeddings_scale': False, 'ffn_size': 4096, 'force_fp16_tokens': True, 'init_model': 'zoo:bart/bart_large/model', 'learn_positional_embeddings': True, 'n_decoder_layers': 12, 'n_encoder_layers': 12, 'n_heads': 16, 'n_positions': 1024, 'variant': 'bart'}"
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | parlai_home: /amltdc91a6f0837d50a3bdf062e980f3fe7c/ParlAI
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | path_to_dense_embeddings: None
2021-07-29 23:44:46 2021-07-29 23:44:45,594 INFO | path_to_dpr_passages: zoo:hallucination/wow_passages/wow_articles.paragraphs.tsv
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | temperature: 1.0
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | tensorboard_log: False
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | tensorboard_logdir: None
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | text_truncate: 512
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | tfidf_max_doc_paragraphs: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | tfidf_model_path: zoo:wikipedia_full/tfidf_retriever/model
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | thorough: False
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | topk: 10
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | topp: 0.9
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | train_predict: False
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | truncate: 512
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | update_freq: 1
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | use_memories: False
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | use_reply: label
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | validation_cutoff: 1.0
2021-07-29 23:44:46 2021-07-29 23:44:45,596 INFO | validation_every_n_epochs: 0.5
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | validation_every_n_secs: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | validation_every_n_steps: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | validation_max_exs: 1000
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | validation_metric: ppl
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | validation_metric_mode: min
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | validation_patience: 5
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | validation_share_agent: False
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | variant: bart
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | verbose: False
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | wandb_entity: None
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | wandb_log: False
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | wandb_name: None
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | wandb_project: None
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | warmup_rate: 0.0001
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | warmup_updates: -1
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | weight_decay: None
2021-07-29 23:44:46 2021-07-29 23:44:45,597 INFO | wrap_memory_encoder: False
2021-07-29 23:44:52 2021-07-29 23:44:49,978 INFO | Current ParlAI commit: be73277b65f04a60be60c93abd1ab2f179a54110
2021-07-29 23:44:52 2021-07-29 23:44:51,305 INFO | creating task(s): Task01
2021-07-29 23:44:52 loading: /amltdc91a6f0837d50a3bdf062e980f3fe7c/ParlAI/data/Task01/train.jsonl
2021-07-29 23:45:13 2021-07-29 23:45:13,068 INFO | training...
2021-07-29 23:45:22 huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
2021-07-29 23:45:22 To disable this warning, you can either:
2021-07-29 23:45:22 - Avoid using `tokenizers` before the fork if possible
2021-07-29 23:45:22 - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
2021-07-29 23:45:22 huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
2021-07-29 23:45:22 To disable this warning, you can either:
2021-07-29 23:45:22 - Avoid using `tokenizers` before the fork if possible
2021-07-29 23:45:22 - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
2021-07-29 23:45:22 huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
2021-07-29 23:45:22 To disable this warning, you can either:
2021-07-29 23:45:22 - Avoid using `tokenizers` before the fork if possible
2021-07-29 23:45:22 - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
2021-07-29 23:45:28 huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
2021-07-29 23:45:28 To disable this warning, you can either:
2021-07-29 23:45:28 - Avoid using `tokenizers` before the fork if possible
2021-07-29 23:45:28 - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
2021-07-29 23:45:28 /mnt/code/ParlAI/parlai/utils/fp16.py:85: FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite=false will be required to retain the old behavior.
2021-07-29 23:45:28 return torch.nn.utils.clip_grad_norm_(params, max_norm)
2021-07-29 23:45:46 2021-07-29 23:45:43,994 INFO | time:31s total_exs:80 total_steps:5 epochs:0.00
2021-07-29 23:45:46 clen clip ctpb ctps ctrunc ctrunclen exps exs fp16_loss_scalar gnorm \
2021-07-29 23:45:46 87.14 1 1426 234.1 0 0 2.626 80 7373 inf
2021-07-29 23:45:46 gpu_mem llen loss lr ltpb ltps ltrunc ltrunclen ppl token_acc \
2021-07-29 23:45:46 .5750 158.7 4.54 1e-05 1761 289 .4125 48.61 93.65 .3241
2021-07-29 23:45:46 token_em total_train_updates tpb tps ups
2021-07-29 23:45:46 0 5 3187 523.2 .1642
```
That training job finished successfully, but since I could not find where it was saved (I am running on clusters), I started a new job and set `--model-file`. I would still appreciate your help figuring this out, since the new job will also take a while to finish.
Ahh yea, if the `--model-file` is not set, the model won't be saved anywhere, unfortunately.

I'll go ahead and update those README commands to include a model file.
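For reference, the run above with a model file added would start like this (the path is just a placeholder; ParlAI should then write periodic checkpoints to `<model-file>.checkpoint` and the best validation model to `<model-file>`):

```
parlai train_model \
--model-file /checkpoint/my_fid_gold/model \
--rag-retriever-type dpr --query-model bert_from_parlai_rag \
--dpr-model-file zoo:hallucination/bart_rag_token/model \
--generation-model bart --init-opt arch/bart_large \
--batchsize 16 --fp16 True --gradient-clip 0.1 --label-truncate 128 \
--log-every-n-secs 30 --lr-scheduler reduceonplateau --lr-scheduler-patience 1 \
--model-parallel True --optimizer adam --text-truncate 512 --truncate 512 \
--learningrate 1e-05 --validation-metric-mode min --validation-every-n-epochs 0.5 \
--validation-max-exs 1000 --validation-metric ppl --validation-patience 5 \
--gold-document-key gold_docs --gold-sentence-key gold_sentences --gold-document-titles-key gold_doc_titles \
--model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent \
--task Task01 \
--insert-gold-docs true \
--splitted-chunk-length 1000 \
--retriever-debug-index compressed --knowledge-access-method search_only --n-docs 5
```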
Hi, I'm very impressed by your work @klshuster!

I have a few questions about your project 'SEA' and the paper on Internet-Augmented generation. Just like @fabrahman, I tried to train my own FiD-BART model, but from scratch.

In Table 5, there are two important models, 'WizInt Search engine FiD-Gold' and 'WizInt Search engine FiD', and I am confused about how those two models are trained.

For 'WizInt Search engine FiD-Gold', I believed that this model prepends the WizInt dataset's concatenated line of 'selected-sentences' to the dialog history for training. And for the 'WizInt Search Engine FiD' model, I believed that it prepends each of the first passages of the WizInt dataset's 'retrieved-docs' as a document context in front of each of 5 copies of the same dialog history for training. But unfortunately, I can't see any TeacherAgents that utilize WizInt's 'retrieved-docs' for training the Search Engine FiD model.

And about WizardDialogGoldKnowledgeTeacher: at first, I thought it might be the right teacher for training the FiD-Gold model, but I concluded that if we have just one passage of knowledge in the context, we don't have to use FiD for generation (i.e., it's just like training a normal transformer, like Transformer (gold knowledge) in Tables 2 and 3).

So what is right? How can I utilize the Wizard of Internet dataset to train the Search engine FiD model (and the FiD-Gold model)? Can I just use the dataset's 'retrieved-docs' to train the Search Engine FiD model (except for the query generator)? And for training the FiD-Gold model, which column should I use: 'selected-sentences' or 'selected-docs'?

And lastly, as I mentioned above, I can't see any added arguments to utilize the dataset's 'retrieved-docs' or 'selected-docs' columns in SearchQuerySearchEngineFidAgent, so if I want to use those columns in ParlAI, should I follow your first reply on this issue (i.e., use BlenderBot2FidAgent and add `--gold-document-key ...` to my command line, just like that)?

I would really appreciate it if you could point out any misunderstanding I have about this project and help me understand it well. Thanks.
Hi @Bannng, let me clarify a few things for you:
In neither case are the documents directly prepended to the context via the teacher; rather, the FiD model takes care of this via internal handling of retrieval.
Indeed, we did not open-source the search engine FiD-Gold model. However, as I mentioned above, this can be done with the BlenderBot2FidAgent by setting --gold-document-key __retrieved-docs__ and additionally setting --gold-sentence-key __retrieved-docs__. I've put up #3897 to allow you to set --gold-document-titles-key __retrieved-doc-titles__ to complete the settings.
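Putting those together, the relevant flags would look something like the following sketch (the remaining generator/optimization flags are as in the commands elsewhere in this thread; I haven't verified this exact combination):
parlai train_model \
--task wizard_of_internet \
--model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent \
--insert-gold-docs True \
--gold-document-key __retrieved-docs__ \
--gold-sentence-key __retrieved-docs__ \
--gold-document-titles-key __retrieved-doc-titles__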
@klshuster Thank you for your help!
Can you provide some pointers/instructions on how to format our own index vectors and retrieval corpus and use them instead of the "wiki dump" index and "WoW passages" you used? Is there any instruction on how we can use our own index and knowledge source to train any of FiD/FiD-RAG, etc.?
Currently, I have my index file (document embeddings) saved on disk as a numpy memory map. As far as I understand, in your case it seems to be an IndexIVFScalarQuantizer mapped to a tsv file of 2862 passages named wow_articles.paragraphs.tsv, right?
Follow these instructions for generating the index with your own knowledge source. Note that if your index is not too big, you can use --indexer-type exact when building the index.
Once those are generated, you can specify --path-to-index /path/to/index --path-to-dpr-passages /path/to/passages
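For reference, the passages file is a tab-separated file of (id, text, title) rows in the usual DPR style; here is a minimal sketch of producing one (the exact expected header is an assumption, so double-check it against the WoW passages file):

import csv

passages = [
    ("First document text ...", "Title 1"),
    ("Second document text ...", "Title 2"),
]

with open("my_passages.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["id", "text", "title"])       # assumed header row
    for i, (text, title) in enumerate(passages, start=1):
        writer.writerow([i, text, title])          # ids starting from 1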
Thank you for your help, @klshuster!
Your reply has been a great help in understanding the project.
If what I understand is right, the 'WizInt Search engine FiD' model uses its query generator during the training loop and utilizes the passages returned by the chosen Search API; the API's responses (retrieved passages) might sometimes be very noisy or irrelevant because of poor query-generator performance or the quality of the API's returned URLs, just like RAG or DPR-FiD models. The 'WizInt Search engine FiD-Gold' model instead uses WizInt's __retrieved-docs__ to train the FiD model.
But for __selected-sentences__ or __selected-docs__, those columns are only for the No knowledge vs. gold knowledge baseline experiments? (Tables 2, 3)
Thanks a lot!
"WizInt Search engine FiD" model uses it's query generator during training loop, and utilize the passage that the chosen Search API returns. and for that, API's response (retrieved passages) might be very noisy or irrelevant sometimes because of bad performance of query generator or the quality of API's returned urls. => just like what RAG or DPR-FiD models do "WizInt Search engine FiD-Gold" model uses WizInt's retrieved-docs to train FiD model.
This is indeed correct
but for selected-sentences or selected-docs, those columns are only for the No knowledge vs. gold knowledge baselines experiments? (Table 2, 3)
We did not actually use these fields in the experiments per se, but the values within the fields were used to train the gold knowledge baselines; these baselines were just standard Transformer models (they were trained with --task wizard_of_internet:WizardDialogGoldKnowledgeTeacher).
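If you want to see exactly what that teacher yields, you can inspect it like any other task, e.g.:
parlai dd -t wizard_of_internet:WizardDialogGoldKnowledgeTeacher -v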
@klshuster I suppose that when I train the FiD model using the command you shared earlier (pasted below), at inference time the model should do regular decoding and not rag-token-style decoding, right? The reason I was not certain is that we set --dpr-model-file zoo:hallucination/bart_rag_token/model, and the training log also shows rag_model_type: token.
Or maybe in the FiD model, since all encoder outputs are concatenated before being passed to the decoder, it acts like rag-token during decoding? I'd appreciate your comment on this.
parlai train_model \
--rag-retriever-type dpr --query-model bert_from_parlai_rag \
--dpr-model-file zoo:hallucination/bart_rag_token/model \
--generation-model bart --init-opt arch/bart_large \
--batchsize 16 --fp16 True --gradient-clip 0.1 --label-truncate 128 \
--log-every-n-secs 30 --lr-scheduler reduceonplateau --lr-scheduler-patience 1 \
--model-parallel True --optimizer adam --text-truncate 512 --truncate 512 \
--learningrate 1e-05 --validation-metric-mode min --validation-every-n-epochs 0.25 \
--validation-max-exs 1000 --validation-metric ppl --validation-patience 5 \
--gold-document-key gold_docs --gold-sentence-key gold_sentences --gold-document-titles-key gold_doc_titles \
--model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent \
--task my_custom_task \
--insert-gold-docs true \
--splitted-chunk-length 1000 \
--retriever-debug-index compressed --knowledge-access-method search_only --n-docs 3
Yeah, the --dpr-model-file does not influence decoding; --rag-model-type token is just used for internal code purposes and does not reflect the decoding scheme either. FiD will decode appropriately.
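For intuition, here is a schematic sketch in plain PyTorch (not ParlAI's actual classes) of why FiD decoding is ordinary autoregressive decoding: each (context + document) pair is encoded independently, the encoder outputs are concatenated along the sequence axis, and a single decoder cross-attends over the combined memory, with no per-document marginalization as in rag-token.

import torch
import torch.nn as nn

n_docs, seq_len, d_model = 3, 16, 32
encoder = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
decoder = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)

# one row per (context + document) pair, encoded independently
ctx_plus_docs = torch.randn(n_docs, seq_len, d_model)
enc_out = encoder(ctx_plus_docs)                        # (n_docs, seq, d)

# concatenate all encoder outputs into one long memory for the decoder
memory = enc_out.reshape(1, n_docs * seq_len, d_model)  # (1, n_docs*seq, d)

# a single standard decoding pass attends over the combined memory
tgt = torch.randn(1, 5, d_model)                        # partial target so far
out = decoder(tgt, memory)
print(out.shape)                                        # torch.Size([1, 5, 32])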
Thanks a lot @klshuster!!!
It really helped me!
Hello @klshuster, I'm sorry to be asking so many questions; I really appreciate your kind help so far.
I was wondering if you could confirm that the following command is correct for training a FiD model (BART generator) using our own dataset, index, and knowledge corpus (retrieval passages).
parlai train_model --model fid --task <custom_task> \
--rag-retriever-type dpr --query-model bert \
--dpr-model-file zoo:hallucination/multiset_dpr/hf_bert_base.cp \
--path-to-index <path_to_my_index> \
--path-to-dpr-passages <path_to_my_passages> \
--indexer-type exact \
--generation-model bart --init-opt arch/bart_large \
--batchsize 16 --fp16 True --gradient-clip 0.1 --label-truncate 128 \
--log-every-n-secs 30 --lr-scheduler reduceonplateau --lr-scheduler-patience 1 \
--model-parallel True --optimizer adam --text-truncate 512 --truncate 512 \
--learningrate 1e-05 --validation-metric-mode min --validation-every-n-epochs 0.25 \
--validation-max-exs 1000 --validation-metric ppl --validation-patience 5
Also, I had two questions regarding options in args.py.
I noticed there is a nice option, --gold-knowledge-passage-key, which can be used to compute retrieval metrics. In WoW, there are both checked_sentence and checked_passage keys, each of which is a dictionary. In my dataset, I have a list of gold groundings which I wish to use for retrieval evaluation. Can I create a key named gold_docs (as suggested above) that is a list of gold docs (rather than a dict as in WoW), and then use --gold-document-key and --gold-knowledge-passage-key (along with --debug) appropriately? Or should I format my input differently?
How is --max-doc-token-length different from --splitted-chunk-length, and which is used for what?
Thanks again
Yes, that command looks perfect; however, make sure to set --model-file /path/to/model, otherwise your model will not save.
--gold-knowledge-passage-key and --gold-knowledge-title-key are used for computing the retrieval metrics in RAG; in the current implementation, the code expects each key in the observation to be a single string to compare against the retrieved documents. You can see what this looks like via parlai dd -t wizard_of_wikipedia -v and inspecting the title and checked_sentence keys in the output. The passage_r@k metrics will support a scenario where you just concatenate all of the gold documents, as they check whether the retrieved doc(s) are present within the gold doc(s); title_r@k does not, at the moment. Note that you'll use the two keys specified here, not --gold-document-key, which is used in the BlenderBot2FidAgent.
--max-doc-token-length refers to the maximum token length of the concatenation of context and document for FiD encoding, for any retrieved documents regardless of source, whereas --splitted-chunk-length is used for search-engine documents and specifies the length of the chunks taken from search results (one could theoretically use more than one chunk, if desired). Note that this option is relevant to the gold-document case above, as the BlenderBot2FidAgent uses this value for computing chunk sizes of the gold documents.
@klshuster perfect! Thank you so much for the clarification!
@klshuster I realized that when I generated dense embeddings and set --num_shards=30, since my data size was not divisible by 30, according to this line it does a floor division, so eventually the number of ids is smaller than the actual data size. That line probably needs to be changed to math.ceil(len(rows) / num_shards)?
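A minimal sketch of the arithmetic, using the numbers from the NOTE below:

import math

num_rows, num_shards = 8840782, 30

# floor division (current behavior): the trailing rows are silently dropped
floor_size = num_rows // num_shards            # 294692
print(floor_size * num_shards)                 # 8840760 -> 22 passages lost
print(floor_size * num_shards - 1)             # 8840759, the last id observed

# ceil (proposed fix): every row lands in some shard
ceil_size = math.ceil(num_rows / num_shards)   # 294693
assert ceil_size * num_shards >= num_rows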
I am not sure if that is why I am getting the following error when loading my index file. Is there any easy way to fix this issue without having to recreate the index file? Do only the ids_<shard_id> files need to be expanded manually for the last shard, or is it something more complicated? I'm also fine with discarding the missing passages, but I don't know how to fix the index file/embeddings.
NOTE: 8840782 is the number of passages, but the last id in the ids_29 file is 8840759.
23:22:03 | Using CUDA
23:22:03 | loading dictionary from /home/t-fbrahman/ParlAI/data/models/bart/bart_large/model.dict
23:22:03 | num words = 50264
23:22:06 | Loading index from ./my_index/my_passages
23:25:24 | Loaded index of type <faiss.swigfaiss_avx2.IndexHNSWFlat; proxy of <Swig Object of type 'faiss::IndexHNSWFlat *' at 0x7f53c6987840> > and size 8840782
Traceback (most recent call last):
File "/anaconda/envs/parl/bin/parlai", line 33, in <module>
sys.exit(load_entry_point('parlai', 'console_scripts', 'parlai')())
File "/home/t-fbrahman/ParlAI/parlai/__main__.py", line 14, in main
superscript_main()
File "/home/t-fbrahman/ParlAI/parlai/core/script.py", line 325, in superscript_main
return SCRIPT_REGISTRY[cmd].klass._run_from_parser_and_opt(opt, parser)
File "/home/t-fbrahman/ParlAI/parlai/core/script.py", line 108, in _run_from_parser_and_opt
return script.run()
File "/home/t-fbrahman/ParlAI/parlai/scripts/train_model.py", line 932, in run
self.train_loop = TrainLoop(self.opt)
File "/home/t-fbrahman/ParlAI/parlai/scripts/train_model.py", line 347, in __init__
self.agent = create_agent(opt)
File "/home/t-fbrahman/ParlAI/parlai/core/agents.py", line 479, in create_agent
model = model_class(opt)
File "/home/t-fbrahman/ParlAI/parlai/agents/rag/rag.py", line 176, in __init__
self._generation_agent.__init__(self, opt, shared) # type: ignore
File "/home/t-fbrahman/ParlAI/parlai/agents/bart/bart.py", line 72, in __init__
super().__init__(opt, shared)
File "/home/t-fbrahman/ParlAI/parlai/core/torch_generator_agent.py", line 484, in __init__
self.model = fsdp_utils.fsdp_wrap(self.build_model())
File "/home/t-fbrahman/ParlAI/parlai/agents/fid/fid.py", line 198, in build_model
model = FidModel(self.opt, self.dict)
File "/home/t-fbrahman/ParlAI/parlai/agents/fid/fid.py", line 68, in __init__
super().__init__(opt, dictionary, retriever_shared=retriever_shared)
File "/home/t-fbrahman/ParlAI/parlai/agents/rag/modules.py", line 82, in __init__
self.retriever = retriever_factory(opt, dictionary, shared=retriever_shared)
File "/home/t-fbrahman/ParlAI/parlai/agents/rag/retrievers.py", line 1317, in retriever_factory
return DPRRetriever(opt, dictionary, shared=shared)
File "/home/t-fbrahman/ParlAI/parlai/agents/rag/retrievers.py", line 584, in __init__
self.load_index(opt, shared)
File "/home/t-fbrahman/ParlAI/parlai/agents/rag/retrievers.py", line 600, in load_index
self.indexer.deserialize_from(index_path, embeddings_path)
File "/home/t-fbrahman/ParlAI/parlai/agents/rag/indexers.py", line 248, in deserialize_from
super().deserialize_from(file, emb_path)
File "/home/t-fbrahman/ParlAI/parlai/agents/rag/indexers.py", line 171, in deserialize_from
assert (
AssertionError: Deserialized index_id_to_db_id should match faiss index size
I fixed this issue! The problem was also that the passage ids started from 0, while they should have started from 1. Thanks!
Hi @klshuster, I have a couple of questions (mostly double-checking) on which I would really appreciate your comments:
In the first experiment with gold documents, I assume that for the evaluation (generation) command we again need to include the following additional arguments, right?
--gold-document-key gold_docs --gold-sentence-key gold_sentences --gold-document-titles-key gold_doc_titles \
--insert-gold-docs true
Without these arguments the command still works, but I think it won't use any grounding docs.
Similarly, for a FiD model with a DPR retriever component trained using our own index and knowledge source, shouldn't we include the following args during eval_model:
--rag-retriever-type dpr --query-model bert \
--dpr-model-file zoo:hallucination/multiset_dpr/hf_bert_base.cp \
--path-to-index <path_to_my_index> \
--path-to-dpr-passages <path_to_my_passages> \
--indexer-type exact --n-docs 5
I was able to run this command, but I'm not sure if it's the correct way of doing this.
For the same generation command, how can we save the retrieved docs for each generated output? I.e., in the above command I specified --n-docs 5, so the DPR retriever retrieved 5 docs to generate the output, but how can I include those in my output_file? And possibly use --gold-knowledge-passage-key to compute metrics against checked_sentence?
But for RAG-style models where both retriever and generator are trained end-to-end (using our own index), during eval_model do we only need to pass a single --model-file along with the index and passage paths? How about the --query-model? I understand that RAG updates the query encoder but not the context encoder. Can you please comment on this?
Is there any instruction on training/testing a ReGReT model?
Thank you for your help.
- Perhaps a little too much technical information, but to be as expressive as possible: those arguments are provided on the model side, not the task side, so it is not necessary to include them during evaluation (however, it doesn't hurt to make sure!)
- These are all, again, arguments provided on the model side, so ParlAI will load them appropriately from the saved model.opt file (which is in the same directory as the --model-file value)
- If you're using --model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent, then the top documents should get saved to the output_file; let me know if that's not the case. If you specify the appropriate observation key during eval (while also setting the --debug arg), it should compute the retrieval metrics
- ParlAI saves everything rather smoothly, so you can just specify --model-file
- You can train this by simply setting --regret True; if you want to use a different model for the initial round of retrieval/generation, you can set a --regret-model-file, which is presumably a trained RAG or FiD model (see the sketch below)
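As an untested sketch, mirroring your FiD training command above and adding the ReGReT flags:
parlai train_model --model fid --task <my_task> \
--regret True --regret-model-file <path_to_trained_rag_or_fid_model> \
--rag-retriever-type dpr --query-model bert \
--dpr-model-file zoo:hallucination/multiset_dpr/hf_bert_base.cp \
--path-to-index <path_to_my_index> --path-to-dpr-passages <path_to_my_passages> \
--indexer-type exact --generation-model bart --init-opt arch/bart_large \
--model-file <path_to_save_model>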
@klshuster Thanks a lot, this is absolutely helpful.
Regarding 3: this is a trained FiD model (--model fid during training). Should I add --model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent to the eval_model command?
Below are my train and eval commands, but I don't see any key in my output_file related to the top retrieved documents. BTW, what is the key I should look for?
train command:
python parlai/scripts/train_model.py \
--model fid --task <my_task> \
--rag-retriever-type dpr --query-model bert \
--dpr-model-file zoo:hallucination/multiset_dpr/hf_bert_base.cp \
--path-to-index <path_to_my_index> \
--path-to-dpr-passages <path_to_my_passages> \
--indexer-type exact \
--generation-model bart --init-opt arch/bart_large \
--batchsize 16 --fp16 True --gradient-clip 0.1 --label-truncate 128 \
--log-every-n-secs 30 --lr-scheduler reduceonplateau --lr-scheduler-patience 1 \
--model-parallel True --optimizer adam --text-truncate 512 --truncate 512 \
--learningrate 1e-05 --validation-metric-mode min --validation-every-n-epochs 0.25 \
--validation-max-exs 1000 --validation-metric ppl --validation-patience 5 \
--model-file ${OUT_DIR}/model
eval command (assuming I can remove some of the unnecessary args in the future):
python parlai/scripts/eval_model.py \
--skip-generation false \
-o parlai/opt_presets/gen/my_blenderbot.opt \
--model-file $OUT_DIR/model \
--task <my_task> \
-dt test \
-bs 4 \
--world-logs $OUT_DIR/test_beam5_output.jsonl \
--report-filename $OUT_DIR/report.txt \
--rag-retriever-type dpr --query-model bert \
--dpr-model-file zoo:hallucination/multiset_dpr/hf_bert_base.cp \
--path-to-index <path_to_my_index> \
--path-to-dpr-passages <path_to_my_passages> \
--indexer-type exact --n-docs 5
@klshuster If I understand correctly, we can add --model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent during eval_model for all models involving a retriever, like FiD and RAG. Please correct me if I am wrong.
I tried adding this argument and the top_docs key appeared in the output_file, but not for all examples: "top_docs" is missing for around 1/3 of examples. I tried this for both a trained FiD and a rag_token model. Does that mean the retriever could not retrieve any docs and just relied on the query to generate the output?
Also, in the metrics, passage_r@k is always 1.0, regardless of whether "top_docs" is available at all or whether "checked_sentence" (the concatenated gold docs) appears in top_docs.
Thanks
It was simply not implemented for --model fid or --model rag (returning the top documents, that is). I've implemented this in #3931, so once that lands you should be able to see the retrieved documents during eval. Adding --model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent for a fid or rag model would require several other flags in order to get what you want.
Hi @klshuster,
Related to what has been discussed here, I was wondering if there is a way to train/eval some of the 'hallucination' project models (like FiD-DPR, FiD-RAG, ...) with only the WoW passages. As far as I understand (and correct me if I'm wrong), a standard command like parlai eval_model -mf zoo:hallucination/bart_fid_dpr/model -t wizard_of_wikipedia evaluates against the whole wiki index, and I wanted to know whether, by setting some command-line parameters, this can be limited to the WoW passages.
Many thanks in advance
Hi @ELotfi, yes, this is indeed possible: if you specify --retriever-debug-index compressed or --retriever-debug-index exact, this will load a small index of roughly ~2800 passages comprising all those that appear in the WoW dataset.
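For example, to run the eval command from your message against that small index:
parlai eval_model -mf zoo:hallucination/bart_fid_dpr/model -t wizard_of_wikipedia --retriever-debug-index compressed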
Hey,
I'm also trying to run FiD with gold retrieved documents. It seems that BB2 requires a lot of memory (I can fit only batch_size=1 on a single A100 GPU), so I'm trying to use multiple GPUs, but it fails.
Specifically, when I execute the following command:
parlai tm \
--rag-retriever-type dpr --query-model bert_from_parlai_rag \
--dpr-model-file zoo:hallucination/bart_rag_token/model \
--generation-model bart --init-opt arch/bart_large \
--batchsize 2 --gradient-clip 0.1 --label-truncate 128 \
--log-every-n-secs 30 --lr-scheduler reduceonplateau --lr-scheduler-patience 1 \
--optimizer adam --text-truncate 512 --truncate 512 --learningrate 1e-05 \
--validation-metric-mode min --validation-every-n-epochs 0.5 \
--validation-max-exs 1000 --validation-metric ppl --validation-patience 5 \
--gold-document-key gold_documents --gold-sentence-key gold_sentences \
--gold-document-titles-key gold_doc_titles \
--model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent \
--insert-gold-docs true --splitted-chunk-length 1000 \
--retriever-debug-index compressed --knowledge-access-method search_only \
--n-docs 2 --task <my_task> --debug --loglevel debug
training seems to work fine, but upon changing tm to multiprocessing_train I get the following exception:
Traceback (most recent call last):
File "/dccstor/knewedge/yuvalk/ParlAI/parlai/scripts/multiprocessing_train.py", line 45, in multiprocess_train
return single_train.TrainLoop(opt).train()
File "/dccstor/knewedge/yuvalk/ParlAI/parlai/scripts/train_model.py", line 950, in train
for _train_log in self.train_steps():
File "/dccstor/knewedge/yuvalk/ParlAI/parlai/scripts/train_model.py", line 857, in train_steps
world.parley()
File "/dccstor/knewedge/yuvalk/ParlAI/parlai/core/worlds.py", line 880, in parley
obs = self.batch_observe(other_index, batch_act, agent_idx)
File "/dccstor/knewedge/yuvalk/ParlAI/parlai/core/worlds.py", line 824, in batch_observe
observation = agents[index].observe(observation)
File "/dccstor/knewedge/yuvalk/ParlAI/projects/blenderbot2/agents/blenderbot2.py", line 441, in observe
observation = super().observe(observation)
File "/dccstor/knewedge/yuvalk/ParlAI/parlai/agents/rag/rag.py", line 284, in observe
self._set_query_vec(observation)
File "/dccstor/knewedge/yuvalk/ParlAI/projects/blenderbot2/agents/blenderbot2.py", line 510, in _set_query_vec
observation['query_vec'] = self.model.tokenize_query(query_str)
File "/dccstor/knewedge/yuvalk/anaconda3/envs/parlai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1131, in __getattr__
type(self).__name__, name))
AttributeError: 'DistributedDataParallel' object has no attribute 'tokenize_query'
How do you recommend fine-tuning FiD with gold retrieved documents?
I would appreciate any tips/advice :)
Indeed, BB2 requires around 8 GPUs for training the 2.7B model (and only 4 for training the 400M model).
I can put up a fix for your specific error, though we have not extensively tested BB2 with multiprocessing.
@klshuster I wonder how we can use bart_large_xsum as the opt preset? It seems it is not defined. I'd appreciate your comment :)
We don't currently have the bart_large_xsum model in ParlAI (assuming you're referring to the huggingface one: https://huggingface.co/facebook/bart-large-xsum).
This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.
Hello, thanks for the great effort! I am new to ParlAI. I am interested in training a BART FiD model on my custom data using gold retrieved passages instead of a DPR-style retriever. I understand how to add a new dataset from here.
And on the project page here, I see that the second-to-last command is for training FiD RAG. Is there a way to modify the RagModel or FidModel class to pass gold passages? I saw this recent paper, which has experiments using retrieved gold knowledge. I would appreciate it if you could point me in the right direction.