facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

Very low performance from blenderbot2_400M #4557

Closed Wonder1905 closed 2 years ago

Wonder1905 commented 2 years ago

Bug description

I'm using the pretrained blenderbot2_400M model, and it shows very low performance. For example:

Enter Your Message: Hi I'm David , What is your name?

[BlenderBot2Fid]: My name is David and I am a man. What is yours? _POTENTIALLY_UNSAFE__

Enter Your Message: we have the same name, that's strange 

I can't continue because I have another bug (I will open it in another ticket), but in previous chit-chats, after a few messages I asked the bot:

Me:"Do you remember my name?"

Answer: "Yes"

Me: "What is my name?"

Answer:"I dont remeber your name"

Reproduction steps

from parlai.scripts.interactive import Interactive
Interactive.main(
    # the model_file is a filename path pointing to a particular model dump.
    # Model files that begin with "zoo:" are special files distributed by the ParlAI team.
    # They'll be automatically downloaded when you ask to use them.
    model_file='zoo:blenderbot2/blenderbot2_400M/model',
    search_server="0.0.0.0:1111"
)
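
As a quick sanity check before launching the interactive session, something like the sketch below can confirm that the search server at 0.0.0.0:1111 is reachable. Note that the `q`/`n` form fields are only my guess at the request format the search retriever sends, not a documented API, so adjust as needed.

```python
import requests

# Minimal reachability check for the search server used above.
# NOTE: the {'q': ..., 'n': ...} payload is an assumption about the request
# format the BlenderBot2 search retriever expects, not a documented contract.
resp = requests.post('http://0.0.0.0:1111', data={'q': 'test query', 'n': 1})
print(resp.status_code)
print(resp.text[:500])
```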

Expected behavior

I'm trying to understand whether what I got is the expected behavior, since it contradicts everything written in this repo.

Logs

```
06:24:02 | Overriding opt["model_file"] to /content/ParlAI/data/models/blenderbot2/blenderbot2_400M/model (previously: /checkpoint/kshuster/projects/knowledge_bot/kbot_memfix_sweep25_Fri_Jul__9/338/model.oss)
06:24:02 | Overriding opt["search_server"] to 0.0.0.0:1111 (previously: None)
06:24:02 | Using CUDA
06:24:02 | loading dictionary from /content/ParlAI/data/models/blenderbot2/blenderbot2_400M/model.dict
06:24:02 | num words = 50264
06:24:02 | BlenderBot2Fid: full interactive mode on.
06:24:27 | Creating the search engine retriever.
06:24:27 | No protocol provided, using "http://"
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
06:24:53 | Building Query Generator from file: /content/ParlAI/data/models/blenderbot2/query_generator/model
06:25:05 | Building Memory Decoder from file: /content/ParlAI/data/models/blenderbot2/memory_decoder/model
06:25:14 | Total parameters: 732,961,280 (406,286,336 trainable)
06:25:14 | Loading existing model params from /content/ParlAI/data/models/blenderbot2/blenderbot2_400M/model
06:25:17 | Opt:
06:25:17 |     activation: gelu
06:25:17 |     adafactor_eps: '[1e-30, 0.001]'
06:25:17 |     adam_eps: 1e-08
06:25:17 |     add_cleaned_reply_to_history: False
06:25:17 |     add_p1_after_newln: False
06:25:17 |     allow_missing_init_opts: False
06:25:17 |     attention_dropout: 0.1
06:25:17 |     batchsize: 12
06:25:17 |     beam_block_full_context: False
06:25:17 |     beam_block_list_filename: None
06:25:17 |     beam_block_ngram: 3
06:25:17 |     beam_context_block_ngram: 3
06:25:17 |     beam_delay: 30
06:25:17 |     beam_length_penalty: 0.65
06:25:17 |     beam_min_length: 20
06:25:17 |     beam_size: 10
06:25:17 |     betas: '[0.9, 0.999]'
06:25:17 |     bpe_add_prefix_space: None
06:25:17 |     bpe_debug: False
06:25:17 |     bpe_dropout: None
06:25:17 |     bpe_merge: None
06:25:17 |     bpe_vocab: None
06:25:17 |     candidates: inline
06:25:17 |     cap_num_predictions: 100
06:25:17 |     checkpoint_activations: False
06:25:17 |     codes_attention_num_heads: 4
06:25:17 |     codes_attention_type: basic
06:25:17 |     compressed_indexer_factory: IVF4096_HNSW128,PQ128
06:25:17 |     compressed_indexer_gpu_train: False
06:25:17 |     compressed_indexer_nprobe: 64
06:25:17 |     compute_tokenized_bleu: False
06:25:17 |     converting: False
06:25:17 |     data_parallel: False
06:25:17 |     datapath: /content/ParlAI/data
06:25:17 |     datatype: train:stream
06:25:17 |     delimiter: '\n'
06:25:17 |     dict_class: parlai.core.dict:DictionaryAgent
06:25:17 |     dict_endtoken: __end__
06:25:17 |     dict_file: /content/ParlAI/data/models/blenderbot2/blenderbot2_400M/model.dict
06:25:17 |     dict_initpath: None
06:25:17 |     dict_language: english
06:25:17 |     dict_loaded: True
06:25:17 |     dict_lower: False
06:25:17 |     dict_max_ngram_size: -1
06:25:17 |     dict_maxtokens: -1
06:25:17 |     dict_minfreq: 0
06:25:17 |     dict_nulltoken: __null__
06:25:17 |     dict_starttoken: __start__
06:25:17 |     dict_textfields: text,labels
06:25:17 |     dict_tokenizer: gpt2
06:25:17 |     dict_unktoken: __unk__
06:25:17 |     display_add_fields: 
06:25:17 |     display_examples: False
06:25:17 |     display_prettify: False
06:25:17 |     doc_chunk_split_mode: word
06:25:17 |     doc_chunks_ranker: head
06:25:17 |     download_path: None
06:25:17 |     dpr_model_file: zoo:hallucination/bart_rag_token/model
06:25:17 |     dpr_num_docs: 25
06:25:17 |     dropout: 0.1
06:25:17 |     dynamic_batching: None
06:25:17 |     embedding_projection: random
06:25:17 |     embedding_size: 1024
06:25:17 |     embedding_type: random
06:25:17 |     embeddings_scale: True
06:25:17 |     encode_candidate_vecs: True
06:25:17 |     encode_candidate_vecs_batchsize: 256
06:25:17 |     eval_candidates: inline
06:25:17 |     ffn_size: 4096
06:25:17 |     fixed_candidate_vecs: reuse
06:25:17 |     fixed_candidates_path: None
06:25:17 |     force_fp16_tokens: True
06:25:17 |     fp16: False
06:25:17 |     fp16_impl: safe
06:25:17 |     generation_model: bart
06:25:17 |     gold_document_key: __selected-docs__
06:25:17 |     gold_document_titles_key: select-docs-titles
06:25:17 |     gold_knowledge_passage_key: checked_sentence
06:25:17 |     gold_knowledge_title_key: title
06:25:17 |     gold_sentence_key: __selected-sentences__
06:25:17 |     gpu: -1
06:25:17 |     gradient_clip: 0.1
06:25:17 |     hide_labels: False
06:25:17 |     history_add_global_end_token: None
06:25:17 |     history_reversed: False
06:25:17 |     history_size: -1
06:25:17 |     hnsw_ef_construction: 200
06:25:17 |     hnsw_ef_search: 128
06:25:17 |     hnsw_indexer_store_n: 128
06:25:17 |     ignore_bad_candidates: False
06:25:17 |     image_cropsize: 224
06:25:17 |     image_mode: raw
06:25:17 |     image_size: 256
06:25:17 |     indexer_buffer_size: 65536
06:25:17 |     indexer_type: compressed
06:25:17 |     inference: beam
06:25:17 |     init_fairseq_model: None
06:25:17 |     init_model: None
06:25:17 |     init_opt: None
06:25:17 |     insert_gold_docs: True
06:25:17 |     interactive_candidates: fixed
06:25:17 |     interactive_mode: True
06:25:17 |     interactive_task: True
06:25:17 |     invsqrt_lr_decay_gamma: -1
06:25:17 |     is_debug: False
06:25:17 |     knowledge_access_method: classify
06:25:17 |     label_truncate: 128
06:25:17 |     learn_embeddings: True
06:25:17 |     learn_positional_embeddings: True
06:25:17 |     learningrate: 1e-05
06:25:17 |     local_human_candidates_file: None
06:25:17 |     log_keep_fields: all
06:25:17 |     loglevel: info
06:25:17 |     lr_scheduler: reduceonplateau
06:25:17 |     lr_scheduler_decay: 0.5
06:25:17 |     lr_scheduler_patience: 1
06:25:17 |     max_doc_token_length: 256
06:25:17 |     memory_attention: sqrt
06:25:17 |     memory_decoder_beam_min_length: 10
06:25:17 |     memory_decoder_beam_size: 3
06:25:17 |     memory_decoder_delimiter: '\n'
06:25:17 |     memory_decoder_ignore_phrase: persona:
06:25:17 |     memory_decoder_key: full_text
06:25:17 |     memory_decoder_model_file: zoo:blenderbot2/memory_decoder/model
06:25:17 |     memory_decoder_one_line_memories: False
06:25:17 |     memory_decoder_truncate: -1
06:25:17 |     memory_doc_delimiter: :
06:25:17 |     memory_doc_title_delimiter: ' / '
06:25:17 |     memory_extractor_phrase: persona:
06:25:17 |     memory_key: personas
06:25:17 |     memory_reader_model: None
06:25:17 |     memory_retriever_truncate: -1
06:25:17 |     memory_writer_model: bert
06:25:17 |     memory_writer_model_file: zoo:hallucination/multiset_dpr/hf_bert_base.cp
06:25:17 |     min_doc_token_length: 64
06:25:17 |     model: projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent
06:25:17 |     model_file: /content/ParlAI/data/models/blenderbot2/blenderbot2_400M/model
06:25:17 |     model_parallel: True
06:25:17 |     momentum: 0
06:25:17 |     multitask_weights: '[3.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]'
06:25:17 |     n_decoder_layers: 12
06:25:17 |     n_docs: 5
06:25:17 |     n_encoder_layers: 12
06:25:17 |     n_extra_positions: 0
06:25:17 |     n_heads: 16
06:25:17 |     n_layers: 12
06:25:17 |     n_positions: 1024
06:25:17 |     n_ranked_doc_chunks: 1
06:25:17 |     n_segments: 0
06:25:17 |     nesterov: True
06:25:17 |     no_cuda: False
06:25:17 |     normalize_sent_emb: False
06:25:17 |     nus: [0.7]
06:25:17 |     optimizer: adamax
06:25:17 |     outfile: 
06:25:17 |     output_conversion_path: None
06:25:17 |     output_scaling: 1.0
06:25:17 |     override: "{'model_file': '/content/ParlAI/data/models/blenderbot2/blenderbot2_400M/model', 'search_server': '0.0.0.0:1111'}"
06:25:17 |     parlai_home: /private/home/kshuster/ParlAI
06:25:17 |     path_to_dense_embeddings: None
06:25:17 |     path_to_dpr_passages: zoo:hallucination/wiki_passages/psgs_w100.tsv
06:25:17 |     path_to_index: zoo:hallucination/wiki_index_compressed/compressed_pq
06:25:17 |     person_tokens: False
06:25:17 |     poly_attention_num_heads: 4
06:25:17 |     poly_attention_type: basic
06:25:17 |     poly_faiss_model_file: None
06:25:17 |     poly_n_codes: 64
06:25:17 |     poly_score_initial_lambda: 0.5
06:25:17 |     polyencoder_init_model: wikito
06:25:17 |     polyencoder_type: codes
06:25:17 |     print_docs: False
06:25:17 |     query_generator_beam_min_length: 2
06:25:17 |     query_generator_beam_size: 1
06:25:17 |     query_generator_delimiter: '\n'
06:25:17 |     query_generator_ignore_phrase: persona:
06:25:17 |     query_generator_inference: beam
06:25:17 |     query_generator_key: full_text
06:25:17 |     query_generator_model_file: zoo:blenderbot2/query_generator/model
06:25:17 |     query_generator_truncate: -1
06:25:17 |     query_model: bert_from_parlai_rag
06:25:17 |     rag_model_type: token
06:25:17 |     rag_query_truncate: 512
06:25:17 |     rag_retriever_query: full_history
06:25:17 |     rag_retriever_type: search_engine
06:25:17 |     rag_turn_discount_factor: 1.0
06:25:17 |     rag_turn_marginalize: doc_then_turn
06:25:17 |     rag_turn_n_turns: 2
06:25:17 |     rank_candidates: False
06:25:17 |     rank_top_k: -1
06:25:17 |     reduction_type: mean
06:25:17 |     regret: False
06:25:17 |     regret_dict_file: None
06:25:17 |     regret_intermediate_maxlen: 32
06:25:17 |     regret_model_file: None
06:25:17 |     regret_override_index: False
06:25:17 |     relu_dropout: 0.0
06:25:17 |     repeat_blocking_heuristic: True
06:25:17 |     retriever_debug_index: None
06:25:17 |     retriever_delimiter: '\n'
06:25:17 |     retriever_embedding_size: 768
06:25:17 |     retriever_ignore_phrase: persona:
06:25:17 |     return_cand_scores: False
06:25:17 |     save_format: conversations
06:25:17 |     search_query_generator_beam_min_length: 2
06:25:17 |     search_query_generator_beam_size: 1
06:25:17 |     search_query_generator_inference: greedy
06:25:17 |     search_query_generator_model_file: zoo:blenderbot2/query_generator/model
06:25:17 |     search_query_generator_text_truncate: 512
06:25:17 |     search_server: 0.0.0.0:1111
06:25:17 |     share_encoders: True
06:25:17 |     share_search_and_memory_query_encoder: False
06:25:17 |     share_word_embeddings: True
06:25:17 |     single_turn: False
06:25:17 |     skip_generation: False
06:25:17 |     skip_retrieval_token: no_passages_used
06:25:17 |     skip_search_key: skip_search
06:25:17 |     special_tok_lst: None
06:25:17 |     split_lines: True
06:25:17 |     splitted_chunk_length: 256
06:25:17 |     starttime: Jul09_14-09
06:25:17 |     t5_dropout: 0.0
06:25:17 |     t5_generation_config: None
06:25:17 |     t5_model_arch: t5-base
06:25:17 |     t5_model_parallel: False
06:25:17 |     task: None
06:25:17 |     temperature: 1.0
06:25:17 |     text_truncate: 512
06:25:17 |     tfidf_max_doc_paragraphs: -1
06:25:17 |     tfidf_model_path: zoo:wikipedia_full/tfidf_retriever/model
06:25:17 |     thorough: False
06:25:17 |     topk: 10
06:25:17 |     topp: 0.9
06:25:17 |     train_predict: False
06:25:17 |     truncate: 512
06:25:17 |     update_freq: 1
06:25:17 |     use_memories: False
06:25:17 |     use_reply: label
06:25:17 |     variant: prelayernorm
06:25:17 |     verbose: False
06:25:17 |     warmup_rate: 0.0001
06:25:17 |     warmup_updates: -1
06:25:17 |     weight_decay: None
06:25:17 |     woi_doc_chunk_size: 500
06:25:17 |     wrap_memory_encoder: False
06:25:17 | Current ParlAI commit: a81608d20633f329068b39c596783d450f6d990c
06:25:17 | Current internal commit: a81608d20633f329068b39c596783d450f6d990c
06:25:18 | Current fb commit: a81608d20633f329068b39c596783d450f6d990c
Enter [DONE] if you want to end the episode, [EXIT] to quit.
```
klshuster commented 2 years ago

Hi there. I am not sure what you mean by your results "contradicting everything that is written in this repo."

If you're referring to the model's memory performance, perhaps you can try turning on --knowledge-access-method memory_only to make sure that the model accesses its memory on each response; there is currently a step where the model chooses whether to access memory or search the internet, and it is not foolproof.
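
For example, a minimal sketch using the same interactive entry point as in your reproduction steps (untested, but the keyword argument mirrors the CLI flag):

```python
from parlai.scripts.interactive import Interactive

# Same setup as the reproduction steps, but forcing memory access on every
# response instead of letting the model choose between memory and search.
Interactive.main(
    model_file='zoo:blenderbot2/blenderbot2_400M/model',
    search_server='0.0.0.0:1111',
    knowledge_access_method='memory_only',
)
```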

github-actions[bot] commented 2 years ago

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.