facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

How to run BB3 in non-interactive mode #4967

Closed SarikGhazarian closed 1 year ago

SarikGhazarian commented 1 year ago

I am trying to run BB3 in non-interactive mode so that I can give it a file of dialogue contexts and have BB3 generate a response for each one. For example, here is a dialogue context:

Person 1: Hi, this is Jessica. What is your name? Person 2: Hi, I am Maggie. Do you want to talk about world cup games? I love world cup. Person 1: Oh, yes sure. Person 2:

Currently, I have the following code, which tries to continue the conversation by printing the response of Person 2:

from parlai.core.agents import create_agent
from parlai.core.message import Message
opt_dict = {'model_file': 'ParlAI/data/models/bb3/bb3_3B/model', 'interactive_mode': True}
agent = create_agent(opt_dict) 
observations=[]
agent_copies = []
agent_copies.append(agent.clone())
act_0 = Message({'id': 'context_0', 'text': "Person 1: Hi, this is Jessica. What is your name? Person 2: Hi, I am Maggie. Do you want to talk about world cup games? I love world cup. Person 1: Oh, yes sure. **Person 2:** ", 'episode_done': False})
observations.append(agent_copies[0].observe(act_0))
response = agent.batch_act(observations)
print(response)

which prints a response like this:

[{'id': 'ComboFidGoldDocument', 'episode_done': False, 'text': 'Person 1: Hi, this is Jessica. What is your name?', 'beam_texts': [('Person 1: Hi, this is Jessica. What is your name?', -1.5164902210235596)], 'top_docs': [ID: Title: Text: , ID: Title: Text: , ID: Title: Text: , ID: Title: Text: , ID: Title: Text: ], 'metrics': {'clen': AverageMetric(53), 'ctrunc': AverageMetric(0), 'ctrunclen': AverageMetric(0), 'gen_n_toks': AverageMetric(16)}}]

which does not make sense. I look forward to any suggestions!

thanks!

klshuster commented 1 year ago

Hi, in your opt_dict, you'll also want to specify 'init_opt' : 'gen/r2c2_bb3.opt'; that should hopefully help out
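
For reference, that change would look something like this (a minimal sketch; everything else in your snippet stays the same):

opt_dict = {
    'model_file': 'ParlAI/data/models/bb3/bb3_3B/model',
    'interactive_mode': True,
    'init_opt': 'gen/r2c2_bb3.opt',  # opt preset with the BB3 generation settings
}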

SarikGhazarian commented 1 year ago

Hi Kurt,

thanks for your reply and hint! The main issue was the format of the dialogue context. I replaced the "Person 1:" / "Person 2:" prefixes with \n separators in the dialogue context, and now it generates reasonable outputs.

from parlai.core.agents import create_agent
from parlai.core.message import Message
opt_dict = {'model_file': 'ParlAI/data/models/bb3/bb3_3B/model', 'interactive_mode': True, 'init_opt' : 'gen/r2c2_bb3.opt', 'search-server' : '0.0.0.0:8080'}
agent = create_agent(opt_dict) 
observations=[]
agent_copies = []
agent_copies.append(agent.clone())
act_0 = Message({'id': 'context_0', 'text': "Hi, this is Jessica. What is your name?\nHi, I am Maggie. Do you want to talk about world cup games? I love world cup.\nOh, yes sure. Who was the winner of latest world cup?\n", 'episode_done': False})
observations.append(agent_copies[0].observe(act_0)) 
response = agent.batch_act(observations)
print(response)

Output of non-interactive mode:

[{'id': 'ComboFidGoldDocument', 'episode_done': False, 'text': "I am not sure. I don't watch soccer.", 'beam_texts': [("I am not sure. I don't watch soccer.", -6.9718523025512695)], 'top_docs': [ID: Title: Text: , ID: Title: Text: , ID: Title: Text: , ID: Title: Text: , ID: Title: Text: ], 'metrics': {'clen': AverageMetric(50), 'ctrunc': AverageMetric(0), 'ctrunclen': AverageMetric(0), 'gen_n_toks': AverageMetric(13)}}]

The model does not do a search against the search server (0.0.0.0:8080), which is up and listening.

However, if I give the same query to the model in interactive mode, the response is different, and the model does indeed search and access memory:

parlai safe_interactive --model-file zoo:bb3/bb3_3B/model --init-opt gen/r2c2_bb3 --search-server 0.0.0.0:8080

Output of interactive mode: [BlenderBot3]: Portugal won the world cup in 2018. How about you? Do you have a favorite team?

I would appreciate some guidance here.

mojtaba-komeili commented 1 year ago

It looks like, for some reason, your model doesn't get loaded in interactive mode with generation on. Could you try create_agent_from_model_file instead of create_agent? That would look something like these changes in your code:

...
from parlai.core.agents import create_agent_from_model_file
...
opt_dict = {'init_opt' : 'gen/r2c2_bb3.opt', 'interactive_mode': True}
agent = create_agent_from_model_file('zoo:bb3/bb3_3B/model', opt_dict)
...
klshuster commented 1 year ago

Are you sure you're giving the exact same context for both interactive and non-interactive? Could you share a screenshot of your interactive output?

SarikGhazarian commented 1 year ago

@mojtaba-komeili thank you for your suggestion. I changed the code, but the non-interactive output is still the same as before (screenshot of the non-interactive output attached).

@klshuster yes, the contexts are the same. I have attached screenshots of the interactive output (only the relevant sections, as it is too long):


Comparing the two outputs, only in interactive mode do we see "Loading search generator model".

klshuster commented 1 year ago

Could you try taking the opt values specified here (gen/r2c2_bb3) and putting them in your opt_dict when you're running non-interactively?
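
One way to pull those values in programmatically (a sketch; the preset path below and the use of Opt.load are assumptions about how ParlAI ships its opt presets, so adjust to your checkout):

from parlai.core.agents import create_agent_from_model_file
from parlai.core.opt import Opt

# Assumption: the gen/r2c2_bb3 preset lives at parlai/opt_presets/gen/r2c2_bb3.opt in the ParlAI source.
preset = Opt.load('parlai/opt_presets/gen/r2c2_bb3.opt')
opt_dict = {**preset, 'interactive_mode': True, 'search_server': '0.0.0.0:8080'}
agent = create_agent_from_model_file('zoo:bb3/bb3_3B/model', opt_dict)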

SarikGhazarian commented 1 year ago

Hi @klshuster, thanks for the suggestion; it helped get BlenderBot3Agent running in non-interactive mode (previously it was loading ComboFidGoldDocumentAgent).

For those who are interested, here is my updated code:

from parlai.core.agents import create_agent_from_model_file
from parlai.core.message import Message
opt_dict = {'init_opt' : 'gen/r2c2_bb3.opt', 'interactive_mode': True, "sdm_beam_block_ngram": -1,  "sdm_beam_min_length": 1,  "sdm_beam_size": 1,  "sdm_history_size": 1,  "sdm_inference": "greedy",  "search_decision": "compute",  "search_decision_control_token": "__is-search-required__",  "search_decision_do_search_reply": "__do-search__",  "search_decision_dont_search_reply": "__do-not-search__",  "mdm_beam_block_ngram": -1,  "mdm_beam_min_length": 1,  "mdm_beam_size": 1,  "mdm_history_size": -1,  "mdm_inference": "greedy",  "mdm_model": "projects.bb3.agents.r2c2_bb3_agent:BB3SubSearchAgent",  "memory_decision": "compute",  "memory_decision_control_token": "__is-memory-required__",  "memory_decision_do_access_reply": "__do-access-memory__",  "memory_decision_dont_access_reply": "__do-not-access-memory__",  "memory_decision_use_memories": True,  "search_query_control_token": "__generate-query__",  "search_server": "0.0.0.0:8080",  "sgm_beam_block_ngram": -1,  "sgm_beam_min_length": 2,  "sgm_beam_size": 1,  "sgm_inference": "beam",  "sgm_model": "projects.bb3.agents.r2c2_bb3_agent:BB3SubSearchAgent",  "memory_generator_control_token": "__generate-memory__",  "mgm_beam_block_ngram": 3,  "mgm_beam_min_length": 10,  "mgm_beam_size": 3,  "mgm_inference": "beam",  "mgm_history_size": 1,  "mgm_model": "projects.bb3.agents.r2c2_bb3_agent:BB3SubSearchAgent",  "memory_knowledge_control_token": "__access-memory__",  "mkm_beam_block_ngram": 3,  "mkm_beam_context_block_ngram": -1,  "mkm_beam_min_length": 5,  "mkm_beam_size": 3,  "mkm_inference": "beam",  "mkm_model": "projects.bb3.agents.r2c2_bb3_agent:BB3SubSearchAgent",  "mkm_rag_retriever_type": "search_engine",  "mkm_search_query_generator_model_file": "''",  "mkm_search_server": "",  "mkm_memory_retriever": True,  "contextual_knowledge_control_token": "__extract-entity__",  "ckm_beam_block_ngram": 3,  "ckm_beam_context_block_ngram": 3,  "ckm_beam_min_length": 1,  "ckm_beam_size": 3,  "ckm_inference": "beam",  "ckm_model": "projects.bb3.agents.r2c2_bb3_agent:BB3SubSearchAgent",  "search_knowledge_control_token": "__generate-knowledge__",  "skm_beam_block_ngram": 3,  "skm_beam_context_block_ngram": 3,  "skm_beam_min_length": 10,  "skm_beam_size": 3,  "skm_doc_chunks_ranker": "woi_chunk_retrieved_docs",  "skm_inference": "beam",  "skm_model": "projects.bb3.agents.r2c2_bb3_agent:BB3SubSearchAgent",  "skm_n_ranked_doc_chunks": 1,  "skm_rag_retriever_type": "search_engine",  "skm_search_query_generator_model_file": "''",  "srm_beam_block_full_context": True,  "srm_beam_block_ngram": 3,  "srm_beam_context_block_ngram": 3,  "srm_beam_min_length": 20,  "srm_beam_size": 10,  "srm_inference": "beam",  "srm_model": "projects.bb3.agents.r2c2_bb3_agent:BB3SubSearchAgent",  "crm_beam_block_full_context": True,  "crm_beam_block_ngram": 3,  "crm_beam_context_block_ngram": 3,  "crm_beam_min_length": 20,  "crm_beam_size": 10,  "crm_inference": "beam",  "crm_model": "projects.bb3.agents.r2c2_bb3_agent:BB3SubSearchAgent",  "mrm_beam_block_full_context": True,  "mrm_beam_block_ngram": 3,  "mrm_beam_context_block_ngram": 3,  "mrm_beam_min_length": 20,  "mrm_beam_size": 10,  "mrm_inference": "beam",  "mrm_model": "projects.bb3.agents.r2c2_bb3_agent:BB3SubSearchAgent",  "grm_beam_block_full_context": True,  "grm_beam_block_ngram": 3,  "grm_beam_context_block_ngram": 3,  "grm_beam_min_length": 20,  "grm_beam_size": 10,  "grm_inference": "beam",  "grm_model": "projects.bb3.agents.r2c2_bb3_agent:BB3SubSearchAgent",  "vrm_beam_block_full_context": True,  
"vrm_beam_block_ngram": 3,  "vrm_beam_context_block_ngram": 3,  "vrm_beam_min_length": 20,  "vrm_beam_size": 10,  "vrm_inference": "beam",  "vrm_model": "projects.bb3.agents.r2c2_bb3_agent:BB3SubSearchAgent",  "orm_beam_block_full_context": True,  "orm_beam_block_ngram": 3,  "orm_beam_context_block_ngram": 3,  "orm_beam_min_length": 20,  "orm_beam_size": 10,  "orm_inference": "beam",  "orm_model": "projects.bb3.agents.r2c2_bb3_agent:BB3SubSearchAgent",  "datatype": "valid",  "beam_disregard_knowledge_for_srm_context_blocking": False,  "beam_disregard_knowledge_for_mrm_context_blocking": False,  "beam_disregard_knowledge_for_crm_context_blocking": False,  "beam_disregard_knowledge_for_grm_context_blocking": False,  "beam_disregard_knowledge_for_vrm_context_blocking": False,  "beam_disregard_knowledge_for_orm_context_blocking": False,  "exclude_context_in_skm_context_blocking": False,  "exclude_context_in_mkm_context_blocking": False,  "exclude_context_in_ckm_context_blocking": False,  "include_knowledge_in_skm_context_blocking": True,  "include_knowledge_in_mkm_context_blocking": True,  "include_knowledge_in_ckm_context_blocking": False,  "inject_query_string": "",  "loglevel": "debug",  "model": "projects.bb3.agents.r2c2_bb3_agent:BlenderBot3Agent",  "knowledge_conditioning": "combined",  "contextual_knowledge_decision": "compute", "serializable_output": True}
agent = create_agent_from_model_file('zoo:bb3/bb3_3B/model', opt_dict)
observations=[]
agent_copies = []
agent_copies.append(agent.clone())
act_0 = Message({'id': 'context_0', 'text': "Hi, this is Jessica. What is your name?\nHi, I am Maggie. Do you want to talk about world cup games? I love world cup.\nOh, yes sure. Who was the winner of latest world cup?\n", 'episode_done': False})
observations.append(agent_copies[0].observe(act_0)) 
response = agent.batch_act(observations)
print(response)
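
To get back to the original goal of feeding in a whole file of dialogue contexts, the same clone/observe/batch_act pattern extends to a batch. A minimal sketch on top of the code above; the contexts.txt filename and its one-context-per-line format (with literal \n escapes between turns) are assumptions:

# Assumption: contexts.txt holds one dialogue context per line, turns separated by literal "\n" escapes.
with open('contexts.txt') as f:
    contexts = [line.strip() for line in f if line.strip()]

observations = []
agent_copies = []
for i, context in enumerate(contexts):
    copy = agent.clone()  # one independent conversation state per context
    agent_copies.append(copy)
    act = Message({'id': f'context_{i}', 'text': context.replace('\\n', '\n'), 'episode_done': False})
    observations.append(copy.observe(act))

responses = agent.batch_act(observations)  # one response per context, in the same order
for resp in responses:
    print(resp['id'], '->', resp['text'])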

Now interactive and non-interactive mode give the same output; however, the outputs are factually incorrect:

Query: Do you know who is the current president of USA?\n
bb3: George W. Bush was the president of the United States from 2001 to 2017. Do you know anything about him?

Query: Hi, this is Jessica. What is your name?\nHi, I am Maggie. Do you want to talk about world cup games? I love world cup.\nOh, yes sure. Who was the winner of latest world cup?\n
bb3: Portugal won the world cup in 2010. How about you? Do you have a favorite team?

My question is: are these due to the model's performance, or am I missing something here (like the search engine setup)?

thanks!

mojtaba-komeili commented 1 year ago

Glad to see things worked out. The factual correctness of the model, and its grasp of reality and common sense (especially about the recency of events), is an open challenge for AI. SKR and BB3 took steps towards improving it, but we are still not there yet.

github-actions[bot] commented 1 year ago

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.