facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License
10.49k stars 2.1k forks

UnicodeEncodeError while trying to use Nucleus Sampling #2986

Closed aslucc closed 4 years ago

aslucc commented 4 years ago

Bug description

UnicodeEncodeError while trying to use Nucleus Sampling. I got this error after sending my first message:

UnicodeEncodeError: 'ascii' codec can't encode character '\u8fb6' in position 173: ordinal not in range(128)

Reproduction steps

parlai interactive -mf covid90M \
    --single_turn False \
    --inference nucleus --topp 20

Expected behavior

Being able to interact with the model.

Logs

Please paste the command line output:

Traceback (most recent call last):
  File "/usr/local/bin/parlai", line 33, in <module>
    sys.exit(load_entry_point('parlai', 'console_scripts', 'parlai')())
  File "/parlai/core/script.py", line 272, in superscript_main
    SCRIPT_REGISTRY[cmd].klass._run_from_parser_and_opt(opt, parser)
  File "/parlai/core/script.py", line 88, in _run_from_parser_and_opt
    return script.run()
  File "/parlai/scripts/interactive.py", line 118, in run
    return interactive(self.opt)
  File "/parlai/scripts/interactive.py", line 93, in interactive
    world.parley()
  File "/parlai/tasks/interactive/worlds.py", line 79, in parley
    agents[0].observe(validate(acts[1]))
  File "/parlai/agents/local_human/local_human.py", line 59, in observe
    prettify=self.opt.get('display_prettify', False),
UnicodeEncodeError: 'ascii' codec can't encode character '\u8fb6' in position 173: ordinal not in range(128)

stephenroller commented 4 years ago

Really unsure about this one. Does nucleus work with the model we distributed?

aslucc commented 4 years ago

I get the same error using the following command (with the model in your zoo): parlai interactive -t blended_skill_talk -mf zoo:blender/blender_90M/model --inference nucleus --topp 20

stephenroller commented 4 years ago

Is that the full traceback? It seems like a very weird place for such an error.

stephenroller commented 4 years ago

(likely if you export LANG=en_US.UTF-8 it will work)

aslucc commented 4 years ago

I have:

LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
PYTHONUTF8=1

but I'm still getting the same error.
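
One way to check whether those variables actually reached the interpreter is to ask Python which encodings it resolved at startup. The sketch below is only an illustration and is not part of ParlAI; `describe_encodings` is a hypothetical helper name. One caveat: the tracebacks in this thread show Python 3.6 paths, and UTF-8 mode (PYTHONUTF8=1, python -X utf8) was added in Python 3.7, so those two settings may have had no effect here.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings this interpreter resolved at startup.

    Hypothetical diagnostic helper, not a ParlAI function.
    """
    return {
        # Encoding used by print() when writing to the terminal.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stdin": getattr(sys.stdin, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
        # Default used by open() when no encoding= argument is given.
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(f"{name:>10}: {value}")
```

If `stdout` or `preferred` reports `ascii` or `ANSI_X3.4-1968`, the process is running under the C/POSIX locale despite the exported variables, which often means the named locale has not been generated on the system.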

The full traceback of the command on the model in the zoo (I just noticed it is different):

Traceback (most recent call last):
  File "/usr/local/bin/parlai", line 33, in <module>
    sys.exit(load_entry_point('parlai', 'console_scripts', 'parlai')())
  File "/parlai/core/script.py", line 272, in superscript_main
    SCRIPT_REGISTRY[cmd].klass._run_from_parser_and_opt(opt, parser)
  File "/parlai/core/script.py", line 88, in _run_from_parser_and_opt
    return script.run()
  File "/parlai/scripts/interactive.py", line 118, in run
    return interactive(self.opt)
  File "/parlai/scripts/interactive.py", line 89, in interactive
    world = create_task(opt, [human_agent, agent])
  File "/parlai/core/worlds.py", line 1633, in create_task
    world = create_task_world(opt, user_agents, default_world=default_world)
  File "/parlai/core/worlds.py", line 1605, in create_task_world
    return world_class(opt, task_agents + user_agents)
  File "/parlai/tasks/blended_skill_talk/worlds.py", line 131, in __init__
    super().__init__(opt, agents, shared)
  File "/parlai/tasks/interactive/worlds.py", line 24, in __init__
    self.init_contexts(shared=shared)
  File "/parlai/tasks/blended_skill_talk/worlds.py", line 135, in init_contexts
    self.contexts_data = get_contexts_data(self.opt, shared=shared)
  File "/parlai/tasks/blended_skill_talk/worlds.py", line 18, in get_contexts_data
    return _load_personas(opt=opt)
  File "/parlai/tasks/blended_skill_talk/worlds.py", line 37, in _load_personas
    raw_safe_persona_groups = [line.strip() for line in f.readlines()]
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2235: ordinal not in range(128)

stephenroller commented 4 years ago

Awesome, thanks.
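
The UnicodeDecodeError traceback above fails inside a plain f.readlines() call, which suggests the file was opened without an explicit encoding, so Python fell back to the locale's ASCII codec. Byte 0xc3 is the lead byte of many two-byte UTF-8 sequences (for example, é is 0xc3 0xa9). A minimal sketch of reading such a file independently of the locale follows. The helper name `read_stripped_lines` and the sample file are hypothetical, and this is not ParlAI's actual code.

```python
import os
import tempfile


def read_stripped_lines(path):
    """Read a text file as UTF-8 regardless of the process locale.

    Hypothetical helper for illustration only.
    """
    # Passing encoding= explicitly avoids the locale-dependent default.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]


if __name__ == "__main__":
    # Write a small file containing a non-ASCII character, then read it back.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "personas.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("i love caf\u00e9 au lait.\n")
        # ascii() escapes non-ASCII characters so this print is safe
        # even on a terminal that cannot encode them.
        print(ascii(read_stripped_lines(path)))
```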

aslucc commented 4 years ago

Update: I keep getting this kind of error whatever I do. For example, I'm now trying to use chat_services (command: python3 run.py --config-path ../../tasks/chatbot/config.yml) and I get the following output:

You are going to allow people on Facebook to be agents in ParlAI.
During this process, Internet connection is required, and you should turn off your computer's auto-sleep feature.

Please press Enter to continue...

Setting up Messenger webhook...
Heroku: Collecting files...
Heroku: Starting server...
Traceback (most recent call last):
  File "/parlai/chat_service/utils/server.py", line 125, in setup_heroku_server
    stderr=subprocess.STDOUT,
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/parlai/chat_service/core/heroku-cli-v6.99.0-ec9edad-linux-x64/bin/heroku', 'create', 'root-parlai-messenger-chatbot']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 47, in <module>
    run(opt)
  File "run.py", line 32, in run
    manager = MessengerManager(opt)
  File "/parlai/chat_service/services/messenger/messenger_manager.py", line 42, in __init__
    self._complete_setup()
  File "/parlai/chat_service/services/messenger/messenger_manager.py", line 62, in _complete_setup
    self.setup_server()
  File "/parlai/chat_service/services/messenger/messenger_manager.py", line 210, in setup_server
    self.server_task_name, local=self.opt['local']
  File "/parlai/chat_service/utils/server.py", line 284, in setup_server
    return setup_heroku_server(task_name)
  File "/parlai/chat_service/utils/server.py", line 168, in setup_heroku_server
    print(error_text)
UnicodeEncodeError: 'ascii' codec can't encode character '\u25b8' in position 45: ordinal not in range(128)

I read online that the main cause of this kind of error is using the str function instead of encode(). I'm not sure whether the issue is on my machine and I should dig deeper into it, or whether it's a "bug" present all over your code that I can't do much about.
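
For context, the str versus encode() advice mostly applies to Python 2. In Python 3, print() encodes text using the output stream's encoding, so the same call can succeed or fail depending on the environment. The sketch below reproduces this with in-memory streams; `try_print` is a hypothetical helper and is not taken from ParlAI.

```python
import io


def try_print(text, encoding):
    """Print `text` to an in-memory stream that uses `encoding`.

    Returns the bytes written, or the UnicodeEncodeError that was raised.
    """
    raw = io.BytesIO()
    # newline="\n" disables platform-specific newline translation.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    try:
        print(text, file=stream)
        stream.flush()
        return raw.getvalue()
    except UnicodeEncodeError as err:
        return err


if __name__ == "__main__":
    message = "\u25b8 some error text"
    # Same string, same print() call, different stream encodings.
    print("utf-8:", try_print(message, "utf-8"))
    print("ascii:", try_print(message, "ascii"))
```

The string is identical in both calls; only the stream's encoding differs, which points at the environment rather than at how the text was constructed.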

stephenroller commented 4 years ago

Can you try making it “python -X utf8”?

aslucc commented 4 years ago

Thanks for the suggestion. I tried python3 -X utf8 parlai interactive -t blended_skill_talk -mf zoo:blender/blender_90M/model --inference nucleus --topp 20 and python3 -X utf8 parlai/scripts/interactive.py -t blended_skill_talk -mf zoo:blender/blender_90M/model --inference nucleus --topp 20, but I'm still getting the same error. I'm not sure I'm using the command you suggested correctly. I also tried PYTHONIOENCODING=UTF-8 python3 parlai interactive -t blended_skill_talk -mf zoo:blender/blender_90M/model --inference nucleus --topp 20, as I read online, but still no luck.

stephenroller commented 4 years ago

This is maddening

aslucc commented 4 years ago

Solved this with:

echo "LC_ALL=en_US.UTF-8" >> /etc/environment
echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen
echo "LANG=en_US.UTF-8" > /etc/locale.conf
locale-gen en_US.UTF-8

(found here https://github.com/tianon/docker-brew-debian/issues/45#issuecomment-325235517)

Thanks a lot for the many attempts to help me with this, I'm closing the issue.

stephenroller commented 4 years ago

That’s wild. I don’t for the life of me understand why the sampling would error out either, it’s not on file load or anything.

aslucc commented 4 years ago

Hello, I have to reopen this because I ran into the error again, even though I made the changes I mentioned above. While chatting interactively with my model (command: python3 parlai/scripts/interactive.py -mf covid90M), I got an error with a specific question: "Will the number of COVID-19 cases decline in the summer?"

I get the same result with: "Will the number of COVID-19 cases decrease in the summer?", "Will the number of covid cases decrease in the summer?" and "Will the number of covid cases decrease during the summer?"

But it worked with: "There will be less covid cases during the summer?" and "Hello, will the number of COVID-19 cases decrease in the summer?" So I guessed the problem was starting the sentence with W, but "Why are you chatting?" works fine!

Output Log:

Traceback (most recent call last):
  File "parlai/scripts/interactive.py", line 122, in <module>
    Interactive.main()
  File "/parlai/core/script.py", line 109, in main
    return cls._run_args(None)
  File "/parlai/core/script.py", line 82, in _run_args
    return cls._run_from_parser_and_opt(opt, parser)
  File "/parlai/core/script.py", line 88, in _run_from_parser_and_opt
    return script.run()
  File "parlai/scripts/interactive.py", line 117, in run
    return interactive(self.opt)
  File "parlai/scripts/interactive.py", line 92, in interactive
    world.parley()
  File "/parlai/tasks/interactive/worlds.py", line 79, in parley
    agents[0].observe(validate(acts[1]))
  File "/parlai/agents/local_human/local_human.py", line 58, in observe
    ignore_fields=self.opt.get('display_ignore_fields', ''),
UnicodeEncodeError: 'ascii' codec can't encode character '\u2013' in position 135: ordinal not in range(128)

stephenroller commented 4 years ago

It's happening when outputs are printed out?

aslucc commented 4 years ago

I type the question after Enter Your Message: and press Enter. After about 10 seconds the process aborts and prints the output log above (without printing any [TransformerGenerator]: answer). I noticed something else: if I ask that question as the second message instead of the first, it works. Here is an example: https://imgur.com/3l8woFV

stephenroller commented 4 years ago

It reads as if your terminal can't display Unicode 🙁
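
If the terminal or locale really cannot encode the model's output, one possible workaround is to escape the offending characters before printing instead of letting print() raise. This is a sketch under that assumption, not something ParlAI is known to do; `safe_print` is a hypothetical helper.

```python
import io
import sys


def safe_print(text, stream=None):
    """Print text, escaping characters the target stream cannot encode."""
    stream = sys.stdout if stream is None else stream
    # Streams such as io.StringIO report no encoding; assume UTF-8 for those.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # Round-trip through the stream's codec so unencodable characters become
    # backslash escapes instead of raising UnicodeEncodeError.
    printable = text.encode(encoding, errors="backslashreplace").decode(encoding)
    print(printable, file=stream)


if __name__ == "__main__":
    # Simulate an ASCII-only terminal with an in-memory stream.
    raw = io.BytesIO()
    ascii_stream = io.TextIOWrapper(raw, encoding="ascii", newline="\n")
    safe_print("2019\u20132020", stream=ascii_stream)
    ascii_stream.flush()
    print(raw.getvalue())
```

This trades readability for robustness: a character such as the en dash (U+2013) from the last traceback is shown as an escape sequence rather than crashing the session. Fixing the locale remains the cleaner solution.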

github-actions[bot] commented 4 years ago

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.