cpacker / MemGPT

Letta (fka MemGPT) is a framework for creating stateful LLM services.
https://letta.com
Apache License 2.0
11.88k stars 1.3k forks

Multi-lingual/Unicode/I18n Support #799

Open ifsheldon opened 9 months ago

ifsheldon commented 9 months ago

Is your feature request related to a problem? Please describe.

Multi-lingual/Unicode/I18n support will be great!

I was trying to use MemGPT in Chinese, but I found that even GPT-4 could not understand Chinese, which is odd given my experience with the model elsewhere.

Describe the solution you'd like

Full multi-lingual/Unicode/I18n support may be a bit complicated, but I think we can implement it step by step (from easy to hard):

  1. Clean up code that is not Unicode-compatible
    • for example, json.dumps() needs ensure_ascii turned off (see the sketch after this list). This alone should enable LLMs like GPT-4, which are capable enough to converse in multiple languages. #800
    • I don't know if there is any other code that is incompatible with Unicode.
  2. i18n:
    • Translated system prompts
    • Translated constant strings in code
    • Translated human description
    • Interface i18n like GUI i18n
    • etc.
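
For illustration, here is a minimal sketch of the ensure_ascii behaviour referred to in step 1, using only the standard library (not MemGPT code):

```python
import json

payload = {"message": "你好, MemGPT!"}

# Default behaviour: non-ASCII characters are escaped to \uXXXX sequences,
# so this is what ends up in serialized messages.
print(json.dumps(payload))
# {"message": "\u4f60\u597d, MemGPT!"}

# With ensure_ascii=False the original characters are kept as-is.
print(json.dumps(payload, ensure_ascii=False))
# {"message": "你好, MemGPT!"}
```
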
jmtrappier commented 8 months ago

Hello,

I have almost the same experience with French, using local LLMs (tested llama.cpp, ollama, and vLLM so far; same issue).

As long as I speak English with the bot there is no issue and it runs smoothly, but when I switch to French I get a lot of JSON parsing errors.

Unicode might not be the issue; the JSON parsing seems to be the problem...

It could be an LLM server issue as well...

ifsheldon commented 8 months ago

@jmtrappier you can try the latest code from source; the published version is quite old. They are probably preparing a major breaking release.

xlbljz commented 3 months ago

I'm using 0.3.17. It seems that there are still some problems with multi-language JSON parsing.


```
> Enter your message: So if I said 你好 to you, what should you say.

An exception occurred when running agent.step():
Traceback (most recent call last):
  File "C:\Users\idear\miniconda3\envs\dev\Lib\site-packages\memgpt\data_types.py", line 425, in to_google_ai_dict
    function_args = json.loads(function_args)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\idear\miniconda3\envs\dev\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\idear\miniconda3\envs\dev\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\idear\miniconda3\envs\dev\Lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Invalid \escape: line 1 column 55 (char 54)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\idear\miniconda3\envs\dev\Lib\site-packages\memgpt\main.py", line 408, in run_agent_loop
    new_messages, user_message, skip_next_user_input = process_agent_step(user_message, no_verify)
                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\idear\miniconda3\envs\dev\Lib\site-packages\memgpt\main.py", line 377, in process_agent_step
    new_messages, heartbeat_request, function_failed, token_warning, tokens_accumulated = memgpt_agent.step(
                                                                                          ^^^^^^^^^^^^^^^^^^
  File "C:\Users\idear\miniconda3\envs\dev\Lib\site-packages\memgpt\agent.py", line 818, in step
    raise e
  File "C:\Users\idear\miniconda3\envs\dev\Lib\site-packages\memgpt\agent.py", line 746, in step
    response = self._get_ai_reply(
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\idear\miniconda3\envs\dev\Lib\site-packages\memgpt\agent.py", line 451, in _get_ai_reply
    raise e
  File "C:\Users\idear\miniconda3\envs\dev\Lib\site-packages\memgpt\agent.py", line 426, in _get_ai_reply
    response = create(
               ^^^^^^^
  File "C:\Users\idear\miniconda3\envs\dev\Lib\site-packages\memgpt\llm_api\llm_api_tools.py", line 133, in wrapper
    raise e
  File "C:\Users\idear\miniconda3\envs\dev\Lib\site-packages\memgpt\llm_api\llm_api_tools.py", line 106, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\idear\miniconda3\envs\dev\Lib\site-packages\memgpt\llm_api\llm_api_tools.py", line 268, in create
    contents=[m.to_google_ai_dict() for m in messages],
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\idear\miniconda3\envs\dev\Lib\site-packages\memgpt\llm_api\llm_api_tools.py", line 268, in <listcomp>
    contents=[m.to_google_ai_dict() for m in messages],
              ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\idear\miniconda3\envs\dev\Lib\site-packages\memgpt\data_types.py", line 427, in to_google_ai_dict
    raise UserWarning(f"Failed to parse JSON function args: {function_args}")
UserWarning: Failed to parse JSON function args: {"message": "\u4f60\u597d means Hello in Chinese? That\'s so cool! Thank you for teaching me."}
```
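
For reference, the "Invalid \escape" failure above can be reproduced with the standard library alone: the function-arguments string contains \', which is not a legal JSON escape sequence (a minimal sketch, not MemGPT code):

```python
import json

# The function-args string from the traceback above; the apostrophe is
# escaped as \' which the JSON parser rejects.
function_args = r'{"message": "\u4f60\u597d means Hello in Chinese? That\'s so cool! Thank you for teaching me."}'

try:
    json.loads(function_args)
except json.JSONDecodeError as e:
    print(e)  # Invalid \escape: line 1 column 55 (char 54)
```
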
cpacker commented 3 months ago

> I'm using 0.3.17. It seems that there are still some problems with multi-language JSON parsing.

Hi @xlbljz, thanks for the bug report! Based on your input, I realized that the new Gemini (and Anthropic) adapters didn't set ensure_ascii=False on their json.dumps calls. I just went back and added those in this PR, which should hopefully fix your bug. Please let me know if it still persists on the nightly build of the new release once we tag it!
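
For context, a minimal sketch of the kind of change described above: when an adapter serializes message content with json.dumps, passing ensure_ascii=False keeps non-ASCII text intact instead of escaping it (hypothetical helper name, not the actual adapter code):

```python
import json

# Hypothetical helper illustrating the fix described above: serialize
# message content for a provider adapter without escaping non-ASCII text.
def serialize_message_content(content: dict) -> str:
    return json.dumps(content, ensure_ascii=False)

print(serialize_message_content({"message": "你好"}))
# {"message": "你好"}   (with the default ensure_ascii=True: {"message": "\u4f60\u597d"})
```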