RasaHQ / rasa-x-demo

Demo app for running a bot with Rasa Enterprise

Encoding error #16

Open ErikTromp opened 5 years ago

ErikTromp commented 5 years ago

Not sure where to put this, as it is a Rasa X error rather than an issue with this demo per se, but I get the following error when I use a UTF-8 encoded domain.yml file containing special characters (like é) on Windows:

Traceback (most recent call last):
  File "c:\users\erik\anaconda3\lib\site-packages\rasa\cli\x.py", line 322, in run_locally
    local.main(args, project_path, args.data, token=rasa_x_token)
  File "c:\users\erik\anaconda3\lib\site-packages\rasax\community\local.py", line 190, in main
    project_path, data_path, session, args.port
  File "c:\users\erik\anaconda3\lib\site-packages\rasax\community\local.py", line 139, in _initialize_with_local_data
    domain_path, domain_service, COMMUNITY_PROJECT_NAME, COMMUNITY_USERNAME
  File "c:\users\erik\anaconda3\lib\site-packages\rasax\community\initialise.py", line 136, in inject_domain
    domain_yaml=read_file(domain_path),
  File "c:\users\erik\anaconda3\lib\site-packages\rasa\utils\io.py", line 125, in read_file
    return f.read()
  File "c:\users\erik\anaconda3\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 524: invalid continuation byte
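The last line of the traceback can be reproduced in isolation. This is a minimal sketch that assumes the Windows system code page is cp1252 (common on Western European installs) and uses illustrative text, not the original domain file: in cp1252, é is stored as the single byte 0xe9, which UTF-8 reads as the start of a multi-byte sequence.

```python
# Assumption: the Windows system code page is cp1252; the text is illustrative only.
text = "text: café\n"

raw = text.encode("cp1252")  # é becomes the single byte 0xe9
print(raw)  # b'text: caf\xe9\n'

try:
    # Decoding as UTF-8, which is what the 'utf-8' codec in the traceback is doing.
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    # 0xe9 announces a 3-byte UTF-8 sequence, but "\n" is not a continuation byte.
    print(err.reason, "at position", err.start)  # invalid continuation byte at position 9
```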

What seems to happen is:

kristiankolthoff commented 5 years ago

I am facing the same issue when the domain.yml contains some German umlauts.

ErikTromp commented 5 years ago

We ended up just not using Rasa X.

Mohendran commented 5 years ago

I'm also facing the same issue.

daniel-eder commented 5 years ago

I just ran into the same issue - did anyone find a possible solution or workaround so far?

EDIT: The underlying issue is that Python, by default, writes text files using the system code page unless an encoding is explicitly provided, and Rasa does not specify UTF-8. Additionally, when loading the domain.yml file, Rasa first reformats and saves it before actually loading and parsing it. During that save step the file is re-encoded with the system code page, so when it is read back it is no longer UTF-8, which causes the error.
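The save-then-load round trip described above can be sketched as follows. This is not Rasa's code: `round_trip` and `DOMAIN` are hypothetical names, and the system code page is simulated by passing cp1252 explicitly to the write step. It shows why writing with one encoding and reading with another fails, and that passing the same explicit encoding to both steps avoids the problem.

```python
import tempfile
from pathlib import Path

# Hypothetical domain content containing German umlauts, as in the reports above.
DOMAIN = "responses:\n  utter_greet:\n  - text: Grüß Gott!\n"


def round_trip(write_encoding: str, read_encoding: str = "utf-8") -> str:
    """Hypothetical 'reformat, save, then load' step for a domain file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "domain.yml"
        # Save step: open() without encoding= would use the system code page;
        # that default is simulated here by passing write_encoding explicitly.
        path.write_text(DOMAIN, encoding=write_encoding)
        # Load step: read back as UTF-8, matching the codec in the traceback.
        return path.read_text(encoding=read_encoding)


try:
    round_trip("cp1252")  # saved with the (simulated) system code page
except UnicodeDecodeError as err:
    print("cp1252 save, UTF-8 load fails:", err.reason)

# Specifying UTF-8 for both the save and the load round-trips cleanly.
print(round_trip("utf-8") == DOMAIN)
```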

Workaround (Python 3.7+ only): set the environment variable PYTHONUTF8 to 1 before running Rasa. This enables Python's UTF-8 mode, which makes UTF-8 the default encoding. On Windows: set PYTHONUTF8=1
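To confirm that the variable actually takes effect, a fresh interpreter started with PYTHONUTF8=1 can report its UTF-8 mode flag and the encoding that open() falls back to when no encoding is given. This is a small sketch, and `probe_with_pythonutf8` is a hypothetical helper name; for Rasa itself the variable only needs to be set in the same shell before launching it (for example before `rasa x`).

```python
import os
import subprocess
import sys
from typing import Tuple

# Code run in the child interpreter: print the UTF-8 mode flag and the
# normalized name of the default encoding that open() uses without encoding=.
PROBE = (
    "import codecs, locale, sys; "
    "print(sys.flags.utf8_mode, codecs.lookup(locale.getpreferredencoding(False)).name)"
)


def probe_with_pythonutf8() -> Tuple[int, str]:
    """Start a new Python process with PYTHONUTF8=1, as in the workaround."""
    env = dict(os.environ, PYTHONUTF8="1")
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    mode, encoding = result.stdout.split()
    return int(mode), encoding


print(probe_with_pythonutf8())  # (1, 'utf-8')
```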

ziligy commented 4 years ago

Solved? I ran into a similar issue and realized that my Mac had left dot-file debris in my Rasa data directory when I was SSH-ing into it. Deleting these hidden files resolved the issue.
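A quick way to look for such leftovers is to list every hidden file under the data directory. This is a sketch under a few assumptions: `find_hidden_files` is a hypothetical helper, "data" is an assumed path for the Rasa data directory, and "hidden" simply means the file name (or a parent folder name) starts with a dot, which covers typical macOS debris such as `.DS_Store` or `._*` files. It only lists the files; deleting them is left to you after reviewing the output.

```python
import os
from pathlib import Path
from typing import List


def find_hidden_files(data_dir: str) -> List[Path]:
    """Return files under data_dir whose name, or a parent folder's name, starts with '.'."""
    root = Path(data_dir)
    hidden = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Files inside a hidden folder count as hidden too.
        rel_parts = Path(dirpath).relative_to(root).parts
        in_hidden_dir = any(part.startswith(".") for part in rel_parts)
        for name in filenames:
            if in_hidden_dir or name.startswith("."):
                hidden.append(Path(dirpath) / name)
    return sorted(hidden)


if __name__ == "__main__":
    # "data" is an assumed location; point this at your own Rasa data directory.
    for path in find_hidden_files("data"):
        print(path)  # review these before deleting; this sketch never deletes anything
```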

Main point: Be sure there are no hidden files in the rasa data directory!