calpoly-csai / api

Official API for the NIMBUS Voice Assistant accessible via HTTP REST protocol.
https://nimbus.api.calpolycsai.com/
GNU General Public License v3.0

UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 97: ordinal not in range(128) #111

Closed mfekadu closed 4 years ago

mfekadu commented 4 years ago

Describe the bug

Nimbus fails to respond when the input contains the following character:

’ a.k.a. U+2019 (RIGHT SINGLE QUOTATION MARK)

which differs from the following character:

' a.k.a. U+0027 (APOSTROPHE)

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://nimbus.calpolycsai.com
  2. Click on the input box
  3. Type What is Foaad's email?
  4. Notice the correct answer
  5. Type What is Foaad’s email?
  6. Notice the lack of response

Expected behavior

Nimbus should accept all Unicode characters and efficiently normalize inputs that include non-ASCII characters.
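One way the normalization described above could look is sketched below. It is only an illustration: `normalize_question` and the punctuation table are hypothetical names that do not come from the Nimbus code base, and the set of characters worth folding to ASCII is an assumption.

```python
import unicodedata

# Hypothetical table: typographic quotes mapped to their ASCII equivalents.
_PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
})


def normalize_question(text: str) -> str:
    """Apply NFKC normalization, then fold curly quotes to ASCII quotes."""
    return unicodedata.normalize("NFKC", text).translate(_PUNCTUATION_MAP)


# Both spellings from the reproduction steps end up identical.
plain = normalize_question("What is Foaad's email?")
curly = normalize_question("What is Foaad\u2019s email?")
print(plain == curly)
```

Normalizing before the question reaches the classifier would make both inputs behave the same, but it would not by itself prevent the crash if other non-ASCII text (for example, in an answer) is later printed.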

Full Stack Trace

heroku logs

```
2020-03-07T01:18:29.759600+00:00 app[web.1]: [nltk_data] Downloading package stopwords to /nimbus/nltk_data...
2020-03-07T01:18:29.759608+00:00 app[web.1]: [nltk_data] Package stopwords is already up-to-date!
2020-03-07T01:18:29.759609+00:00 app[web.1]: [nltk_data] Downloading package punkt to /nimbus/nltk_data...
2020-03-07T01:18:29.759609+00:00 app[web.1]: [nltk_data] Package punkt is already up-to-date!
2020-03-07T01:18:29.759609+00:00 app[web.1]: [nltk_data] Downloading package averaged_perceptron_tagger to
2020-03-07T01:18:29.759610+00:00 app[web.1]: [nltk_data] /nimbus/nltk_data...
2020-03-07T01:18:29.759611+00:00 app[web.1]: [nltk_data] Package averaged_perceptron_tagger is already up-to-
2020-03-07T01:18:29.759611+00:00 app[web.1]: [nltk_data] date!
2020-03-07T01:18:29.759614+00:00 app[web.1]: /usr/local/lib/python3.6/dist-packages/sklearn/base.py:251: UserWarning: Trying to unpickle estimator KNeighborsClassifier from version 0.21.3 when using version 0.20.2. This might lead to breaking code or invalid results. Use at your own risk.
2020-03-07T01:18:29.759615+00:00 app[web.1]: UserWarning)
2020-03-07T01:18:29.759615+00:00 app[web.1]: [nltk_data] Downloading package stopwords to /nimbus/nltk_data...
2020-03-07T01:18:29.759616+00:00 app[web.1]: [nltk_data] Package stopwords is already up-to-date!
2020-03-07T01:18:29.759616+00:00 app[web.1]: [nltk_data] Downloading package punkt to /nimbus/nltk_data...
2020-03-07T01:18:29.759616+00:00 app[web.1]: [nltk_data] Package punkt is already up-to-date!
2020-03-07T01:18:29.759617+00:00 app[web.1]: [nltk_data] Downloading package averaged_perceptron_tagger to
2020-03-07T01:18:29.759617+00:00 app[web.1]: [nltk_data] /nimbus/nltk_data...
2020-03-07T01:18:29.759617+00:00 app[web.1]: [nltk_data] Package averaged_perceptron_tagger is already up-to-
2020-03-07T01:18:29.759618+00:00 app[web.1]: [nltk_data] date!
2020-03-07T01:18:29.759618+00:00 app[web.1]: /usr/local/lib/python3.6/dist-packages/sklearn/base.py:251: UserWarning: Trying to unpickle estimator KNeighborsClassifier from version 0.21.3 when using version 0.20.2. This might lead to breaking code or invalid results. Use at your own risk.
2020-03-07T01:18:29.759619+00:00 app[web.1]: UserWarning)
2020-03-07T01:18:29.759619+00:00 app[web.1]: [nltk_data] Downloading package stopwords to /nimbus/nltk_data...
2020-03-07T01:18:29.759619+00:00 app[web.1]: [nltk_data] Package stopwords is already up-to-date!
2020-03-07T01:18:29.759620+00:00 app[web.1]: [nltk_data] Downloading package punkt to /nimbus/nltk_data...
2020-03-07T01:18:29.759620+00:00 app[web.1]: [nltk_data] Package punkt is already up-to-date!
2020-03-07T01:18:29.759621+00:00 app[web.1]: [nltk_data] Downloading package averaged_perceptron_tagger to
2020-03-07T01:18:29.759621+00:00 app[web.1]: [nltk_data] /nimbus/nltk_data...
2020-03-07T01:18:29.759621+00:00 app[web.1]: [nltk_data] Package averaged_perceptron_tagger is already up-to-
2020-03-07T01:18:29.759622+00:00 app[web.1]: [nltk_data] date!
2020-03-07T01:18:29.759622+00:00 app[web.1]: /usr/local/lib/python3.6/dist-packages/sklearn/base.py:251: UserWarning: Trying to unpickle estimator KNeighborsClassifier from version 0.21.3 when using version 0.20.2. This might lead to breaking code or invalid results. Use at your own risk.
2020-03-07T01:18:29.759623+00:00 app[web.1]: UserWarning)
2020-03-07T01:18:29.759623+00:00 app[web.1]: [2020-03-07 01:18:29,759] ERROR in app: Exception on /ask [POST]
2020-03-07T01:18:29.759624+00:00 app[web.1]: Traceback (most recent call last):
2020-03-07T01:18:29.759625+00:00 app[web.1]:   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2446, in wsgi_app
2020-03-07T01:18:29.759625+00:00 app[web.1]:     response = self.full_dispatch_request()
2020-03-07T01:18:29.759626+00:00 app[web.1]:   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1951, in full_dispatch_request
2020-03-07T01:18:29.759626+00:00 app[web.1]:     rv = self.handle_user_exception(e)
2020-03-07T01:18:29.759627+00:00 app[web.1]:   File "/usr/local/lib/python3.6/dist-packages/flask_cors/extension.py", line 161, in wrapped_function
2020-03-07T01:18:29.759627+00:00 app[web.1]:     return cors_after_request(app.make_response(f(*args, **kwargs)))
2020-03-07T01:18:29.759628+00:00 app[web.1]:   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1820, in handle_user_exception
2020-03-07T01:18:29.759628+00:00 app[web.1]:     reraise(exc_type, exc_value, tb)
2020-03-07T01:18:29.759628+00:00 app[web.1]:   File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 39, in reraise
2020-03-07T01:18:29.759629+00:00 app[web.1]:     raise value
2020-03-07T01:18:29.759629+00:00 app[web.1]:   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1949, in full_dispatch_request
2020-03-07T01:18:29.759630+00:00 app[web.1]:     rv = self.dispatch_request()
2020-03-07T01:18:29.759630+00:00 app[web.1]:   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1935, in dispatch_request
2020-03-07T01:18:29.759631+00:00 app[web.1]:     return self.view_functions[rule.endpoint](**req.view_args)
2020-03-07T01:18:29.759631+00:00 app[web.1]:   File "/nimbus/flask_api.py", line 67, in handle_question
2020-03-07T01:18:29.759631+00:00 app[web.1]:     "answer": nimbus.answer_question(question)
2020-03-07T01:18:29.759632+00:00 app[web.1]:   File "/nimbus/nimbus.py", line 22, in answer_question
2020-03-07T01:18:29.759632+00:00 app[web.1]:     print(ans_dict)
2020-03-07T01:18:29.759640+00:00 app[web.1]: UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 97: ordinal not in range(128)
```

mfekadu commented 4 years ago

The Heroku failure reproduces on Python 3.6.9 on Linux, where stdout is using the ASCII codec:

```
$ heroku run --app calpoly-csai-nimbus "python"

Running python on ⬢ calpoly-csai-nimbus... up, run.9357 (Standard-1X)
Python 3.6.9 (default, Nov  7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print("\u2019")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 0: ordinal not in range(128)
```
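To check whether a given environment would hit the same error without actually crashing, something like the following could be run there. This is a rough diagnostic sketch; `can_encode` is a hypothetical helper, and falling back to `"ascii"` when the stream reports no encoding is an assumption.

```python
import sys


def can_encode(text: str, encoding: str) -> bool:
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Some replaced streams report no encoding; assume ASCII in that case.
stdout_encoding = getattr(sys.stdout, "encoding", None) or "ascii"
print("stdout encoding:", stdout_encoding)
print("can print U+2019:", can_encode("\u2019", stdout_encoding))
```

On the Heroku dyno above, the reported encoding would be expected to be an ASCII variant, which matches the `'ascii' codec` in the traceback.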

python 3.6.8 on macOS is fine

```
$ python3.6

Python 3.6.8 (v3.6.8:3c6b436a57, Dec 24 2018, 02:04:31)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print("\u2019")
’
>>>
```

python 3.8.1 on macOS is fine

```
$ python3.8

Python 3.8.1 (v3.8.1:1b293b6006, Dec 18 2019, 14:08:53)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print("\u2019")
’
```

workaround for linux

https://stackoverflow.com/a/3597849

```
>>> utf8stdout = open(1, 'w', encoding='utf-8', closefd=False) # fd 1 is stdout
>>> print("\u2019".encode("utf-8"), file=utf8stdout)
b'\xe2\x80\x99'
>>> print("\u2019", file=utf8stdout)
’
```
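The same idea can be written as a small helper that wraps any binary stream in a UTF-8 text writer. This is a sketch, not the fix that was applied to Nimbus: `utf8_text_stream` is a hypothetical name, and the demo writes to an in-memory buffer so the bytes can be inspected.

```python
import io


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(
        binary_stream,
        encoding="utf-8",
        newline="\n",          # keep line endings identical on every platform
        line_buffering=True,   # flush after each newline
    )


# Demo on an in-memory buffer instead of the real stdout.
buffer = io.BytesIO()
out = utf8_text_stream(buffer)
print("\u2019", file=out)
out.flush()
captured = buffer.getvalue()
print(captured)
```

In a real process the wrapper could be built from `sys.stdout.buffer` and passed as `file=` to `print`. Keep a reference to the wrapper for as long as stdout is needed, because closing or garbage-collecting a `TextIOWrapper` also closes the underlying stream. Setting the `PYTHONIOENCODING=utf-8` environment variable is another option that needs no code change; `sys.stdout.reconfigure` only exists from Python 3.7 onward, so it is not available on the 3.6.9 dyno.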

Thanks for debugging help @cameron-toy !

mfekadu commented 4 years ago

Possible solutions:

  1. Use the logger that csai-scraping uses (suggested by @cameron-toy).
  2. Remove the `print(ans_dict)` call in `nimbus.py`.
  3. Upgrade the Python version on the Ubuntu image.
  4. Downgrade the Python version on the Ubuntu image??
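As a rough illustration of the first option, the sketch below replaces a bare `print` with the standard-library `logging` module writing to a stream that escapes anything it cannot encode. It does not reproduce the csai-scraping logger, whose configuration is not shown in this thread; `make_safe_logger` and the logger name are hypothetical.

```python
import io
import logging


def make_safe_logger(name, binary_stream):
    """Build a logger whose stream escapes non-ASCII text instead of raising."""
    text_stream = io.TextIOWrapper(
        binary_stream,
        encoding="ascii",
        errors="backslashreplace",  # U+2019 is written as the text \u2019
        newline="\n",
        line_buffering=True,
    )
    handler = logging.StreamHandler(text_stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output through the root logger
    return logger


# Demo: log text containing U+2019 to an ASCII-only destination.
log_buffer = io.BytesIO()
log = make_safe_logger("nimbus.sketch", log_buffer)
log.info("answer: %s", "Foaad\u2019s email")
logged = log_buffer.getvalue()
print(logged)
```

The `backslashreplace` error handler trades readability for robustness: the log line stays ASCII-safe on any terminal, and the request handler no longer fails because of a diagnostic message.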

snekiam commented 4 years ago

I'm gonna close this since I just tried unicode characters and it seemed to work.