Azure-Samples / Machine-Learning-Operationalization

Deploying machine learning models to Azure
MIT License

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 14: invalid continuation byte #46

Open oridayne opened 6 years ago

oridayne commented 6 years ago

I have a model that classifies text, and it is important that it can handle foreign characters, i.e., any Unicode character.

    import requests

    test = "ê"
    payload = "{\"input_df\": \"" + test + "\"}"
    print(payload)
    r = requests.post(url, data=payload, headers=header)
    print(r.text)

    {"input_df": "ê"}
    An unexpected internal error occurred. Encountered Exception: Traceback (most recent call last):
      File "/home/mmlspark/lib/conda/lib/python3.5/site-packages/flask/app.py", line 1639, in full_dispatch_request
        rv = self.dispatch_request()
      File "/home/mmlspark/lib/conda/lib/python3.5/site-packages/flask/app.py", line 1625, in dispatch_request
        return self.view_functions[rule.endpoint](**req.view_args)
      File "/var/azureml-app/app.py", line 77, in score_realtime
        service_input = request.data.decode("utf-8")
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 14: invalid continuation byte
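The traceback above can be reproduced offline, without calling the service. The sketch below assumes the HTTP client encoded the `str` body as Latin-1 before sending it. That is an assumption, though it is consistent with byte 0xea appearing at position 14. `build_payload` is a hypothetical helper introduced only for illustration.

```python
import json


def build_payload(text):
    """Build the same JSON string as the request above (hypothetical helper)."""
    return "{\"input_df\": \"" + text + "\"}"


payload = build_payload("ê")

# Assumption: the client encoded the str body as Latin-1 on the wire.
latin1_body = payload.encode("latin-1")
print(latin1_body[14:15])  # the single byte that stands for "ê" in Latin-1

try:
    # Mirrors request.data.decode("utf-8") in score_realtime.
    latin1_body.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = str(exc)
    print(decode_error)

# Encoding explicitly as UTF-8 yields bytes that the same decode accepts.
utf8_body = payload.encode("utf-8")
round_tripped = utf8_body.decode("utf-8")
print(json.loads(round_tripped))
```

If the assumption holds, passing `data=payload.encode("utf-8")` to `requests.post` might avoid this particular error. That has not been verified against the deployed service.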

I then tried several other approaches, such as building the payload with json.dumps:

    import json

    test = "ê"
    payload = {"input_df": test}
    payload = json.dumps(payload)
    print(payload)
    r = requests.post(url, data=payload, headers=header)
    print(r.text)

    {"input_df": "\u00ea"}
    'ascii' codec can't encode character '\xea' in position 0: ordinal not in range(128)
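Note that the body of this second request is pure ASCII, because `json.dumps` escapes non-ASCII characters by default. The sketch below involves no network call. It suggests the 'ascii' error is raised after the service has parsed the JSON, when the decoded string is encoded as ASCII somewhere. Where exactly that happens in the service is an assumption that has not been verified.

```python
import json

# json.dumps uses ensure_ascii=True by default, so "ê" becomes \u00ea.
payload = json.dumps({"input_df": "ê"})
print(payload)

# The request body therefore contains only ASCII characters.
body = payload.encode("ascii")

# Parsing restores the original character on the receiving side.
parsed = json.loads(body.decode("utf-8"))

try:
    # Assumption: something downstream encodes the parsed value as ASCII.
    parsed["input_df"].encode("ascii")
except UnicodeEncodeError as exc:
    encode_error = str(exc)
    print(encode_error)
```

The reported "position 0" fits this reading, since "ê" is the first character of the parsed value rather than of the whole JSON body.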

I have tried many other variations, but I cannot get the service to handle Unicode characters.

I'm unsure at this point whether the problem lies with Swagger or with Azure ML. I also looked into changing the code that specifies the input type:

    inputs = {"input_df": SampleDefinition(DataTypes.STANDARD, sample_input)}

However, I could not find documentation on which types are available. In the end, I just need to pass in a person's name, or any string, and get back my classification.

Any tips would be appreciated.