rabbah closed this issue 7 years ago
Can we just normalize to UTF-8 instead of ISO-8859-1? Until this is fixed it will be difficult for anyone to use OpenWhisk in anything other than English.
Using Unicode escape sequences in action code could also produce UTF-8 output:
var chineseMessage = "\u88c5\u7f6e\u88ab\u8fde\u63a5";
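To make the escape-sequence workaround concrete, here is a minimal sketch of a Node.js action that returns the escaped string. The `main` function shape follows the usual OpenWhisk convention, but the field names `message` and `utf8Bytes` are made up for illustration. The point is only that the source file stays pure ASCII while the value is still a normal Unicode string.

```javascript
// Hypothetical action: the returned field names are illustrative only.
function main(params) {
  // Escaped form: the source file itself contains nothing but ASCII.
  const escaped = "\u88c5\u7f6e\u88ab\u8fde\u63a5";
  return {
    message: escaped,
    // Number of bytes the string occupies once serialized as UTF-8.
    utf8Bytes: Buffer.byteLength(escaped, "utf8"),
  };
}

const result = main({});
console.log(result.message, result.utf8Bytes);
```

This only sidesteps encoding problems in the action source itself. Non-ASCII parameter values would still go through whatever charset handling sits between the caller and the container.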
@psuter can you summarize your findings for future investigators?
I found a Stack Overflow discussion about the encoding declaration for HttpEntity in Scala/spray.
Hope this solution helps.
So this is slightly fuzzy in my head.. I believe I had tracked it down to being an issue only happening when the JVMs are running inside a Docker container. (The DB seems to store everything properly.) I had diff'ed the JVM system properties when running outside of a container and when running with the image we use for all Scala containers, and there was only one difference. I believe it was a property related to the byte order mark... not 100% sure. I also believe this was a fixed property, not one that the user can configure.
Also, my branch https://github.com/psuter/openwhisk/tree/utf-8-db had some tests and (unsuccessful) attempts.
I confirmed that the DB (CouchDB) stores action code containing non-English data correctly by reviewing the CouchDB web interface for my vagrant install.
http://172.17.0.1:5984/_utils/database.html?vagrant_vagrant-ubuntu-trusty-64_whisks (Overview > vagrant_vagrant-ubuntu-trusty-64_whisks > guest/myaction)
This may mean that a wrong charset conversion happens when loading action code or producing a response in the Invoker.
I suspect that OpenWhisk is using the wrong codepage when actions are being executed. My guess is that they are using ISO8859-1 (Latin 1) when they should be using UTF-8 (Unicode). This should be a simple configuration change and would enable them to avoid having to do a conversion as the actions and data are most likely in UTF-8 already.
Getting more reports from users being hit with this problem, for example when using a parameter with the value Aurélien: the é is not being handled correctly.
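For anyone trying to recognize this in their own output, here is a small sketch of what the suspected UTF-8 vs ISO-8859-1 mix-up does to that value. It is only a simulation with Node's `Buffer`, not OpenWhisk code, and the variable names are made up.

```javascript
// Sketch of the suspected failure mode: UTF-8 bytes decoded as ISO-8859-1.
// "Aurélien" is written with an escape so this file stays ASCII.
const original = "Aur\u00e9lien";

// Encode the parameter value as UTF-8, as a JSON payload normally would be.
const utf8Bytes = Buffer.from(original, "utf8");

// Decode with the wrong charset: each UTF-8 byte becomes its own character,
// so the two bytes of the accented letter turn into two separate characters.
const garbled = utf8Bytes.toString("latin1");

// Decode with the right charset: the round trip is lossless.
const roundTripped = utf8Bytes.toString("utf8");

console.log(garbled);
console.log(roundTripped === original);
```

If the value comes back one character longer per accented letter, that is a strong hint that some layer is reading UTF-8 bytes as Latin-1.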
I think that this really needs to get fixed and I think it is tied to #362 as well.
There are many instances of this bug - fixing this will close the most issues in one commit 🥇
Here's a hint: actions via zip files work. This rules out the action container (as both @psuter and I confirmed with separate unit tests) and points to the interface between the container and the invoker for exchanging arguments.
Ok so this means that it is probably the code that is reading the files. It may be reading them in the wrong codepage. Check to see if they have hardcoded that. If the code that reads the files is written in Java then the default codepage is probably ISO8859-1. I don't believe that UTF-8 is the default; you need to explicitly state the encoding.
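The "state the encoding explicitly" advice can be sketched as follows. This is Node.js rather than Java, and the scratch file name is hypothetical. In Java the idea is the same, since the JVM's default charset has historically depended on the platform settings rather than being fixed to UTF-8.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical scratch file standing in for action code on disk.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "charset-demo-"));
const file = path.join(dir, "action.js");
const source = 'var name = "Aur\u00e9lien";';

// Write with an explicit encoding ...
fs.writeFileSync(file, source, { encoding: "utf8" });

// ... and read it back twice: once with the matching encoding, once as Latin-1.
const readAsUtf8 = fs.readFileSync(file, { encoding: "utf8" });
const readAsLatin1 = fs.readFileSync(file, { encoding: "latin1" });

// Clean up the scratch file and directory.
fs.unlinkSync(file);
fs.rmdirSync(dir);

console.log(readAsUtf8 === source);
console.log(readAsLatin1 === source);
```

Only the read that names the same encoding as the write gets the original text back, which is why relying on a default is risky when the default can differ between environments.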
found it!
Don't keep it a secret !!!
Let me get some 🍿 , brb
Awesome :)
I confirmed a couple of issues were duplicates and closed them (and tests cover them). The Java action proxy and the Python proxy for the other runtimes will need to be patched. These are confirmed to be in the containers themselves, and not the Invoker (as was the case with node.js). #1757 fixes the node.js actions as a first step.
EDIT: Java actions also fixed in same PR.
Great!
Thanks!
Fix merged in #1757.
The decoding problem is localized to reading records from the database (on the client side), which normalizes to the HTTP default charset ISO-8859-1 instead of UTF-8.
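Here is a rough sketch of how a client-side fallback like that could garble a record. `decodeBody` is a hypothetical helper, not the Spray or Cloudant API, and its header parsing is deliberately simplified. It assumes the response carries UTF-8 JSON with no `charset` parameter in its Content-Type.

```javascript
// Hypothetical helper: picks a charset from a Content-Type header value.
// `fallback` models what the client assumes when no charset parameter is present.
// Simplified on purpose: anything other than ISO-8859-1 is treated as UTF-8.
function decodeBody(bytes, contentType, fallback) {
  const match = /charset=([^;]+)/i.exec(contentType || "");
  const charset = (match ? match[1].trim() : fallback).toLowerCase();
  // Node's Buffer calls ISO-8859-1 "latin1".
  const encoding = charset === "iso-8859-1" ? "latin1" : "utf8";
  return bytes.toString(encoding);
}

// A record as it might come back from the database: UTF-8 JSON, no charset parameter.
const record = JSON.stringify({ name: "Aur\u00e9lien" });
const body = Buffer.from(record, "utf8");

// Same bytes, two different fallbacks.
const wrong = JSON.parse(decodeBody(body, "application/json", "ISO-8859-1"));
const right = JSON.parse(decodeBody(body, "application/json", "UTF-8"));

console.log(wrong.name, right.name);
```

The stored bytes are identical in both cases. Only the assumed charset at read time differs, which matches the observation that the database itself holds the data correctly.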
As a sanity check, I confirmed that for a sample code:
The record is stored correctly in the database. If I encode the string using base64 and log the result, the activation result and logs are correctly stored in the database:
results in this activation record:
@psuter FYI this is true using both the Cloudant SDK and the Spray client.
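For future readers, a sketch of why the base64 sanity check is immune to this bug. This is not the sample from the thread, just a hypothetical stand-in with the same idea: base64 text is plain ASCII, and ASCII bytes mean the same thing in UTF-8 and ISO-8859-1.

```javascript
// Not the sample from the thread: a hypothetical stand-in with the same idea.
const text = "Aur\u00e9lien";

// Base64 output is plain ASCII, which UTF-8 and ISO-8859-1 encode identically.
const encoded = Buffer.from(text, "utf8").toString("base64");

// Simulate the faulty path: bytes written as UTF-8, then read as Latin-1.
const throughFaultyPath = Buffer.from(encoded, "utf8").toString("latin1");

// Decoding the base64 text afterwards still recovers the original string.
const recovered = Buffer.from(throughFaultyPath, "base64").toString("utf8");

console.log(encoded, recovered === text);
```

So a base64 payload surviving the round trip says nothing about whether the charset handling is right. It only shows that the bytes themselves are not being dropped.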