gjost opened 5 years ago
We can definitely work on this now. Python 3 is Unicode-only, so we'll have to solve this as part of the Python 3 migration anyway.
We need to think about how to handle non-Unicode text in two situations:

- In the UI, I'd like forms to flag fields containing non-Unicode text as errors and display the offending code along with its context so the user can fix it. This would apply to new text entered in forms as well. Display templates should prominently flag bad text and invite the user to fix it.
- `ddrimport` should flag non-Unicode text as errors that must be fixed before records can be imported.
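Both cases need the same primitive: find each undecodable byte run and report it with some surrounding context. A minimal sketch, assuming the text arrives as raw bytes (e.g. from a file being imported) and that UTF-8 is the target encoding; `find_bad_text` and `BadText` are hypothetical names, not existing DDR code:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BadText:
    """One undecodable byte run, plus nearby text for display."""
    start: int    # byte offset of the first bad byte
    end: int      # byte offset just past the bad run
    bad: bytes    # the offending bytes themselves
    context: str  # nearby text; bad bytes appear as \xNN escapes (a multibyte
                  # character cut at the window edge may also appear escaped)


def find_bad_text(raw: bytes, encoding: str = "utf-8", width: int = 20) -> list[BadText]:
    """Return every undecodable run in `raw`; an empty list means the text is clean."""
    problems = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode(encoding)
            break  # the rest of the input decodes cleanly
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice we decoded.
            start, end = pos + err.start, pos + err.end
            lo, hi = max(0, start - width), min(len(raw), end + width)
            context = raw[lo:hi].decode(encoding, errors="backslashreplace")
            problems.append(BadText(start, end, raw[start:end], context))
            pos = end  # resume scanning just after the bad run
    return problems
```

A form validator or `ddrimport` could call this and refuse the input whenever the returned list is non-empty, showing each `context` string to the user.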
A while back I put together `fileio.read_text` and `fileio.write_text` functions that were designed to work with Unicode. The plan was to have all code that reads or writes text use those two functions, but most code doesn't yet.
My idea was that `fileio.read_text` would have several modes. In strict mode it would simply raise an exception on bad text. In permissive mode it would return the text with bad characters marked, along with the original text. This would allow higher-level code to display the raw code if necessary.
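The two modes described above map fairly directly onto Python's codec error handlers. The following is a hypothetical sketch, not the actual `fileio` code; the `mode` parameter and the `TextResult` tuple are assumptions. Strict mode uses the default `errors="strict"`, and permissive mode uses `errors="backslashreplace"` to mark bad bytes while also returning the untouched bytes:

```python
from pathlib import Path
from typing import NamedTuple, Optional


class TextResult(NamedTuple):
    text: str             # decoded text; bad bytes shown as \xNN in permissive mode
    raw: Optional[bytes]  # original bytes, returned only in permissive mode


def read_text(path, mode: str = "strict", encoding: str = "utf-8") -> TextResult:
    """Hypothetical two-mode reader (not the real fileio.read_text)."""
    raw = Path(path).read_bytes()
    if mode == "strict":
        # Raises UnicodeDecodeError at the first undecodable byte.
        return TextResult(raw.decode(encoding), None)
    if mode == "permissive":
        # Mark each bad byte as \xNN and hand back the original bytes too.
        marked = raw.decode(encoding, errors="backslashreplace")
        return TextResult(marked, raw)
    raise ValueError(f"unknown mode: {mode!r}")


def write_text(path, text: str, encoding: str = "utf-8") -> None:
    """Hypothetical writer: always encode as UTF-8 on the way out."""
    Path(path).write_text(text, encoding=encoding)
```

One caveat with `backslashreplace`: a document that legitimately contains the literal characters `\xe9` looks the same as a marked bad byte, which is why keeping `raw` alongside the marked text matters. A distinctive marker such as `errors="replace"` (U+FFFD) is an alternative.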
We'll need a script that goes through all the text in the system and finds bad text.
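A starting point for that script might look like the sketch below, assuming the text lives in files under one directory tree and should all be UTF-8. `scan_tree` and `main` are hypothetical names. Binary files such as images will also fail to decode, so the `pattern` argument is there to restrict the scan (e.g. `"*.json"`):

```python
from pathlib import Path


def scan_tree(root, pattern="*", encoding="utf-8"):
    """Yield (path, byte_offset, bad_bytes) for each file that fails to decode.

    Only the first bad spot per file is reported; a per-file scanner can
    list every bad run once the offending files are known.
    """
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        raw = path.read_bytes()
        try:
            raw.decode(encoding)
        except UnicodeDecodeError as err:
            yield path, err.start, raw[err.start:err.end]


def main(argv):
    """Print one line per bad file; return 1 if any were found, else 0."""
    root = argv[0] if argv else "."
    pattern = argv[1] if len(argv) > 1 else "*"
    found = 0
    for path, offset, chunk in scan_tree(root, pattern):
        print(f"{path}: byte {offset}: {chunk!r}")
        found += 1
    return 1 if found else 0
```

Run as a script, this would be wired up with `sys.exit(main(sys.argv[1:]))`; the non-zero exit code makes it usable as a check in CI or before an import.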
Ultimately, once we move to Python 3, no non-Unicode text should enter the system at all.
Be able to ingest and index Unicode text, in particular Japanese-language text.
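If some existing Japanese text was saved in a legacy encoding such as Shift_JIS, it will fail a strict UTF-8 decode and be flagged as bad text. One possible conversion aid, sketched under that assumption (the candidate list and the `to_unicode` name are hypothetical), tries a few encodings in order:

```python
def to_unicode(raw: bytes, candidates=("utf-8", "shift_jis", "euc_jp")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    A clean decode does not prove the guess is right (legacy encodings
    accept many byte sequences), so anything not decoded as UTF-8 should
    be reviewed by a person before it is saved back as UTF-8.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the input")
```

Trying UTF-8 first means already-correct files pass through unchanged; only files that fail it fall back to the legacy guesses.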