mschoch opened 9 years ago
Seems like a good plan (default to option 2, and let advanced users opt into option 1). I assume a setting like this would eventually become part of the index mapping.
Moving this to 2.0. For 1.0 we'll simply document that we only support UTF-8.
Current Status:
Bleve decided early on to focus on UTF-8 data. This was not necessarily meant to exclude other encodings, but whenever UTF-8 made savings, optimizations, or simpler internal formats possible, that was the direction we chose.
Two issues come up:
For now, I'm not trying to solve item 2 above, but if a solution to item 1 also opens the door to item 2, that should be considered as well.
These are the options I see:
Currently, Bleve follows option 1.
My preference is to default to option 2, but to allow advanced users to switch back to option 1 if they know what they're doing.
I'm not a big fan of option 3 right now because it prescribes a fix that may be slow and may produce incorrect results. The burden would still be on the application to supply correct UTF-8 bytes, but Bleve would at least be safe on all inputs.
I welcome other thoughts on this issue.
CC @steveyen