Closed GoogleCodeExporter closed 8 years ago
Small clarification: I was not correct about utf8: utf8 may include zero byte
because NUL unicode codepoint is represented as zero byte in utf8. There are no
other possible zero bytes in UTF8 (multibyte codepoints have some bits set in
all bytes) so this is usually not an issue.
Original comment by kmik...@gmail.com
on 25 Aug 2012 at 8:31
In a Python wrapper I've resolved this by encoding <binary_value> to base64.
Encoding/decoding speed doesn't become bottleneck (when using
http://libb64.sourceforge.net/ decoding is orders of magnitude faster than
Completer iteration speed).
Original comment by kmik...@gmail.com
on 4 Sep 2012 at 9:38
I'm sorry that I couldn't help you.
As you mentioned, Completer uses '\0' as the sentinel and it seems to be
impossible to fix this without changing the structure of its auxiliary data.
Also, the change will increase the size of the auxiliary data.
So, now I'm planning to implement a new Completer for binary keys.
Original comment by susumu.y...@gmail.com
on 7 Sep 2012 at 12:10
Thanks! This would be a very useful feature.
Original comment by kmik...@gmail.com
on 7 Sep 2012 at 12:42
I'm really sorry.
It has turned out that it is impossible to support binary keys without changing
its core data structure (major update).
When I posted the last comment, I expected that dawgdic can support binary keys
without changing Dictionary.
However, I've found that not only Completer but also Dictionary require
modification.
In addition, the modification degrades the space efficiency and the time
efficiency of dawgdic.
This drawback may be unacceptable.
In short, base64-encoding may support binary values better than natively
supporting binary keys.
Original comment by susumu.y...@gmail.com
on 14 Sep 2012 at 12:16
Thanks for all your work on this. Binary keys would be good but base64 works
good enough in practice. So maybe the limitation should be documented somewhere?
Original comment by kmik...@gmail.com
on 14 Sep 2012 at 12:20
Thanks for your comments.
I'll update dawgdic to reject binary keys and then update documentation.
Original comment by susumu.y...@gmail.com
on 14 Sep 2012 at 12:40
Looks like a good plan.
Original comment by kmik...@gmail.com
on 14 Sep 2012 at 12:52
I've finished the update of dawgdic.
Also, the documentation has been updated.
Original comment by susumu.y...@gmail.com
on 15 Sep 2012 at 6:51
Original issue reported on code.google.com by
kmik...@gmail.com
on 21 Aug 2012 at 10:19