The murmur2 code doesn't handle special chars correctly (#521)
$ ./compare.sh 'content: "é"'
+ Hashes do not match
+ native: corivl
+ JavaScript: nawgoz
+ [1]
$ ./compare.sh 'content: "®"'
+ Hashes do not match
+ native: 3084ue
+ JavaScript: ufik94
+ [1]
$ ./compare.sh 'content: "😀"'
+ Hashes do not match
+ native: cqlzcw
+ JavaScript: 1u8rdd
+ [1]
How
Handling special chars surfaced some issues we hadn't seen before:
JS strings are UTF-16 encoded, so each char takes at most 2 code units, but OCaml strings here are UTF-8 encoded, meaning each char takes a variable number of bytes (1–4);
So I created get_utf16_char_codes, and instead of working on the string directly, the murmur now works on a list of UTF-16 char codes.
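To illustrate the idea, here is a minimal sketch of what a function like get_utf16_char_codes could look like. It is not necessarily the code in this PR: the body is an assumption, it relies on `String.get_utf_8_uchar` (OCaml 4.14 or newer), and invalid UTF-8 bytes are decoded as U+FFFD by the standard library.

```ocaml
(* Hypothetical sketch, not necessarily the implementation in this PR.
   Decodes a UTF-8 OCaml string and returns the UTF-16 code units that
   JS's charCodeAt would yield for the same text.
   Requires OCaml >= 4.14 for String.get_utf_8_uchar. *)
let get_utf16_char_codes (s : string) : int list =
  let len = String.length s in
  let rec loop i acc =
    if i >= len then List.rev acc
    else
      let decoded = String.get_utf_8_uchar s i in
      let code = Uchar.to_int (Uchar.utf_decode_uchar decoded) in
      let next = i + Uchar.utf_decode_length decoded in
      if code < 0x10000 then
        (* Code points in the BMP map to a single UTF-16 code unit. *)
        loop next (code :: acc)
      else
        (* Anything above U+FFFF becomes a surrogate pair. *)
        let c = code - 0x10000 in
        let high = 0xD800 lor (c lsr 10) in
        let low = 0xDC00 lor (c land 0x3FF) in
        (* acc is built in reverse, so push high first, then low. *)
        loop next (low :: high :: acc)
  in
  loop 0 []
```

With this, "é" and "®" each give one code unit, while "😀" gives two (a surrogate pair), which matches what the JS side iterates over.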
The emotion murmur has tricky cases, where it handles Int in some instances and Int32 in others. So I changed it to use Int primarily and Int32 only for some specific cases.
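As a rough illustration of why both types come up: JS bitwise operators truncate to 32 bits, while OCaml's native int is 63 bits wide on 64-bit platforms. The helpers below are hypothetical (the names js_shl and js_ushr are not from this PR) and assume a 64-bit OCaml.

```ocaml
(* Hypothetical helpers, assuming a 64-bit OCaml where int is 63 bits.
   They mimic the 32-bit behaviour of JS's `<<` and `>>>` operators. *)

(* JS `x << n`: truncate to 32 bits, shift, read back as a signed value.
   Int32.of_int reduces modulo 2^32; JS masks the shift count to 5 bits. *)
let js_shl (x : int) (n : int) : int =
  Int32.to_int (Int32.shift_left (Int32.of_int x) (n land 31))

(* JS `x >>> n`: treat the low 32 bits as unsigned, then shift right. *)
let js_ushr (x : int) (n : int) : int =
  (x land 0xFFFFFFFF) lsr (n land 31)
```

For example, `js_shl 1 31` is negative, as `1 << 31` is in JS, whereas a plain `1 lsl 31` on a 64-bit OCaml int stays positive.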
It's probably worth a blog post about Int and Int32 on JS and UTF-16.