dabura667 closed this 8 years ago
My main motivation for this fix is that Copay may use BIP39 for backups in some way. I was checking how ideographic spaces are handled, and luckily I did, because Spanish would have been generating bad (non-BIP39-standard) wallets.
Also note that Japanese phrases are unique in that ALL Japanese characters are "breakable", so text wrapping will break a word in the middle at a line break. Care must be taken to ensure that no word is split across lines when the phrase is shown to the user.
Reference: (breadwallet is still trying to get it right... it is difficult) https://github.com/voisine/breadwallet/issues/231 https://github.com/voisine/breadwallet/commit/dd1bfae75409009f087a72c3d7bd7a10fa845570
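To illustrate the wrapping concern, here is a minimal sketch that breaks lines only between words instead of leaving it to the text renderer. `wrapMnemonic` and the sample words are hypothetical and only for illustration; this is not code from this PR or from breadwallet.

```javascript
// Hypothetical helper: put a fixed number of words on each line so that
// a line break can never fall in the middle of a word.
const IDEOGRAPHIC_SPACE = '\u3000';

function wrapMnemonic(words, wordsPerLine, separator) {
  const lines = [];
  for (let i = 0; i < words.length; i += wordsPerLine) {
    lines.push(words.slice(i, i + wordsPerLine).join(separator));
  }
  return lines;
}

// Illustrative words only; this is not a valid mnemonic.
const sampleWords = ['あいこくしん', 'あいさつ', 'あいだ', 'あおぞら', 'あかちゃん'];
const wrappedLines = wrapMnemonic(sampleWords, 2, IDEOGRAPHIC_SPACE);
console.log(wrappedLines.join('\n'));
```

Each element of `wrappedLines` would then be rendered on its own line with automatic wrapping turned off.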
Wait a sec, I will add Japanese test vectors.
I tried to alter the test vectors to use a Japanese passphrase with pbkdf2 instead of just "TREZOR", so that they would also test passphrase normalization.
I realized how long that would take, and I don't have enough time, so I will just submit this PR as is.
Here are the Japanese test vectors I have prepared:
Non-normalized strings: https://raw.githubusercontent.com/bip32JP/bip32JP.github.io/6f6090b49bb718711904468bce99a73770e09071/test_JP_BIP39.json
Normalized (except for spaces) strings: https://raw.githubusercontent.com/bip32JP/bip32JP.github.io/377f72c5087533c34c79ba02335d1fbc5509dfa5/test_JP_BIP39.json
LGTM, great work.
LGTM
This library was not using normalization at all, which for Spanish and Japanese would have produced invalid seeds. (So anyone who used this library to generate wallets from Spanish or Japanese phrases must use the old non-normalizing version to recover their funds. I doubt anyone is in that situation... but a warning might be necessary? Or maybe add a new function for generating the non-normalized seed?)
Chinese and English were OK, as their wordlists were pre-normalized (so unorm.nfkd(words) == words).
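A quick way to see which wordlists are affected is to check whether every word survives NFKD unchanged. The sketch below uses the built-in `String.prototype.normalize` rather than unorm, and `isNfkdStable` plus the short sample arrays are hypothetical stand-ins for the real wordlists.

```javascript
// Returns true when NFKD normalization leaves every word unchanged.
function isNfkdStable(words) {
  return words.every((word) => word.normalize('NFKD') === word);
}

// Illustrative samples, not the full wordlists.
const asciiSample = ['abandon', 'ability', 'able'];
// '\u00e1' is a precomposed "a with acute accent".
const accentedSample = ['\u00e1baco', 'abdomen', 'abeja'];

console.log(isNfkdStable(asciiSample));    // true
console.log(isNfkdStable(accentedSample)); // false

// NFKD splits the accented letter into "a" + U+0301 (combining acute),
// so the bytes fed to pbkdf2 differ from the precomposed form.
console.log('\u00e1baco'.length);                   // 5
console.log('\u00e1baco'.normalize('NFKD').length); // 6
```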
Also, I added one change for Japanese, mentioned in BIP39: Japanese words must be shown to the user separated by ideographic spaces. This is crucial to ensure users don't accidentally read two words as one.
Ex.
It doesn't seem that necessary when letters are so small, but looking at Japanese.
It makes a huge difference, and the latter must be shown to the user.
Also note that an ideographic space is replaced by an ASCII space under NFKD normalization. So while the words themselves are NFKD in the wordlist, Japanese requires non-NFKD ideographic spaces in the "phrase" string, and I have therefore placed a catch-all NFKD around this.phrase in the call to pbkdf2.
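As a rough sketch of why the catch-all NFKD matters, the snippet below derives a BIP39 seed (PBKDF2-HMAC-SHA512, 2048 iterations, salt "mnemonic" + passphrase, 64 bytes) from a phrase joined with ideographic spaces and from the same phrase joined with ASCII spaces. It uses Node's `crypto` module and the built-in `normalize` instead of unorm; `toSeed` and the sample words are hypothetical, not this library's API.

```javascript
const crypto = require('crypto');

// Hypothetical seed function: normalize both inputs to NFKD before hashing.
function toSeed(phrase, passphrase = '') {
  const password = Buffer.from(phrase.normalize('NFKD'), 'utf8');
  const salt = Buffer.from('mnemonic' + passphrase.normalize('NFKD'), 'utf8');
  return crypto.pbkdf2Sync(password, salt, 2048, 64, 'sha512');
}

// Illustrative words only; this is not a valid mnemonic.
const seedWords = ['あいこくしん', 'あいさつ', 'あいだ'];
const displayedPhrase = seedWords.join('\u3000'); // as shown to the user
const asciiPhrase = seedWords.join(' ');          // as joined with U+0020

// NFKD maps U+3000 to U+0020, so both phrases hash to the same seed.
const seedFromDisplayed = toSeed(displayedPhrase, 'TREZOR');
const seedFromAscii = toSeed(asciiPhrase, 'TREZOR');
console.log(seedFromDisplayed.equals(seedFromAscii)); // true
```

Without the `normalize('NFKD')` call on the phrase, the two buffers would differ at every separator and the seeds would not match.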
Users in Japan will also likely type the phrase using ideographic spaces, so I NFKD-normalize the mnemonic input in the outward-facing functions.
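On the input side, the same idea might look like the sketch below: normalize whatever the user typed, then split on whitespace. `splitMnemonicInput` is a hypothetical name, and the built-in `normalize` stands in for unorm.

```javascript
// Hypothetical input handler: NFKD-normalize, then split into words.
function splitMnemonicInput(input) {
  const normalized = input.normalize('NFKD').trim();
  return normalized === '' ? [] : normalized.split(/\s+/);
}

// Mixed ideographic and ASCII spaces, plus trailing whitespace.
const typedInput = 'あいこくしん\u3000あいさつ\u3000 あいだ ';
const typedWords = splitMnemonicInput(typedInput);
console.log(typedWords.length); // 3
```

Note that the returned words are in NFKD form, so they should be compared against an NFKD wordlist.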