ethereum / staking-deposit-cli

Secure key generation for deposits
Creative Commons Zero v1.0 Universal
531 stars 330 forks source link

Common characters being overridden in chinese simplified and chinese Traditional word lists #410

Open anupsv opened 3 months ago

anupsv commented 3 months ago

While going through the code, this code here is causing a problem but hiding it. There are common characters between the 2 word lists and when the map is created in the referenced code, if there is a collision in keys, it is silently overridden to the new file being read. This issue can be seen if the code checks for collision of keys.

yorickdowne commented 1 month ago

@anupsv Can I invite you to discuss this with us at https://github.com/eth-educators/ethstaker-deposit-cli/issues/119. If you have suggestions for a solution, or good ways to test the issue, that will be highly welcome.

anupsv commented 1 month ago

Hi @yorickdowne, sure happy to join and provide suggestions for the fix. I do have a fix for the issue as well.

yorickdowne commented 1 month ago

@anupsv Oh amazing. What I’m looking for is a sample mnemonic that triggers the issue in current code, and a description of what should happen and what does happen instead. With that I can write a test case.

And then with the fix in, that test will succeed.

anupsv commented 1 month ago

No problem. I can produce the required test case and the fix for it.

yorickdowne commented 3 weeks ago

@anupsv We've written some code to be able to have two languages per symbol/word. We are missing a test case. Please give us a mnemonic that caused a failure before, and describe the failure you saw.

Thank you!

anupsv commented 3 weeks ago

@yorickdowne This was one of the vectors that had failed. 的 的 的 的 的 的 的 的 的 的 的 在

anupsv commented 3 weeks ago

@yorickdowne Related to the fix, I was thinking which more and more common characters, we'd need to look at the entire mnemonic if needed but I hadn't restricted to 2 languages. Was trying to solve for anycase but if you think limiting to 2 languages is good enough for now, sounds good!

yorickdowne commented 19 hours ago

I don’t know that we’re limiting to 2, would need to look at the code again. At any rate, we are now prompting users for their desired language if the mnemonic is valid in 2 or more.

Thank you for the report!