aido / app-seed-tool

A Ledger application that provides some useful seed management utilities
Apache License 2.0
47 stars 3 forks source link

"Drum" word duping in generated SSKR shares #32

Closed DmKoshelek closed 5 months ago

DmKoshelek commented 6 months ago

I tried to generate SSKR phrases with the app and they are looking weird. All of them except first one just contains repeated "Drum" word in positions from 11 to 42 (I believe other words just contains meta information and checksums). And for sure, I can't recover my bip39 phrase from this SSKR shares. My Seed phrace contains 24 words but to create this issue I generated a new one. I tried to use this shares to recover my seed phrase using seedtool-cli but for sure got error that shares are invalid

Steps to reproduce the behavior:

  1. Reset Ledger Nano S+ by next 24 words seed phrase (it's new generated one don't worry)
    head bulb beauty miracle vessel kick toddler bamboo defense open hill fiscal feature illness fence reunion hub song glory occur cannon enjoy wasp common
  2. Install the Seed App
  3. Open Seed App
  4. Select "Check BIP39 phase"
  5. Select "24 words"
  6. Input the seed phrase and complete check
  7. Select number of shares - 6
  8. Select number of threshold - 3
  9. Get the list of shares
  10. tuna acid epic hard data axis whiz able also able yurt poem iron note idle vast quiz when each zone unit curl twin quiz navy aunt dice whiz onyx skew yurt meow visa tuna calm fuel whiz purr cola cola legs hang able able numb frog
  11. tuna acid epic hard data axis whiz able also acid drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able oval flux
  12. tuna acid epic hard data axis whiz able also also drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able vows calm
  13. tuna acid epic hard data axis whiz able also apex drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able jazz jowl
  14. tuna acid epic hard data axis whiz able also aqua drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able fuel draw
  15. tuna acid epic hard data axis whiz able also arch drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able scar gyro
  16. Check seed phrace back using seedtool-cli
    docker run --rm -it seedtool-cli -i sskr
    tuna acid epic hard data axis whiz able also apex drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able jazz jowl
    tuna acid epic hard data axis whiz able also aqua drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able fuel draw
    tuna acid epic hard data axis whiz able also arch drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able scar gyro
    ^D
    seedtool: Invalid Bytewords.

Device:

aido commented 6 months ago

Hi @DmKoshelek ,

I think there may be a few problems here. One of these problems may be related to this issue: https://github.com/aido/app-seed-tool/issues/23#issuecomment-1883166833

Note that the first 10 words of each SSKR share is a CBOR header plus metadata and this looks correct in all the shares you provided. The next 32 words is the share data and the last 4 words are a CRC checksum.

From previous experience with similar issues the physical device seems to be operating differently to the emulator used for testing (Speculos). Previously CRC32 functionality was shown to be faulty on the physical devices but that was fixed by Ledger.

I shall investigate further.

aido commented 6 months ago

I have tested v1.7.1 of the app installed from provider 4 on a physical Ledger Nano S. The meta data and share data for all shares seems to be generated correctly but the checksum is not.

The checksums in your test on a Nano S+ and also the checksums in my test on a Nano S exhibit the same behaviour as this comment: https://github.com/aido/app-seed-tool/issues/23#issuecomment-1883166833.

The first 2 words of every checksum is always able able (i.e. 0x0000). This would suggest that issue #23 is not fixed.

One possibility is that the app on provider 4 was built with a version of the SDK where CRC32 functionality is still broken.

I will wait to get Ledger's or @lpascal-ledger opinion on this. A unit test for the CRC32 function in the SDK may help.

DmKoshelek commented 6 months ago

I used the build from My Ledger provider 4 as it was suggested here https://github.com/aido/app-seed-tool/discussions/15#discussioncomment-9304773. I am not sure which SDK was used there to build the app

lpascal-ledger commented 6 months ago

Hi folks,

Indeed the deployed version was buggy. It uses the correct SDK but the code uses the cx_crc32_hw function which redirects to the OS buggy implementation, rather than the fixed cx_crc32 SDK implementation.

I'll redeploy the app with the commits of @xchapron-ledger. @aido could you confirm this changelog / version suits your constraints?

lpascal-ledger commented 6 months ago

Fine by me! Your code already contains the correct calls, we only deployed before they were merged.

xchapron-ledger commented 6 months ago

While at it, you should now be able to drop this line https://github.com/aido/app-seed-tool/blob/develop/src/ux_common/onboarding_seed_sskr.c#L6

aido commented 6 months ago

While at it, you should now be able to drop this line https://github.com/aido/app-seed-tool/blob/develop/src/ux_common/onboarding_seed_sskr.c#L6

Yes, there is a PR currently a work in progress that adds the Stax UI but also removes that line:: LedgerHQ/app-seed-tool#5

So, line will be gone in next version after v1.7.2.

lpascal-ledger commented 6 months ago

@aido I take from your approval that we don't wait for another PR, we can merge & deploy, is that correct?

aido commented 6 months ago

Yes. No need to wait for another PR. Your v1.7.2 bump PR is good enough to get a working and decent version of the app deployed.

Any other minor updates I need to make can be rolled into the Stax PR that is currently a work in progress.

Thanks.

lpascal-ledger commented 6 months ago

v1.7.2 has been deployed on provider 4.

aido commented 6 months ago

I have installed v1.7.2 from provider 4 and all looks good. Shares and CRC32 seem correct .

Device Nano S
OS 2.1.0
App Version 1.7.2
Status $${\color{green}working}$$

If @DmKoshelek confirms same for Nano S+ then I will close this ticket.

DmKoshelek commented 6 months ago

Hi guys,

Today I tried to generate shares again but with my main seed phrase on Nano S+ and suddenly it doesn't work anyway. CRC32 part looks different compare to previous version, last 43-46 words are different for each phrase

  1. ... when figs jolt sets
  2. ... peck door user tent
  3. ... news trip glow cusp
  4. ... code arch ruby zero
  5. ... door yell drop away

but data words (11-42) for 2-5 shares are still "drum" only

aido commented 6 months ago

Hi @DmKoshelek ,

Thanks for checking.

but data words (11-42) for 2-5 shares are still "drum" only

This is strange behaviour. Strange on two counts, Strange that the word drum is consistently repeating and also strange that the first share seems to be good and after that it just goes berserk.

The code for generating phrases on a Nano S is exactly the same as code for generating phrases on a Nano S+ except for one thing. The Nano S does not have the Galois Field multiplication syscall cx_bn_gf2_n_mul() so it uses its own function for this. All other devices use the cx_bn_gf2_n_mul() syscall.

So

All this evidence would seem to suggest that the cx_bn_gf2_n_mul() syscall on Nano S+ may be flawed. Another possibility is that the app is not using the syscall correctly.

Unfortunately I do not have a Nano S+ or Nano X device to perform my own hardware testing.

Oh dear!

aido commented 6 months ago

Further investigation reveals that the cx_bn_gf2_n_mul() syscall requires a parameter for the Second Montgomery constant which none of the other functions mentioned above use.

The app is using a value of 0xA1 for this constant. However sandra-Ledger has informed me on Ledger's Discord channel that 0x02 would be a more appropriate value. I think the Second Montgomery only makes the multiplication more efficient rather than changing the actual result. But we shall see.

I will try this recommended value and create a new PR for v1.7.3

aido commented 5 months ago

Hi @DmKoshelek ,

v1.7.3 has now been deployed and hopefully fixes this issue.

v1.7.3 changes the value of the Second Montgomery constant as suggested by @srasoamiaramanana-ledger. This constamt is used by the cx_bn_gf2_n_mul() syscall on Nano S+ and Nano X devices.

aido commented 5 months ago
I have installed v1.7.3 from provider 4 and all looks good on a physical Nano S.
Device Nano S
OS 2.1.0
App Version 1.7.3
Status $${\color{green}working}$$

[!NOTE] The Nano S does not use the cx_bn_gf2_n_mul() syscall so v1.7.3 will also need to be tested on a physical Nano S+ or Nano X to confirm that the change to the Montgomery constant fixed this issue.

DmKoshelek commented 5 months ago

Hi @aido,

Thanks for the update! Today I checked the new version (1.7.3) and, unfortunately, it still doesn't work. First share is fine as previously but 2-6 shares now have repeated "toil" word instead of "drum" on data part of the shares (11-42 words)

Also just for fun I tried two generate 2 shares with 2 shares threshold and both shares was invalid with this configuration and had "able" word on data word positions

OS version: 1.1.1

aido commented 5 months ago

Damn! :cry:

@DmKoshelek Thanks a lor for testing.

So, here's the situation:

@lpascal-ledger or @srasoamiaramanana-ledger I am loath to say that the above evidence would suggest that the cx_bn_gf2_n_mul() syscall is broken.

I still find it strange that the first share on a Nano S+ seems fine but after that the rest are bad. I will investigate this and try figure out why some hardware devices are doing something different to Speculos. From a previous issue, when Speculos behaved differently to the hardware the syscall turned out to be the issue e.g. cx_crc32(). Just sayin'! 😃

I may create a new PR for a v1.7.4 that replaces the cx_bn_gf2_n_mul() syscall with an in app function on all devices. This should eliminate or confirm cx_bn_gf2_n_mul() as the cause of this issue. Only problem is that the internal function will probably not be as efficient as the syscall.

srasoamiaramanana-ledger commented 5 months ago

Hi, We have some tests for the cx_bn_gf2_n_mul() syscall and I verified that the result for these tests are the same on a physical device (LNS+) and on speculos. We test multiplication over GF(2^32), GF(2^64), GF(2^128). For your specific case, I added test vectors over GF(2^8) and I also tested that the results on a real device and on speculos are identical. You can find here GF2NMUL.txt the test vectors with the parameters described as follows: extension degree : modulus : second montgomery constant : e : f : r : I've generated the parameters for the GF(2^8) multiplication with sagemath as follows:

R.<x> = GF(2)[]
F = GF(2).extension(modulus = x^8 + x^4 + x^3 + x + 1, name = 'a')
for e in F:
    f = F.random_element()
    r = e * f
    le = e.polynomial().coefficients(sparse=False)
    ne = ZZ(le, 2)
    lf = f.polynomial().coefficients(sparse=False)
    nf = ZZ(lf, 2)
    lr = r.polynomial().coefficients(sparse=False)
    nr = ZZ(lr, 2)
    ln = F.modulus().coefficients(sparse=False)
    nn = ZZ(ln, 2)
    h = F(x**256)
    lh = h.polynomial().coefficients(sparse=False)
    nh = ZZ(lh, 2)
    print(f"8:{nn}:{nh}:{ne}:{nf}:{nr}:")

Could you try to call cx_bn_gf2_n_mul() without overlapping the variable for the result and the variables for the operand ?

aido commented 5 months ago

@srasoamiaramanana-ledger,

Thanks for investigating.

I shall try what you suggested. I just don't have a physical Nano S+ to test on making this type of troubleshooting difficult.

I had thought that overlapping the result variables with the inputs may possibly cause a problem but the documentation did not discourage it so I didn't think it would be an issue. But we'll see.

I may create a v1.7.4.rc1

DmKoshelek commented 5 months ago

I shall try what you suggested. I just don't have a physical Nano S+ to test on making this type of troubleshooting difficult.

@aido Maybe I can run some test builds on a physical device if it helps to speed up the bug fixing. Just tell me what to run and how =)

aido commented 5 months ago

Thanks @DmKoshelek, I very much appreciate your assistance on this.

I have created a pull request on the Ledger fork which removes the cx_bn_gf2_n_mul() syscall from the equation altogether. Then by the process of elimination we will know for sure if this is the problem area. Hopefully that pull request can be released on provider 4 for easy installation and testing.

If by this process of elimination we find cx_bn_gf2_n_mul() to be the culprit we can hopefully home in on a solution by implementing the suggestion above:

Could you try to call cx_bn_gf2_n_mul() without overlapping the variable for the result and the variables for the operand ?

We'll get there ... eventually! 😃

Have a great weekend.

aido commented 5 months ago

Maybe I can run some test builds on a physical device if it helps to speed up the bug fixing. Just tell me what to run and how

Hi @DmKoshelek,

v1.7.4-rc.1 is now available on provider 4. I am praying that this version works on Nano S+ and Nano X.

DmKoshelek commented 5 months ago

Hi @aido

Great news for you, looks like this update is working! I was able to generate 6 shares and recover back my BIP39 phrase using 3 of them. Tomorrow I will try to use 4-6 shares to recover BIP39 seed but I believe it will also work

My congratulation!

aido commented 5 months ago

That is great news indeed. 🥳

This proves without doubt that the problem area is related to how the app is using the cx_bn_gf2_n_mul() syscall.

I shall now create a v1.7.4-rc.2 that ensures that in all calls to cx_bn_gf2_n_mul() the result parameter does not overlap with the operands. This rule will need to be documented in the SDK also but lets get a working v1.7.4-rc.2 first.

This hopefully will be the final pre-release of a v1.7.4 that should work on all devices without issue! 🤞

Thanks again @DmKoshelek for finding this bug and then helping troubleshoot it.

aido commented 5 months ago

Hi @DmKoshelek,

v1.7.4-rc.2 is now available to install from provider 4. I have removed all overlapping results and operands in the cx_bn_gf2_n_mul() syscall and I am confident that this version now works on all devices. If this version passes the test it will be the last pre-release before going to v1.7.4 final.

DmKoshelek commented 5 months ago

Hi @aido, I see only v1.7.4-rc.1 version on provider 4 right now =( So I couldn't test it

aido commented 5 months ago

hmm, strange. v1.7.4-rc.2 was deployed on provider 4 earlier today and I was able to install it on a Nano S. Let me double check Ledger Live.

Update Yep, v1.7.4-rc.2 from provider 4 was installed on my Nano S several hours ago.

6 hours ago @lpascal-ledger stated:

The deployment is in progress, but it could take a few hours depending on CI parameters (tooling update, runners availability, ...) and/or because we generally wait for a minimal batch of apps to sign & release them fully.

@DmKoshelek, I would imagine that if it was available for installation on Nano S it would also be available on the other devices at same time or very shortly thereafter.

lpascal-ledger commented 5 months ago

@DmKoshelek @aido It's probably because I deployed on the newest NanoS+ 1.2.0 OS version which is not fully distributed yet. I'll deploy it on 1.1.1 today.

lpascal-ledger commented 5 months ago

Should be good now.

aido commented 5 months ago

Hi @aido, I see only v1.7.4-rc.1 version on provider 4 right now =( So I couldn't test it

Hi @DmKoshelek,

Any luck installing v1.7.4-rc.2 on a Nano S+ with OS version 1.1.1?

DmKoshelek commented 5 months ago

Hi @aido

Sorry for the delay, I totally forgot to test the new version.

I have just download it and it works fine as well, I was able to create shares and recover my seed phrase on my device!

DmKoshelek commented 5 months ago

@aido

also I found one new issue during testing, I created a separate topic for it

aido commented 5 months ago

Thanks @DmKoshelek

While I investigate the new issue can you can confirm if this issue is fixed for threshold values other than 1?

DmKoshelek commented 5 months ago

@aido, yes, the issue is fixed, thanks again for fixing!

aido commented 5 months ago

That s excellent. I am quite happy that I can close this issue.

I have also made a small bit of progress with the other issue. It does in fact seem to be happening when testing with Speculos. That will make testing and troubleshooting much much easier and quicker. The problem is most likely to be some edge case that needs to be caught. But we'll see.