Closed DmKoshelek closed 5 months ago
Hi @DmKoshelek ,
I think there may be a few problems here. One of these problems may be related to this issue: https://github.com/aido/app-seed-tool/issues/23#issuecomment-1883166833
Note that the first 10 words of each SSKR share is a CBOR header plus metadata and this looks correct in all the shares you provided. The next 32 words is the share data and the last 4 words are a CRC checksum.
From previous experience with similar issues the physical device seems to be operating differently to the emulator used for testing (Speculos). Previously CRC32 functionality was shown to be faulty on the physical devices but that was fixed by Ledger.
I shall investigate further.
I have tested v1.7.1 of the app installed from provider 4 on a physical Ledger Nano S. The meta data and share data for all shares seems to be generated correctly but the checksum is not.
The checksums in your test on a Nano S+ and also the checksums in my test on a Nano S exhibit the same behaviour as this comment: https://github.com/aido/app-seed-tool/issues/23#issuecomment-1883166833.
The first 2 words of every checksum is always able able
(i.e. 0x0000). This would suggest that issue #23 is not fixed.
One possibility is that the app on provider 4 was built with a version of the SDK where CRC32 functionality is still broken.
I will wait to get Ledger's or @lpascal-ledger opinion on this. A unit test for the CRC32 function in the SDK may help.
I used the build from My Ledger provider 4 as it was suggested here https://github.com/aido/app-seed-tool/discussions/15#discussioncomment-9304773. I am not sure which SDK was used there to build the app
Hi folks,
Indeed the deployed version was buggy. It uses the correct SDK but the code uses the cx_crc32_hw
function which redirects to the OS buggy implementation, rather than the fixed cx_crc32
SDK implementation.
I'll redeploy the app with the commits of @xchapron-ledger. @aido could you confirm this changelog / version suits your constraints?
Fine by me! Your code already contains the correct calls, we only deployed before they were merged.
While at it, you should now be able to drop this line https://github.com/aido/app-seed-tool/blob/develop/src/ux_common/onboarding_seed_sskr.c#L6
While at it, you should now be able to drop this line https://github.com/aido/app-seed-tool/blob/develop/src/ux_common/onboarding_seed_sskr.c#L6
Yes, there is a PR currently a work in progress that adds the Stax UI but also removes that line:: LedgerHQ/app-seed-tool#5
So, line will be gone in next version after v1.7.2.
@aido I take from your approval that we don't wait for another PR, we can merge & deploy, is that correct?
Yes. No need to wait for another PR. Your v1.7.2 bump PR is good enough to get a working and decent version of the app deployed.
Any other minor updates I need to make can be rolled into the Stax PR that is currently a work in progress.
Thanks.
v1.7.2 has been deployed on provider 4.
I have installed v1.7.2 from provider 4 and all looks good. Shares and CRC32 seem correct .
Device | Nano S |
OS | 2.1.0 |
App Version | 1.7.2 |
Status | $${\color{green}working}$$ |
If @DmKoshelek confirms same for Nano S+ then I will close this ticket.
Hi guys,
Today I tried to generate shares again but with my main seed phrase on Nano S+ and suddenly it doesn't work anyway. CRC32 part looks different compare to previous version, last 43-46 words are different for each phrase
but data words (11-42) for 2-5 shares are still "drum" only
Hi @DmKoshelek ,
Thanks for checking.
but data words (11-42) for 2-5 shares are still "drum" only
This is strange behaviour. Strange on two counts, Strange that the word drum
is consistently repeating and also strange that the first share seems to be good and after that it just goes berserk.
The code for generating phrases on a Nano S is exactly the same as code for generating phrases on a Nano S+ except for one thing.
The Nano S does not have the Galois Field multiplication syscall cx_bn_gf2_n_mul()
so it uses its own function for this. All other devices use the cx_bn_gf2_n_mul()
syscall.
So
cx_bn_gf2_n_mul()
sysall but instead uses a function within the app for Galois Field multiplication and all tests pass: https://github.com/aido/app-seed-tool/blob/2f3617c3f773e72efd526a48981316fa2e21b266/src/sskr/sss/interpolate.c#L24-L134cx_bn_gf2_n_mul()
syscall on Nano S+ and does not seem to be working.All this evidence would seem to suggest that the cx_bn_gf2_n_mul()
syscall on Nano S+ may be flawed.
Another possibility is that the app is not using the syscall correctly.
Unfortunately I do not have a Nano S+ or Nano X device to perform my own hardware testing.
Oh dear!
Further investigation reveals that the cx_bn_gf2_n_mul()
syscall requires a parameter for the Second Montgomery constant which none of the other functions mentioned above use.
The app is using a value of 0xA1 for this constant. However sandra-Ledger has informed me on Ledger's Discord channel that 0x02 would be a more appropriate value. I think the Second Montgomery only makes the multiplication more efficient rather than changing the actual result. But we shall see.
I will try this recommended value and create a new PR for v1.7.3
Hi @DmKoshelek ,
v1.7.3 has now been deployed and hopefully fixes this issue.
v1.7.3 changes the value of the Second Montgomery constant as suggested by @srasoamiaramanana-ledger. This constamt is used by the cx_bn_gf2_n_mul()
syscall on Nano S+ and Nano X devices.
I have installed v1.7.3 from provider 4 and all looks good on a physical Nano S. | ||
---|---|---|
Device | Nano S | |
OS | 2.1.0 | |
App Version | 1.7.3 | |
Status | $${\color{green}working}$$ |
[!NOTE] The Nano S does not use the
cx_bn_gf2_n_mul()
syscall so v1.7.3 will also need to be tested on a physical Nano S+ or Nano X to confirm that the change to the Montgomery constant fixed this issue.
Hi @aido,
Thanks for the update! Today I checked the new version (1.7.3) and, unfortunately, it still doesn't work. First share is fine as previously but 2-6 shares now have repeated "toil" word instead of "drum" on data part of the shares (11-42 words)
Also just for fun I tried two generate 2 shares with 2 shares threshold and both shares was invalid with this configuration and had "able" word on data word positions
OS version: 1.1.1
Damn! :cry:
@DmKoshelek Thanks a lor for testing.
So, here's the situation:
cx_bn_gf2_n_mul()
function and all tests pass: WORKScx_bn_gf2_n_mul()
function, not the syscall: WORKScx_bn_gf2_n_mul()
syscall and a Ledger recommended value for the Second Montgomery constant: FAILS@lpascal-ledger or @srasoamiaramanana-ledger I am loath to say that the above evidence would suggest that the cx_bn_gf2_n_mul()
syscall is broken.
I still find it strange that the first share on a Nano S+ seems fine but after that the rest are bad. I will investigate this and try figure out why some hardware devices are doing something different to Speculos. From a previous issue, when Speculos behaved differently to the hardware the syscall turned out to be the issue e.g. cx_crc32()
. Just sayin'! 😃
I may create a new PR for a v1.7.4 that replaces the cx_bn_gf2_n_mul()
syscall with an in app function on all devices. This should eliminate or confirm cx_bn_gf2_n_mul()
as the cause of this issue. Only problem is that the internal function will probably not be as efficient as the syscall.
Hi,
We have some tests for the cx_bn_gf2_n_mul()
syscall and I verified that the result for these tests are the same on a physical device (LNS+) and on speculos. We test multiplication over GF(2^32), GF(2^64), GF(2^128). For your specific case, I added test vectors over GF(2^8) and I also tested that the results on a real device and on speculos are identical.
You can find here GF2NMUL.txt the test vectors with the parameters described as follows:
extension degree : modulus : second montgomery constant : e : f : r :
I've generated the parameters for the GF(2^8) multiplication with sagemath as follows:
R.<x> = GF(2)[]
F = GF(2).extension(modulus = x^8 + x^4 + x^3 + x + 1, name = 'a')
for e in F:
f = F.random_element()
r = e * f
le = e.polynomial().coefficients(sparse=False)
ne = ZZ(le, 2)
lf = f.polynomial().coefficients(sparse=False)
nf = ZZ(lf, 2)
lr = r.polynomial().coefficients(sparse=False)
nr = ZZ(lr, 2)
ln = F.modulus().coefficients(sparse=False)
nn = ZZ(ln, 2)
h = F(x**256)
lh = h.polynomial().coefficients(sparse=False)
nh = ZZ(lh, 2)
print(f"8:{nn}:{nh}:{ne}:{nf}:{nr}:")
Could you try to call cx_bn_gf2_n_mul()
without overlapping the variable for the result and the variables for the operand ?
@srasoamiaramanana-ledger,
Thanks for investigating.
I shall try what you suggested. I just don't have a physical Nano S+ to test on making this type of troubleshooting difficult.
I had thought that overlapping the result variables with the inputs may possibly cause a problem but the documentation did not discourage it so I didn't think it would be an issue. But we'll see.
I may create a v1.7.4.rc1
I shall try what you suggested. I just don't have a physical Nano S+ to test on making this type of troubleshooting difficult.
@aido Maybe I can run some test builds on a physical device if it helps to speed up the bug fixing. Just tell me what to run and how =)
Thanks @DmKoshelek, I very much appreciate your assistance on this.
I have created a pull request on the Ledger fork which removes the cx_bn_gf2_n_mul()
syscall from the equation altogether. Then by the process of elimination we will know for sure if this is the problem area. Hopefully that pull request can be released on provider 4 for easy installation and testing.
If by this process of elimination we find cx_bn_gf2_n_mul()
to be the culprit we can hopefully home in on a solution by implementing the suggestion above:
Could you try to call cx_bn_gf2_n_mul() without overlapping the variable for the result and the variables for the operand ?
We'll get there ... eventually! 😃
Have a great weekend.
Maybe I can run some test builds on a physical device if it helps to speed up the bug fixing. Just tell me what to run and how
Hi @DmKoshelek,
v1.7.4-rc.1 is now available on provider 4. I am praying that this version works on Nano S+ and Nano X.
Hi @aido
Great news for you, looks like this update is working! I was able to generate 6 shares and recover back my BIP39 phrase using 3 of them. Tomorrow I will try to use 4-6 shares to recover BIP39 seed but I believe it will also work
My congratulation!
That is great news indeed. 🥳
This proves without doubt that the problem area is related to how the app is using the cx_bn_gf2_n_mul()
syscall.
I shall now create a v1.7.4-rc.2 that ensures that in all calls to cx_bn_gf2_n_mul()
the result parameter does not overlap with the operands. This rule will need to be documented in the SDK also but lets get a working v1.7.4-rc.2 first.
This hopefully will be the final pre-release of a v1.7.4 that should work on all devices without issue! 🤞
Thanks again @DmKoshelek for finding this bug and then helping troubleshoot it.
Hi @DmKoshelek,
v1.7.4-rc.2 is now available to install from provider 4.
I have removed all overlapping results and operands in the cx_bn_gf2_n_mul()
syscall and I am confident that this version now works on all devices.
If this version passes the test it will be the last pre-release before going to v1.7.4 final.
Hi @aido, I see only v1.7.4-rc.1 version on provider 4 right now =( So I couldn't test it
hmm, strange. v1.7.4-rc.2 was deployed on provider 4 earlier today and I was able to install it on a Nano S. Let me double check Ledger Live.
Update Yep, v1.7.4-rc.2 from provider 4 was installed on my Nano S several hours ago.
6 hours ago @lpascal-ledger stated:
The deployment is in progress, but it could take a few hours depending on CI parameters (tooling update, runners availability, ...) and/or because we generally wait for a minimal batch of apps to sign & release them fully.
@DmKoshelek, I would imagine that if it was available for installation on Nano S it would also be available on the other devices at same time or very shortly thereafter.
@DmKoshelek @aido It's probably because I deployed on the newest NanoS+ 1.2.0 OS version which is not fully distributed yet. I'll deploy it on 1.1.1 today.
Should be good now.
Hi @aido, I see only v1.7.4-rc.1 version on provider 4 right now =( So I couldn't test it
Hi @DmKoshelek,
Any luck installing v1.7.4-rc.2 on a Nano S+ with OS version 1.1.1?
Hi @aido
Sorry for the delay, I totally forgot to test the new version.
I have just download it and it works fine as well, I was able to create shares and recover my seed phrase on my device!
@aido
also I found one new issue during testing, I created a separate topic for it
Thanks @DmKoshelek
While I investigate the new issue can you can confirm if this issue is fixed for threshold values other than 1?
@aido, yes, the issue is fixed, thanks again for fixing!
That s excellent. I am quite happy that I can close this issue.
I have also made a small bit of progress with the other issue. It does in fact seem to be happening when testing with Speculos. That will make testing and troubleshooting much much easier and quicker. The problem is most likely to be some edge case that needs to be caught. But we'll see.
I tried to generate SSKR phrases with the app and they are looking weird. All of them except first one just contains repeated "Drum" word in positions from 11 to 42 (I believe other words just contains meta information and checksums). And for sure, I can't recover my bip39 phrase from this SSKR shares. My Seed phrace contains 24 words but to create this issue I generated a new one. I tried to use this shares to recover my seed phrase using seedtool-cli but for sure got error that shares are invalid
Steps to reproduce the behavior:
Device: