alpha3 on Manjaro: registering a new public user closes the app first time

star-buck commented 6 years ago

when hitting "Next" the app simply closes.

star-buck commented 6 years ago

tested this version : https://github.com/ring-project/ring/releases/download/3.0.0-alpha3/ring-kde_3.0.0-alpha3-x86_64-X.appimage

star-buck commented 6 years ago

@Elv13 : any update on this?

Elv13 commented 6 years ago

soon, right now I sort Kirigami things with Marco then I get back on making that build with most issues you reported being addressed. I got an appointment at the dentist at 14:00, but hopefully I should be done with this by then. Otherwise it will be tomorrow.

As far as this issue goes, I made a few registered accounts without crash so far. I will test with other laptops and more distros to keep trying to replicate. I will then try to create a valgrind friendly build and see if that helps to catch it. So far it is still in the "can't reproduce, gimme backtrace and or it might take a while" area. I am not saying the issue is not real. I am just saying the test I performed so far do not reproduce it and I would not have released Alpha3 if I had seen such problem. A backtrace, even full of ??? symbols would still help greatly pinpoint the circumstances that causes the crash.

Elv13 commented 6 years ago

Ok, marco got me the BT

Elv13 commented 6 years ago

Ok, so the BT said this was when ring-kde calls the daemon addressLookup. So far, so unsurprising. This was already obvious the whole time since that's what the button does. So little progress there. So I undusted more laptops:

ThinkPad W540 (haswell): No issue (30 attempts)
MacBook Pro Retina (Ivy Bridge): No issue (30 attempts)
Thinkpad X201 (Nahelm): Reproduced twice (30 attempts)

Next I built this AppImage with less hardening so I can debug:

https://drive.google.com/file/d/1XlDiZ3Pa588tigP410I30sKe-EYpJ6eq/view?usp=sharing

But it failed to reproduce the issue with it. If someone want to try it and can reproduce the issue (and give me the (GDB thread apply all bt full), this would be appreciated. Valgrind wont run on this and it also doesn't happen using LLVM-ASAN (a valgrind alternative)

However having the debug symbols locally, I used addr2line with the .text binary section offset to get a better idea. So the crash is in secp256k1_create_context in the daemon (LibRing). What is happening and why this is happens is still to be determined. I still have trouble to reproduce it and given it only happens on hardened AppImages on my slowest laptop, debugging is a nightmare (well, that's the point of binary hardening...).

In any case, progress! But no quick fix.

Elv13 commented 6 years ago

Ok, I made a bot to create accounts and let it run all night on the X201.

It's a good old fashion twisted race condition between 5(!) threads in LibRing, LibRingQt and the main (QML) threads... Short version, some (LibRing/daemon) signals are emitted from a thread then put in Qt signal queue to avoid being executed using the wrong thread (this is correct). However because the key generation is slow, it can cause another signal to arrive later than expected and be placed in the same queue. The Qt model is then created using a shared pointer, but because it is not yet valid and has no consumers (yet), all references get dropped and the model is destroyed. This happens while the QML is initializing the timeline and then when the QItemSelectionModel is already set and the data model is being recreated. Because QItemSelectionModel takes only a raw pointer instead of a reference counted one, it is not notified that the data is in the process of being invalidated. This creates a small window of opportunity for a null deference during the end of the key initialization process. That's why secp256k1_create_context was in the backtrace.

That makes sense now, it cannot happen on more recent/powerful CPU because the key will always be generated faster than the other network I/O operations it races against.

star-buck commented 6 years ago

Note: I test on a very old machine, where ring itself always run fine until now, do it better runs there fine again, as im not getting a new machine just for it... On Feb 14, 2018 18:48, "Emmanuel Lepage Vallée" notifications@github.com wrote:

Ok, I made a bot to create accounts and let it run all night on the X201.

It's a good old fashion twisted race condition between 5(!) threads in LibRing, LibRingQt and the main (QML) threads... Short version, some (LibRing/daemon) signals are emitted from a thread then put in Qt signal queue to avoid being executed using the wrong thread (this is correct). However because the key generation is slow, it can cause another signal to arrive later than expected and be placed in the same queue. The Qt model is then created using a shared pointer, but because it is not yet valid and has no consumers (yet), all references get dropped and the model is destroyed. This happens while the QML is initializing the timeline and then when the QItemSelectionModel is already set and the data model is being recreated. Because QItemSelectionModel takes only a raw pointer instead of a reference counted one, it is not notified that the data is in the process of being invalidated. This creates a small window of opportunity for a null deference during the end of the key initialization process. That's why secp256k1_create_context was in the backtrace.

That makes sense now, it cannot happen on more recent/powerful CPU because the key will always be generated faster than the other network I/O operations it races against.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ring-project/ring/issues/22#issuecomment-365688836, or mute the thread https://github.com/notifications/unsubscribe-auth/AAzRhjpDYhB_T1mEZvF0YX9qg65pwBa_ks5tUxyBgaJpZM4R_1Ct .

Elv13 commented 6 years ago

Of course

This was just to document what the bug was and why I had hard time to reproduce it. It is not an excuse for the bug existence. I just bought a pineboard and odroid to run my tests on. It should help catch more of these crashers.

banji-project / ring-issues

alpha3 on Manjaro: registering a new public user closes the app first time #22