Z-Wave-Me / z-way-issues

This repo is only to host issues for Z-Way.

z-way-server continuously crashing in libcrypto #146

Closed by dhobbes 3 years ago

dhobbes commented 5 years ago

Hi, my new installation of the Z-Way server on a Raspberry Pi 3B+ was up and running for a week. Now it constantly crashes after a few minutes. Following the debug guidelines, I got the trace below from gdb:

[Switching to Thread 0x6c8ff450 (LWP 9527)]
0x7603153c in ?? () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.0.2
(gdb) info thread
  Id   Target Id         Frame
  1    Thread 0x76ff48b0 (LWP 9058) "z-way-server" 0x76383030 in nanosleep ()
    at ../sysdeps/unix/syscall-template.S:84
  2    Thread 0x75053450 (LWP 9061) "OptimizingCompi" 0x76979014 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=1, futex_word=0x6c764)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  3    Thread 0x74853450 (LWP 9062) "v8:SweeperThrea" 0x76979014 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=1, futex_word=0x6c8bc)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  4    Thread 0x74843450 (LWP 9063) "v8:SweeperThrea" 0x76979014 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=1, futex_word=0x6c9d4)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  5    Thread 0x74833450 (LWP 9064) "v8:SweeperThrea" 0x76979014 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=1, futex_word=0x6caec)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  6    Thread 0x74823450 (LWP 9065) "v8:SweeperThrea" 0x76979014 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=1, futex_word=0x6cc04)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  7    Thread 0x74813450 (LWP 9066) "zway/core" 0x5bb0a348 in ?? ()
  8    Thread 0x737ff450 (LWP 9069) "zway/webserver" 0x763b1204 in select ()
    at ../sysdeps/unix/syscall-template.S:84
  9    Thread 0x72fff450 (LWP 9070) "zway/timers" 0x76383030 in nanosleep ()
    at ../sysdeps/unix/syscall-template.S:84
  10   Thread 0x727ff450 (LWP 9071) "zway/core" 0x763b1204 in select ()
    at ../sysdeps/unix/syscall-template.S:84
  360  Thread 0x6f8ff450 (LWP 9525) "zway/core" 0x7603153c in ?? ()
   from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.0.2
* 362  Thread 0x6c8ff450 (LWP 9527) "zway/core" 0x7603153c in ?? ()
   from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.0.2
(gdb) bt
#0  0x7603153c in ?? () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.0.2
#1  0x760314f8 in ?? () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.0.2
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

The last lines in the log file say:

[2018-12-02 13:14:05.131] [D] [zway] RECEIVED: ( 01 1F 00 04 00 02 17 98 81 54 35 C7 2F B5 F6 19 FF 2F 4A 14 67 FB B6 0F E0 3E 84 FD C0 5A B6 00 1B )
[2018-12-02 13:14:05.131] [D] [zway] SENT ACK
[2018-12-02 13:14:05.131] [D] [zway] SETDATA devices.2.data.lastReceived = 0 (0x00000000)
[2018-12-02 13:14:05.131] [I] [zway] Node 2:0 CC Security: Received a secure message
[2018-12-02 13:14:05.131] [D] [zway] SETDATA devices.2.instances.0.commandClasses.152.data.firstPart = **********
[2018-12-02 13:14:05.131] [I] [zway] Node 2:0 CC Security: passing decrypted packet to application level: [ 25 03 ff ]
[2018-12-02 13:14:05.131] [D] [zway] Received reply on job (SwitchBinary Get)
[2018-12-02 13:14:05.131] [D] [zway] SETDATA devices.2.instances.0.commandClasses.37.data.level = True
[2018-12-02 13:14:05.137] [I] [core] (Mobile App Support) Notify listener (DeviceUpdate): Fibaro Switch (#2) - on

Do you have any idea what I can do?

Best regards

PoltoS commented 5 years ago

Disable the Mobile app.

How did you trigger it? We have seen it with some customers, but were unable to reproduce it.

Do you remember how you set up Mobile app? iOS/Android? What were your steps to make it crash?

We really wish to reproduce it on our side to hunt it down. I would appreciate any help.

dhobbes commented 5 years ago

Hello, thanks for the answer. I can't tell exactly when we tried to add mobile phone support. I know that we played with presence detection via FritzBox and also installed mobile apps on iPhone and Android devices, but I can't tell how these relate to the crashes. I disabled remote access. I also enabled a monthly cloud backup, but no backup has been saved so far, apparently not even a first one.

I tried to disable the mobile support but didn't succeed at first:

1.) From the Apps menu I can see the mobile app under local apps, showing one active element. But I'm not able to disable or remove the connected Android phone.
2.) From the phone element I can go to the mobile app, but I can neither remove the phone element nor remove the phone from the mobile app itself.
3.) From the "devices / Mobilephone" menu I can manage the phones, but no phone is listed there and I can neither activate nor deactivate the app. Deactivating the app and saving the changes does nothing; when I come back, the app is active again.
4.) Finally I removed the folder /opt/z-way-server/automation/modules/MobileAppSupport. That removed the app from z-way-server and also the phone from the "Elements" list.

I will see whether it stays stable. At least I can now switch the Fibaro a few times without any crash.

Thanks

PoltoS commented 5 years ago

So, is it stable now? Are you able to make it unstable again?

dhobbes commented 5 years ago

Hi, yes. It is stable now and has been running since then without any problem. I found that the debug symbols for libcrypto are supplied with your package. I wanted to retry with the debug symbols installed, but have not had time so far.

When I searched for the problem the first time, I found some threads that mentioned OpenSSL crashes when the peer does not close the connection properly. At that time I didn't think about the mobile app and could not match that to a closed connection to one of the devices.

Maybe this helps. If it would also help, I can rerun with the debug symbols installed.

PoltoS commented 5 years ago

It would be nice to understand where inside it crashes. We are not able to reproduce the issue in our test sandbox, but we have seen it on a few customers' boxes.
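
For anyone wanting to collect such a trace, here is a minimal sketch of one way to do it. It assumes gdb and pidof are installed on the Pi and that the libcrypto debug symbols are present. The helper names `build_gdb_command` and `capture_crash_backtrace` are hypothetical and not part of Z-Way. The script attaches gdb to the running server, lets it continue until it stops on a signal, and then prints the backtrace of every thread:

```python
import subprocess


def build_gdb_command(pid):
    """Build a gdb command line that attaches to `pid`, resumes it until the
    next stop (for example a fatal signal), then dumps all thread backtraces."""
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "continue",             # blocks until the process stops
        "-ex", "info threads",         # list the threads at the stop
        "-ex", "thread apply all bt",  # backtrace of every thread
    ]


def capture_crash_backtrace(process_name="z-way-server"):
    """Attach to the first PID that pidof reports and return gdb's output.
    Assumption: the process is already running and we may attach to it
    (usually this requires root)."""
    pids = subprocess.run(
        ["pidof", process_name], capture_output=True, text=True
    ).stdout.split()
    if not pids:
        raise RuntimeError("process not running: " + process_name)
    result = subprocess.run(
        build_gdb_command(int(pids[0])), capture_output=True, text=True
    )
    return result.stdout
```

Calling `capture_crash_backtrace()` blocks until the server stops, so it may be easiest to leave it running in a separate terminal and save the returned text to a file for attaching to the issue.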

PoltoS commented 5 years ago

It looks like this is the same issue: https://github.com/Z-Wave-Me/z-way-issues/issues/147