cyberman54 / ESP32-Paxcounter

Wifi & BLE driven passenger flow metering with cheap ESP32 boards
https://cyberman54.github.io/ESP32-Paxcounter/

LoPy4 - LoRaWAN transmission is hanging when using with Chirpstack #743

Closed tjvandam closed 3 years ago

tjvandam commented 3 years ago

When I use the latest release on LoPy4 with basic settings, in both OTAA and ABP mode, the program joins perfectly but hangs in LoRaWAN transmit mode (RGB LED continues to blink BLUE). Testing the exact same set-up on TTNv2 works perfectly.

To me it looks like ChirpStack triggers a fault in the code by sending MAC commands on the second downlink.

Trying to get some more debug info today.

tjvandam commented 3 years ago

I have turned on the debug terminal and found the following output; it is indeed the MAC commands.

08:35:51.068 > [D][lorawan.cpp:399] myEventCallback(): RXSTART
08:35:51.189 > [D][lorawan.cpp:524] mac_decode(): --> Unknown MAC command 0x60
08:35:51.189 > [D][lorawan.cpp:524] mac_decode(): --> Unknown MAC command 0x49
08:35:51.189 > [D][lorawan.cpp:524] mac_decode(): --> Unknown MAC command 0xB7
08:35:51.189 > [D][lorawan.cpp:524] mac_decode(): --> Unknown MAC command 0xE8
08:35:51.189 > [D][lorawan.cpp:524] mac_decode(): --> Unknown MAC command 0x00
08:35:51.189 > [D][lorawan.cpp:524] mac_decode(): --> Unknown MAC command 0x85
08:35:51.189 > [D][lorawan.cpp:524] mac_decode(): --> Unknown MAC command 0x00
08:35:51.189 > [D][lorawan.cpp:524] mac_decode(): --> Unknown MAC command 0x00
08:35:51.189 > [D][lorawan.cpp:516] mac_decode(): --> LinkADRReq
08:35:51.189 > [D][lorawan.cpp:516] mac_decode(): <-- LinkADRAns
08:35:51.189 > [D][lorawan.cpp:399] myEventCallback(): TXCOMPLETE
08:35:55.159 > [D][lorawan.cpp:399] myEventCallback(): TXSTART
08:35:57.303 > [D][lorawan.cpp:399] myEventCallback(): RXSTART
08:35:58.710 > [D][lorawan.cpp:524] mac_decode(): --> Unknown MAC command 0x60
08:35:58.711 > [D][lorawan.cpp:524] mac_decode(): --> Unknown MAC command 0x49
08:35:58.711 > [D][lorawan.cpp:524] mac_decode(): --> Unknown MAC command 0xB7
08:35:58.711 > [D][lorawan.cpp:524] mac_decode(): --> Unknown MAC command 0xE8
08:35:58.711 > [D][lorawan.cpp:524] mac_decode(): --> Unknown MAC command 0x00
08:35:58.711 > [D][lorawan.cpp:524] mac_decode(): --> Unknown MAC command 0x85
08:35:58.711 > [D][lorawan.cpp:516] mac_decode(): --> ResetConf
08:35:58.711 > [D][lorawan.cpp:516] mac_decode(): --> LinkADRReq
08:35:58.711 > [D][lorawan.cpp:516] mac_decode(): <-- LinkADRAns
08:35:58.711 > [D][lorawan.cpp:399] myEventCallback(): TXCOMPLETE
08:35:59.256 > [I][lorawan.cpp:564] SaveLMICToRTC(): LMIC state saved
08:35:59.256 > [I][reset.cpp:183] enter_deepsleep(): Going to sleep, good bye.
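
One possible reading of the log above (an inference, not confirmed anywhere in this thread): the eight bytes reported as unknown MAC commands match the layout of a LoRaWAN downlink frame header (MHDR, DevAddr, FCtrl, FCnt) rather than MAC command identifiers. The sketch below parses them that way. The function name `parse_downlink_header` is hypothetical, and the field layout is the one defined for LoRaWAN 1.0.x data frames.

```python
def parse_downlink_header(frame: bytes) -> dict:
    """Split the first 8 bytes of a LoRaWAN data frame into header fields.

    Layout assumed (LoRaWAN 1.0.x): MHDR(1) | DevAddr(4, LE) | FCtrl(1) | FCnt(2, LE)
    """
    if len(frame) < 8:
        raise ValueError("need at least 8 bytes: MHDR + DevAddr + FCtrl + FCnt")
    mhdr = frame[0]
    fctrl = frame[5]
    return {
        "mtype": mhdr >> 5,                       # 3 = unconfirmed data down
        "dev_addr": f"{int.from_bytes(frame[1:5], 'little'):08X}",
        "adr": bool(fctrl & 0x80),                # ADR bit of a downlink FCtrl
        "fopts_len": fctrl & 0x0F,                # bytes of piggybacked MAC commands
        "fcnt": int.from_bytes(frame[6:8], "little"),
    }


# Bytes taken from the first RX window in the log, in the order they were printed.
LOGGED_BYTES = bytes([0x60, 0x49, 0xB7, 0xE8, 0x00, 0x85, 0x00, 0x00])
header = parse_downlink_header(LOGGED_BYTES)
print(header)
```

Read this way, FCtrl 0x85 announces five bytes of FOpts, which is the size of one LinkADRReq (one command byte plus four payload bytes) and would fit the LinkADRReq line that follows in the log.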
tjvandam commented 3 years ago

The downlink from Chirpstack looks like this: image

cyberman54 commented 3 years ago

You are using deep sleep, can you please retry without deep sleep and check if the issue disappears?

Note: The MAC decoder is experimental and does not correctly decode all MAC commands.

tjvandam commented 3 years ago

Which setting should I use to turn off deep-sleep mode? As far as I can see, I did not activate sleep cycles in paxcounter.conf.

cyberman54 commented 3 years ago

If you ever enabled it, the setting is stored in NVRAM. Send an rcommand by downlink to the module, either to switch off deep sleep or to reset to factory settings.

rcommand 0x19 0x00 0x20 will disable sleep mode and make this setting permanent. But this needs a working downlink.

If downlink is not working, change the version number in the code, recompile and reflash. The new version number will trigger an NVRAM clear on the node once, when it restarts.
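
Network server UIs and APIs often expect a downlink payload as base64 rather than hex. A minimal sketch, assuming that is the case for your server, that encodes the three rcommand bytes quoted above. The helper name `to_base64` is hypothetical, and the FPort to use is whatever remote command port your Paxcounter build is configured with (not stated in this thread).

```python
import base64

# rcommand bytes exactly as quoted in the comment above
RCOMMAND = bytes([0x19, 0x00, 0x20])


def to_base64(payload: bytes) -> str:
    """Return the payload as an ASCII base64 string."""
    return base64.b64encode(payload).decode("ascii")


print(to_base64(RCOMMAND))  # prints "GQAg"
```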

tjvandam commented 3 years ago

I will change this and test again. But this exact same code runs 100% fine on TTN, so the only difference that can break the code is the fact that ChirpStack sends different downlinks (MAC payload on port 0). Can we make the code ignore the MAC commands?

cyberman54 commented 3 years ago

Ignoring MAC commands will cause OTA to break, so this is not an option. We need to decode the MAC commands. Please open an issue in the MCCI LMIC repository, since this is likely an LMIC stack issue, not related to Paxcounter.

cyberman54 commented 3 years ago

What country settings are you using? EU868?

cyberman54 commented 3 years ago

Deep sleep may (but should not) cut off a pending MAC downlink handshake too early; that would be a Paxcounter-related issue. Thus we need to retest without deep sleep.

cyberman54 commented 3 years ago

Do you have a ChirpStack server that I can point a RAK7258 gateway to? Then I could try to reproduce this.

tjvandam commented 3 years ago

Yes, you can point your gateway to one of our sites. I will give you access to the ChirpStack web GUI. Where can I share this safely? Then you can log in and add your gateway and device.

cyberman54 commented 3 years ago

use github mail / message

tjvandam commented 3 years ago

I have sent the credentials for the server to your email. Let me know if you can get in like this.

tjvandam commented 3 years ago

And yes, I am using EU868

cyberman54 commented 3 years ago

@tjvandam Mail did not arrive :-( Please try cyberman54@arcor.de instead, thx.

cyberman54 commented 3 years ago

@tjvandam EUI is 60c5a8fffe76126a, can connect with packet forwarder or basic station protocol

tjvandam commented 3 years ago

Use packet forwarder please

cyberman54 commented 3 years ago

Okay, the gateway is now online and the LoPy4 node has joined already. Now checking...

cyberman54 commented 3 years ago

I can now reproduce the issue in my test setup, also with another type of board (a TTGO T-Beam v1.1). Note: you can get enhanced logging from the LMIC stack by setting #define LMIC_DEBUG_LEVEL 2 in lmic_config.h

cyberman54 commented 3 years ago

The network server immediately switches the device to SF12 after join. This means the duty cycle budget for uplinks and downlinks is very limited. After join, the network server talks to the device with several MAC commands while Paxcounter is trying to send the first payload.

This may drive the device into the duty cycle limit, where it waits for airtime. That's why the blue LED keeps flashing: it means there is a pending payload to transmit.

Try to change configuration of your network server, so that devices are not forced to SF12 after join, then re-test.
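
To put numbers on the SF12 airtime cost described above, here is a rough sketch using the standard Semtech LoRa time-on-air formula. The function name `time_on_air_ms` and the 17-byte frame size (13 bytes of LoRaWAN overhead plus a 4-byte payload) are illustrative assumptions, as are the EU868 defaults of 125 kHz bandwidth, coding rate 4/5 and an 8-symbol preamble.

```python
import math


def time_on_air_ms(sf: int, payload_bytes: int, bw_hz: int = 125_000,
                   coding_rate: int = 1, preamble_symbols: int = 8,
                   explicit_header: bool = True, crc: bool = True) -> float:
    """Time on air of one LoRa frame in milliseconds (Semtech formula)."""
    t_sym_ms = (2 ** sf) / bw_hz * 1000.0
    # Low data rate optimisation is used when a symbol lasts longer than 16 ms
    # (SF11 and SF12 at 125 kHz).
    low_dr_opt = 1 if t_sym_ms > 16.0 else 0
    implicit = 0 if explicit_header else 1
    numerator = 8 * payload_bytes - 4 * sf + 28 + 16 * int(crc) - 20 * implicit
    denominator = 4 * (sf - 2 * low_dr_opt)
    payload_symbols = 8 + max(math.ceil(numerator / denominator) * (coding_rate + 4), 0)
    return (preamble_symbols + 4.25 + payload_symbols) * t_sym_ms


FRAME_BYTES = 17  # assumed: 13 bytes LoRaWAN overhead + 4 bytes application payload
for sf in (7, 12):
    print(f"SF{sf}: {time_on_air_ms(sf, FRAME_BYTES):.1f} ms")
```

Under these assumptions the same frame takes about 51 ms at SF7 and about 1.3 s at SF12, so each SF12 uplink, and each answer to a MAC command, uses roughly 25 times more of the duty cycle budget.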

cyberman54 commented 3 years ago

And check the time of your network server, it's about 20 seconds off :-)

cyberman54 commented 3 years ago

I can confirm this issue is caused by duty cycle. If you switch off ADR on the node (by rcommand or adjusting defaults in configmanager.cpp), the device stays on SF7 after join, and is working as expected. Looks like class c is working, too. I never tested this before.

tjvandam commented 3 years ago

Nice! I will work on this this evening. Sounds reasonable

cyberman54 commented 3 years ago

@tjvandam The solution to this is probably to configure Maximum allowed data rate = 5 in service-profile "paxcounter". I can't test this, since this setting is not available to my user.

see https://forum.chirpstack.io/t/adr-engine-for-my-rak-gateway-is-the-snr-table-correct/9726/2

tjvandam commented 3 years ago

Can you explain why this solves the issue? I have changed the service profile to have max data rate = 5.

tjvandam commented 3 years ago

Also, does it matter which LoRaWAN MAC version we use for this version of Paxcounter in the device profile?

tjvandam commented 3 years ago

I need to find a low-power setting for very remote (African bush) stationary locations. So in the worst case SF12 may be needed. I only need pax counting and no other data, so payloads are very short. The device runs on a battery pack plus small solar panel. WiFi-only or BLE-only scanning is OK, but I need to find the best option per location (it depends on how people use their phones). So the question here remains: What SENDCYCLE can I still use with SF12 without pushing the device into this duty cycle issue? How would you configure the Paxcounter knowing these limitations?

cyberman54 commented 3 years ago

You have max data rate = 0 in your service profile; this forces all devices using this profile to use SF12 only. Set it to 5, and keep ADR enabled on node and server. Then the network server will automatically drive the node to the best possible data rate. Even with SF12 it should not be a problem to transmit counts only, if you transmit e.g. every 5 minutes. Disable all other payloads on the node, like battery voltage etc.
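
As a rough sanity check on the 5-minute figure, the sketch below computes the shortest send interval that stays inside a duty cycle limit. It assumes a 1% limit (typical for the common EU868 sub-band, but check your regional plan) and an airtime of about 1.32 s per uplink, which is roughly what a 17-byte frame needs at SF12 / 125 kHz. Both function names are hypothetical, and airtime spent answering MAC commands is not counted.

```python
def min_send_interval_s(airtime_s: float, duty_cycle: float = 0.01) -> float:
    """Shortest period between uplinks that keeps airtime within the duty cycle."""
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in (0, 1]")
    return airtime_s / duty_cycle


def fits_duty_cycle(airtime_s: float, interval_s: float,
                    duty_cycle: float = 0.01) -> bool:
    """True if sending one frame every interval_s seconds respects the limit."""
    return interval_s >= min_send_interval_s(airtime_s, duty_cycle)


SF12_AIRTIME_S = 1.32  # assumed: approx. airtime of a short frame at SF12 / 125 kHz
print(f"minimum interval: {min_send_interval_s(SF12_AIRTIME_S):.0f} s")
print("5-minute cycle fits:", fits_duty_cycle(SF12_AIRTIME_S, 300))
```

With these assumptions the minimum interval comes out at a little over two minutes, so a 5-minute cycle leaves headroom for the MAC command exchanges that follow a join.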

cyberman54 commented 3 years ago

Paxcounter is based on MCCI LMIC, which is a LoRaWAN 1.0.3 certified stack. Thus, select MAC version 1.0.3 and regional parameters A.

Be aware that in Africa you probably need to use a different frequency and channel plan.

cyberman54 commented 3 years ago

I'm closing this issue now, since it is not related to the Paxcounter code. But you can use this issue for further notes.

tjvandam commented 3 years ago

Thanks @cyberman54, you are the best 👌