SignalK / SensESP

Universal Signal K sensor framework for the ESP32 platform
https://signalk.org/SensESP/
Apache License 2.0
153 stars 81 forks source link

Add more debug for "Bad access token detected" #213

Closed tedenda closed 3 years ago

tedenda commented 4 years ago

There have been some reported issues via Slack where SensESP have got successful authorization from Signal K Server but fails when reconnecting. SensESP output "Bad access token detected". To figure this out more debug messages is probably needed.

tedenda commented 4 years ago

@JohnySeven I have seen that you have done other network related things for SensESP. Can this be something that you can look into?

JohnySeven commented 4 years ago

@tedenda I've seen case when device got correct token, but WiFi got disconnected and in that case device will reset token and it will start over. I have something already in place at here: https://github.com/JohnySeven/SensESP/tree/networking_changes But it needs testing before I can make it to PR

tedenda commented 4 years ago

@JohnySeven Sounds good! Please let me know if I can help you test it!

ba58smith commented 3 years ago

The only example I ever saw of this was when the token was REALLY long, because there were a LOT of paths set up in SensESP. I wonder if this is the limitation: line 340 of ws_client.cpp: DynamicJsonDocument doc(1024);. So I changed that 1024 to 100 and added code to display the name of any DeserializationError, and sure enough, I got this:

␛[0m(W) (poll_access_request) WARNING: Could not deserialize http payload.
␛[0m(W) (poll_access_request) Error: NoMemory

So I'm doing a PR to add that DeserializationError display, and the next time this problem occurs, maybe we can resolve it.

tedenda commented 3 years ago

@RBerliner posted this issue in #sensors on Slack:

I am attempting to compile a project for a featheresp32 device. I can compile and upload without problem.
After the device is configured with the captive web portal, it requests a security token from the SignalK server. When that is provided, I get an error as shown below and it goes back to requesting another security token.
D) (test_token)(C1) End of payload output
(D) (test_token)(C1) Attempting to connect to SignalK Websocket...
    [E][WiFiClient.cpp:482] flush(): fail on fd 58, errno: 11, "No more processes"
(W) (on_disconnected)(C1) Bad access token detected. Setting token to null.
    (D) (connect)(C1) Initiating connection
(D) (connect)(C1) Websocket is connecting to SignalK server on address 192.168.0.1:3000
(D) (connect)(C1) No prior authorization token present.
(D) (send_access_request)(C1) Preparing a new access request
ba58smith commented 3 years ago

What are the few lines BEFORE the section of the Serial Output that you copied here? Specifically, I'm looking for the line that contains the phrase ""Testing resulted in http status".

tedenda commented 3 years ago

From Slack/sensors: Featheresp32_error.txt

JohnySeven commented 3 years ago

The issue in here is that code is checking HTTP result and if the result isn't 200 OK it will clear the token - this is wrong of course, because server could be down, there could be issue with memory, etc. No reason to clear the token. In my branch this is fixed. I'll put SK watch project on hold for now and I'll switch to SensESP to finish what I've started. Please be patient with me :-) Thanks guys!

RBerliner commented 3 years ago

I posted the additional debug info on Slack but maybe I should have posted it here instead. You guys really jumped on this - thanks!

Featheresp32_error.txt

ba58smith commented 3 years ago

@JohnySeven - if the result is not 200, and it clears the token, then it just makes a new one and sends another request, right? If that happened because of a momentary network problem, then the new access request will be approved, and everything goes ahead as it should, right? I agree that that's not the right way to handle it, because now you have to approve another access request. However, will that cause the specific problem that @RBerliner is reporting? I've had a lot of network issues from time to time, and I don't recall ever seeing his problem before.

JohnySeven commented 3 years ago

@ba58smith new goodies that I have in my branch will help @RBerliner.

I've added new mechanism that will not clear the token if:

Also the version in my branch doesn't restart ESP on WiFi loss. It prints our WiFi connection loss reason and tries to reconnect back without restarting - this is big change and needs to be tested propertly. So far it survived disabling the Wifi AP, placing firewall in the way and other nasty things I've come up with.

@ba58smith or @tedenda are both welcome to join the ride (see below) :-)

@RBerliner could you please run one more test, but with my branch instead of current SensESP?

Just comment out SensESP in lib_deps and add "https://github.com/JohnySeven/SensESP#networking_changes" in your platformio.ini file and post the log here please so I can make sure that mechanism is working correctly.

Thanks!

RBerliner commented 3 years ago

@JohnySeven: Thank you for the help. I modified platformio.ini as you requested, deleted SensESP from .pio/libdeps/featheresp32, did a Clean then a Build, then an Upload and Monitor. I do not get the same error but it seems to hang on connect. I attach the Monitor output.

Featheresp32_error_2.txt

tedenda commented 3 years ago

I can use your code for some of my test sensors and do some tests. @JohnySeven what is your name on Slack?

JohnySeven commented 3 years ago

@tedenda my name on Slack is Jan Dytrych :-)

tedenda commented 3 years ago

@JohnySeven have you seen this discussion that is going on right now about this issue? https://signalk-dev.slack.com/archives/C9B4Y02EN/p1604340599151900

tedenda commented 3 years ago

@JohnySeven I tried to use your code but I get errors, is that fork up to date with master? Build error gist

JohnySeven commented 3 years ago

@tedenda branch is 4 commits ahead, 15 commits behind SignalK:master.

tedenda commented 3 years ago

I don't know what is failing, I have deleted .pio folder, so all dependencies is downloaded fresh.

RBerliner commented 3 years ago

The discussion of this connect/reconnect problem is being carried on here and on Slack. If you are interested in this problem, it would be worthwhile to take a look at

https://signalk-dev.slack.com/archives/C9B4Y02EN/p1604340599151900

Due to suggestions from @Butch, I think some things have been learned about this problem.

kreucher commented 3 years ago

also bitten by this... been approving repeated access requests as a workaround. unable to see the slack link. how can i help?

joelkoz commented 3 years ago

I found this yesterday also. I happened to be looking at my log when it happened. It seems to be related to intermittent network issues. I think SensESP gives up on its token too soon. Here it is the log for reference:

things going good here...

␛[0m(D) (get_delta) SKDelta::get_delta: {"updates":[{"source":{"label":"sk-engine-lights"},"values":[{"path":"electrical.switches.lights.engineroom.state",␛[0m"value":false}]}]}
␛[0m(D) (get_delta) SKDelta::get_delta: {"updates":[{"source":{"label":"sk-engine-lights"},"values":[{"path":"electrical.switches.lights.engineroom.state",␛[0m"value":false}]}]}

suddenly loses network connection, then regains:

␛[0m(D) (connect) Initiating connection
␛[0m(D) (connect) Websocket is connecting to Signal K server on address 192.168.0.147:80
␛[0m(D) (test_token) Testing token with url http://192.168.0.147:80/signalk/v1/api/
␛[0m(D) (test_token) Testing resulted in http status 200
␛[0m(D) (test_token) Returned payload (length 5368) is: 
␛[0m(D) (test_token) {"vessels":{"urn:mrn:imo:mmsi:367560820":{"mmsi":"367560820","name":"eSea Street","navigation":{"position":{"meta":{"description":"The␛[0m position of the vessel in 2 or 3 dimensions (WGS84 datum)","properties":{"longitude":

<more SignalK data>

{"label":"ws.1bbe1d64-a6ca-4464-8c22-6d3f05b5197c","timestamp":"2020-12-08T22:05:42.212Z"}}}
␛[0m(D) (test_token) End of payload output
␛[0m(D) (test_token) Attempting to connect to Signal K Websocket...

Here is where it loses the token:

␛[0m(W) (on_disconnected) Bad access token detected. Setting token to null.
␛[0m(D) (connect) Initiating connection
␛[0m(D) (connect) Websocket is connecting to Signal K server on address 192.168.0.147:80
␛[0m(D) (connect) No prior authorization token present.
␛[0m(D) (send_access_request) Preparing a new access request
␛[0m(W) (send_access_request) Can't handle response -1 to access request.
␛[0m(D) (send_access_request) 
␛[0m(D) (connect) Initiating connection
␛[0m(D) (connect) Websocket is connecting to Signal K server on address 192.168.0.147:80
␛[0m(D) (connect) No prior authorization token present.
␛[0m(D) (send_access_request) Preparing a new access request
␛[0m(D) (send_access_request) Polling /signalk/v1/requests/5cfa8dde-8b10-48ff-81de-d1cda0744624 in 5 seconds
␛[0m(D) (poll_access_request) Polling SK Server for authentication token
␛[0m(D) (poll_access_request) PENDING
␛[0m(D) (poll_access_request) Polling SK Server for authentication token
␛[0m(W) (poll_access_request) Can't handle response -1 to pending access request.
␛[0m(W) 
␛[0m(D) (connect) Initiating connection
␛[0m(D) (connect) Websocket is connecting to Signal K server on address 192.168.0.147:80
␛[0m(D) (poll_access_request) Polling SK Server for authentication token
␛[0m(W) (poll_access_request) Can't handle response -1 to pending access request.
␛[0m(W) 
␛[0m(D) (connect) Initiating connection
␛[0m(D) (connect) Websocket is connecting to Signal K server on address 192.168.0.147:80
␛[0m(D) (poll_access_request) Polling SK Server for authentication token
␛[0m(D) (poll_access_request) PENDING
␛[0m(D) (poll_access_request) Polling SK Server for authentication token
␛[0m(W) (poll_access_request) Can't handle response -1 to pending access request.
␛[0m(W) 
␛[0m(D) (connect) Initiating connection
␛[0m(D) (connect) Websocket is connecting to Signal K server on address 192.168.0.147:80
␛[0m(D) (poll_access_request) Polling SK Server for authentication token
␛[0m(D) (poll_access_request) PENDING
␛[0m(D) (poll_access_request) Polling SK Server for authentication token
␛[0m(W) (poll_access_request) Can't handle response -11 to pending access request.
␛[0m(W) 
␛[0m(D) (connect) Initiating connection
␛[0m(D) (connect) Websocket is connecting to Signal K server on address 192.168.0.147:80
␛[0m(D) (poll_access_request) Polling SK Server for authentication token
␛[0m(D) (poll_access_request) PENDING
␛[0m(D) (poll_access_request) Polling SK Server for authentication token
␛[0m(D) (poll_access_request) PENDING
␛[0m(D) (poll_access_request) Polling SK Server for authentication token

and now its waiting for authorization.

joelkoz commented 3 years ago

I see @JohnySeven has actually fixed this back in Oct. Is there a reason to not do a PR for this problem only?

ba58smith commented 3 years ago

also bitten by this... been approving repeated access requests as a workaround. unable to see the slack link. how can i help?

@kreucher , that Slack conversation was some time ago, and has "fallen off". (Our free version of Slack holds onto things for only so long before disappearing.)