Openvario / sensord

Daemon to poll air data from Openvario sensorboard

fix disconnect-reconnect #1

Closed bomilkar closed 4 years ago

bomilkar commented 4 years ago

Mainly: the uninitialized variable sock_err in NMEA_message_handler leads to an unpredictable return value.

initialize struct sockaddr for connect()

make the connection O_NONBLOCK because we don't want to disturb the timing of the measure cycle

fixed sensordata_from_file option.
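The two socket-related points above (a fully initialized struct sockaddr and an O_NONBLOCK connection) can be sketched roughly as follows. This is only an illustration, not the code from the commits: the helper names make_ipv4_addr and set_nonblocking are hypothetical, and an IPv4 TCP connection is assumed.

```c
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: build a fully zeroed IPv4 address for ip:port.
 * Returns 0 on success, -1 if `ip` is not a valid dotted-quad string. */
int make_ipv4_addr(struct sockaddr_in *addr, const char *ip, unsigned short port)
{
    memset(addr, 0, sizeof(*addr));   /* no stale stack bytes reach connect() */
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &addr->sin_addr) == 1 ? 0 : -1;
}

/* Hypothetical helper: switch a descriptor to non-blocking mode while
 * preserving its other status flags. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}
```

With a non-blocking socket, connect() can return -1 with errno set to EINPROGRESS while the connection is still being established, so the caller has to treat that case as "not connected yet" rather than as a failure.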

linuxianer99 commented 4 years ago

I also installed 20115 on the 7" PQ platform. So what is the problem with sensord and reconnects?

If I configure xcsoar to connect directly to 4353 (bypassing variod), there is no problem with reconnects.

bomilkar commented 4 years ago

> So what is the problem with sensord and reconnects ??

The problem is described at length here: https://github.com/Openvario/variod/pull/9

iglesiasmarianod reports "Same disconnections occur" throughout that thread. You can see it yourself: XCSoar prints "Connection reset by peer" in the console it was started from.

bomilkar commented 4 years ago

> Please split up the commits in logic functions.

I'll revert the commit and post 2 new ones. Is that OK for you?

bomilkar commented 4 years ago

@linuxianer99 I reverted the initial commit and posted 2 separate commits. Is that OK?

iglesiasmarianod commented 4 years ago

First of all, @linuxianer99, thanks for writing Sensord! I thought it was Andreas, but Mihu told me it is yours. Kudos! To make this simpler to explain, I made two videos yesterday. The image was compiled after the last merge of @bomilkar's changes to variod. In the first video I start XCSoar and try to mute/unmute; you can hear me hitting Enter several times with no effect. In the second video I kill variod and connect XCSoar to port 4553. You can see the debug output flicker rapidly. I then quit and open the XCSoar log, where you can see the constant disconnections from port 4553 (including one "disconnected by peer"). Below are three images from this run, showing variod.log and sensord.log reconnecting constantly.

[Images: Variod_log1, Variod_log2, Sensord]

Digging a little deeper with logging in variod, I found that while(read_size=recv()>0) exits because read_size is 0. That is not an error, but variod then resets both sockets, reconnects to XCSoar, unmutes, and tries to read a message from sensord again. So mute never works, because of the reconnections. If you pay attention in video 2, the vario sound does not mute after XCSoar quits. After I moved the connection to XCSoar outside the while(1) loop in variod, STF and Vario mode (command reading from XCSoar) started to work, but that only fixes the symptom, not the cause of my problem. That is the problem I have and what I traced from the logs.
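For reference, a return value of 0 from recv() on a stream socket means the peer closed the connection in an orderly way, which is distinct from an error (-1) and from "no data yet" on a non-blocking socket. A minimal sketch of telling these cases apart is shown below; the enum and the function name classify_recv are hypothetical and not taken from variod. Note also that the shorthand while(read_size=recv()>0), read literally, would assign the result of the comparison to read_size; the sketch assumes the real code parenthesizes the assignment.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical classification of a recv() result. */
enum recv_status {
    RECV_DATA,         /* n > 0: payload bytes were read              */
    RECV_PEER_CLOSED,  /* n == 0: orderly shutdown by the other side  */
    RECV_RETRY,        /* n < 0, but only "try again later"           */
    RECV_ERROR         /* n < 0 with a real error                     */
};

/* `n` is the value returned by recv(), `err` is errno captured right after. */
enum recv_status classify_recv(ssize_t n, int err)
{
    if (n > 0)
        return RECV_DATA;
    if (n == 0)
        return RECV_PEER_CLOSED;
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR)
        return RECV_RETRY;
    return RECV_ERROR;
}
```

A read loop could then reconnect only on RECV_PEER_CLOSED or RECV_ERROR and simply continue on RECV_RETRY, so that a quiet period on the socket does not trigger a reset.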

Sound not muting: https://www.youtube.com/watch?v=z4jD6pP8zwM

Sensord disconnecting: https://www.youtube.com/watch?v=2cHy0U0XRxQ

bomilkar commented 4 years ago

@iglesiasmarianod I think the issue comes from sensord closing the socket and then reopening it.

                // main data acquisition loop
                while (sock_err >= 0)
                {
                        int result;

                        result = usleep(12500);
                        if (result != 0)
                        {
                                printf("usleep error\n");
                                usleep(12500);
                        }
                        pressure_measurement_handler();
                        sock_err = NMEA_message_handler(sock);

                } // end of acquisition loop: left as soon as sock_err < 0

                // connection dropped
                close(sock);

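sock_err is the return value of NMEA_message_handler(sock), and it is left uninitialized inside NMEA_message_handler(). As a result the loop condition sock_err >= 0 can fail at random, and sensord closes and reopens the socket. Initializing it to 0 solves the problem. (Initializing it to -1 forces the problem to happen.)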
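The pattern behind this bug can be sketched as follows. This is not the actual NMEA_message_handler(); the function name nmea_handler_sketch, the message_due parameter, and the placeholder sentence are assumptions made for illustration. The point is that a handler which only sometimes sends must still return a defined status on the paths where it sends nothing.

```c
#include <sys/socket.h>

/* Hypothetical stand-in for NMEA_message_handler(): sends one sentence when
 * `message_due` is non-zero and reports the socket status to the caller. */
int nmea_handler_sketch(int sock, int message_due)
{
    /* Placeholder payload, not a real sensord sentence. */
    static const char sentence[] = "$XXXXX,placeholder*00\r\n";

    /* Initialized: without "= 0", a call that sends nothing would return an
     * indeterminate value, and the caller's "sock_err >= 0" test could fail. */
    int sock_err = 0;

    if (message_due) {
        if (send(sock, sentence, sizeof(sentence) - 1, 0) < 0)
            sock_err = -1;   /* only a failed send reports an error */
    }
    return sock_err;
}
```

With the initialization in place, a cycle in which no message is due returns 0 and the acquisition loop keeps running; only a failed send() makes the caller drop the connection.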

linuxianer99 commented 4 years ago

OK, so it seems we also have a problem in sensord.

Please open an issue in the sensord repo for this problem, where we can track the errors. To my mind it is very difficult to trace all the changes otherwise. Even the release notes are created from issues.

bomilkar commented 4 years ago

OK, I'll close it (since I made the mess in the first place). Later today I will create 2 issues (not described in full detail) and 2 new PRs. The details of the 2 issues will come as we discuss. Is this OK?