Openvario / variod

Daemon for autonomous e-vario
4 stars 8 forks source link

Fix Mute/Unmute problem #9

Closed iglesiasmarianod closed 4 years ago

iglesiasmarianod commented 4 years ago

Moved the enabling of the vario sound to the connection function. It only enables sound on connect or reconnect. Before the change sound was unmuted constantly. It was inside While(1) infinite loop. Please test before merging. Works in my OV 5,7.

bomilkar commented 4 years ago

Is recv() guaranteed to return complete lines?? I mean can we rely on the assumption that we get no incomplete NMEA sentence from recv() ??

What if recv() returns an incomplete NMEA sentence at the end, with the remaining part of the sentence from the next call? I can see that this can cause some glitches if it actually happens.

I'll see if it happens and how it can be handled gracefully.

bomilkar commented 4 years ago

OK @iglesiasmarianod , I found the reason why your "curing the symptom" worked. Look at this line: connfd = accept(listenfd, (struct sockaddr *)&s_xcsoar, (socklen_t*)&c);

It should be this: connfd = accept(listenfd, (struct sockaddr *)&server, (socklen_t*)&c);

Right?

s_xcsoaris uninitialized at the time of calling accept(). Hence accept() may fail.

I think you can close this PR.

I'm in the process of tracking down the issue related to recv() and will probably place a new PR later today.

iglesiasmarianod commented 4 years ago

I logged recv(). The problem I have is getting 0 bytes. This resets the socket and unmutes vario. To see if the problem was variod I killed it and connected sensord to XCSoar directly. Same disconnections occur. recv() returning 0 means disconnected by peer if I'm not mistaken. Think the problem lies in sensord. XCSoar logs 4553 end of file error.

iglesiasmarianod commented 4 years ago

Here's my variod.log As you can see it is reconnecting all the time to sensord.

variod.log

Checked also partial messages, from the 60 and so bytes to send I've seen them sent in two or three chunks with no apparent problem. Between those chunks or the complete byte send I get 3 or 4 zeros, a reconnect, a send, and so on.

iglesiasmarianod commented 4 years ago

And here's my log with output of recv() inside and outside the loop.

variod.log

You can see partial receives, but think are not the cause of disconnection. It seems to be a pattern, the number of scans without data is always the same. Sensord goes to sleep 12500 microsecond each read. Think we have to look there.

iglesiasmarianod commented 4 years ago

OK @iglesiasmarianod , I found the reason why your "curing the symptom" worked. Look at this line: connfd = accept(listenfd, (struct sockaddr *)&s_xcsoar, (socklen_t*)&c);

It should be this: connfd = accept(listenfd, (struct sockaddr *)&server, (socklen_t*)&c);

Right?

s_xcsoaris uninitialized at the time of calling accept(). Hence accept() may fail.

I think you can close this PR.

I'm in the process of tracking down the issue related to recv() and will probably place a new PR later today.

Think accept() initializes s_xcsoar, not the other way around.

int accept(int socket, struct sockaddr address, int address_len);

address The socket address of the connecting client that is filled in by accept() before it returns. The format of address is determined by the domain that the client resides in. This parameter can be NULL if the caller is not interested in the client address.

Extracted from:

https://www.ibm.com/support/knowledgecenter/SSLTBW_2.2.0/com.ibm.zos.v2r2.bpxbd00/accept.htm

iglesiasmarianod commented 4 years ago

Tested several times what I'm going to post because it is having me scratching my head for days. I'm trying to debug sensord which is the real cause of all my variod related problems. This is the main sensord loop that was crashing and closing connection from its side thus sending a 0 to variod wich caused constant reconnections in my device.

        while(sock_err >= 0)
        {   int result;

            result = usleep(12500);
            if (result != 0)
            {
                printf("usleep error\n");
                usleep(12500);
            }
            pressure_measurement_handler();
            sock_err = NMEA_message_handler(sock);
            debug_print("Sock_err: %d\n",sock_err);

        } // while(1)

        // connection dropped
        close(sock);

With the last merged commit of variod untouched, the one @bomilkar fixed the memory leak, and only adding a debug line to sensord all the problems went away. Sensord does not close the socket anymore. Variod does not disconnect anymore, mute works, and sensord to XCSoar connection (without variod) is stable too. The strange thing is that this line is not executed because I didn't use -d1 command argument. The only thing it does is delaying the loop 10ms aprox even without being executed. I knew that sensord needed strict timing to manage all sensors read and measure in a timely manner and that microseconds here may crash everything. But this seems too odd. Tried it several times just to be sure this is the only thing that changed. Did not want to mention this earlier because I have absolutely no clue why this changes the result. Just putting something there to see how this was failing makes it stable so I can't see where it fails. @kedder any thoughts? Is it possible to contact Andreas to see if he can shed some light on this?

bomilkar commented 4 years ago

I guess the effect of usleep(12500); is that the receiving end (variod) will get one sentence per recv(). If the sentences come faster, chances are several lines will come from recv(). variod was not able to handle that until the changes in https://github.com/Openvario/variod/pull/10 . If recv() returns multiple NMEA sentences, they are separated by '\0' chars. No way the previous version of variod.c could have handled that! As long as sensord spits out the sentences with enough delay in between, recv() will come back with one sentence only. And that worked (aside from the memory leak, the parsing issues, etc.).

iglesiasmarianod commented 4 years ago

if you get a 0 inside the while(recv()) you'll never get the cahnce to parse the sentence. Sensord sends things in packages of 3 POVS. Can`t send faster than that. Plese let's try not to change things based on a simulation and let's stick to actual hardware testing.

bomilkar commented 4 years ago

I'm testing and debugging on debian. When I get to a stable point I move to the OV hardware (w/o sensors, though). If it's clean on debian it's clean on OV, too.

iglesiasmarianod commented 4 years ago

Well, variod is only useful with sensors. If you don't have sensors you can't test with actual sensord. And that's the problem we are having now. Your bot behaves differently than actual sensord. So,please don't assume a problem is solved before testing with sensors. You weren't able to replicate my mute problem with the bot yet.

bomilkar commented 4 years ago

Well, variod is only useful with sensors.

That's not true. variod has to interact with XCSoar to provide data and to receive instructions to enable STF. The latter part is now fixed. It never worked. I will look into sensord to see if I can spot something, or make my bot a bit more behaving like sensord. But I won't be able to verify what I've done wrt sensord.

linuxianer99 commented 4 years ago

variod is only useful with sensors ! variod is just another implementation for a evariod system, because the one XCSoar provides is/was not optimal.

bomilkar commented 4 years ago

I meant variod needs information from XCSoar to perform STF. That is McCready and real polar (or ideal polar, load factor, polar degadation). How can variod perform STF w/o knowing this?

bomilkar commented 4 years ago

I found the issue, @iglesiasmarianod You have to initialize sock_err=0 at the beginning of NMEA_message_handler() Otherwise you get undefined return value which throw you out of the while loop in main.

I have spotted a couple of other issues, which may or may not be severe. I'll do a PR with it if you want. Otherwise I assume you'll take it from here. Good luck!

bomilkar commented 4 years ago

@iglesiasmarianod , can you please chime in here: https://github.com/Openvario/sensord/pull/1#issuecomment-623598568

iglesiasmarianod commented 4 years ago

Hi guys, I'm a bit busy with work right now. I'll take a look ASAP.

On Mon, May 4, 2020, 5:26 PM Ronald Niederhagen notifications@github.com wrote:

@iglesiasmarianod https://github.com/iglesiasmarianod , can you please chime in here: Openvario/sensord#1 (comment) https://github.com/Openvario/sensord/pull/1#issuecomment-623598568

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Openvario/variod/pull/9#issuecomment-623686771, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKZLFH46EIMQNGJD2NG5H7LRP4QGBANCNFSM4MWFMOPQ .

linuxianer99 commented 4 years ago

@iglesiasmarianod @bomilkar @mihu-ov

Guys we need a bit more structure in this here.

I will write a few rules for contribute code, but for the moment:

We should discuss/document the analysis of errors in the issue ticket and not in the PR.

Let's start with the rules with this issue.... If it is ok for you, i will close this PR now ?!?!

iglesiasmarianod commented 4 years ago

You are right. It is perfect for me. Please close the PR

On Tue, May 5, 2020, 4:20 AM Timo Bruderek notifications@github.com wrote:

@iglesiasmarianod https://github.com/iglesiasmarianod @bomilkar https://github.com/bomilkar @mihu-ov https://github.com/mihu-ov

Guys we need a bit more structure in this here.

I will write a few rules for contribute code, but for the moment:

  • There should be an issue ticket for each problem.
  • Each PR has to have an associated issue
  • Each PR is just one logical function or fix. Not a combination of fixes and functions from different topics.
  • Every commit includes only one logical function or fix.
  • Make clear PRs without reverting commits or things like that. It makes that changes unclear and very difficult to review.

We should discuss/document the analysis of errors in the issue ticket and not in the PR.

Let's start with the rules with this issue.... If it is ok for you, i will close this PR now ?!?!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Openvario/variod/pull/9#issuecomment-623894669, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKZLFH4HQXC3RBHM666DFCDRP6425ANCNFSM4MWFMOPQ .

bomilkar commented 4 years ago

@linuxianer99 just a thought: why don't you create an issue "Ground rules for contributing". That captures what has been said and in the right place. And it gives the contributors an opportunity to ask and discuss.