Openvario / variod

Daemon for autonomous e-vario
4 stars 8 forks source link

Vario sound not responding to vario changes #15

Closed iglesiasmarianod closed 4 years ago

iglesiasmarianod commented 4 years ago

Vario sound stays beeping at a fixed lift value while vario values change. Warrior image compiled including all latest commits.

kedder commented 4 years ago

@iglesiasmarianod can you find the commit that introduced the regression?

iglesiasmarianod commented 4 years ago

Hi @kedder, yes sure. I'll start with the last release (0.3.1) and test from there.

iglesiasmarianod commented 4 years ago

Hi @kedder, variod starts failing after this two commits, the first needs files introduced in the second.

5d294f63f1c7196b86feb39197eee2b68b06d857 a2ed08fb98b33bb27b95096ff91b2041ed8cd95e

kedder commented 4 years ago

variod starts failing after this two commits, the first needs files introduced in the second.

5d294f6 a2ed08f

@bomilkar, do you have an idea how these commits could have caused this issue?

bomilkar commented 4 years ago

The filename NMEAchecksum was changed to nmea_checksum later among other changes. I don't know what happens if you revert commits which have been overridden later on. @kedder you probably recall: you suggested to follow naming conventions and coding style. I'd like to see the status of all files, namely variod.c

Is STF sound failing, or is the sound failing altogether?

iglesiasmarianod commented 4 years ago

I didn't revert commits, I started applying them in the order they were applied in the repo history from the last working version. Sound detaches from vario needle when applying those commits. I believe set_audio_val(sensors.e); is not getting (sensors.e); or not getting the value correctly, parse_NMEA_sensor() gets these values and is the change introduced by these commits.

bomilkar commented 4 years ago

If it only affects STF sound then it might be the polar coming from XCSoar. It seems as if stf.c is expecting the polar in units of km/h (for the horizontal speed). XCSoar uses m/s. Please try to apply the attached patch which converts from m/s to km/h

0001-polar-units-from-m-s-to-km-h.patch.txt

bomilkar commented 4 years ago

When you compare the default values in stf.h with what you seen in /variod.log when changing the polar, then the units are definitely incompatible. Set the polar in XCSoar to ASW-24 and you get pretty close to these values:

define POL_A 0.000164

define POL_B -0.025714

define POL_C 1.668750

iglesiasmarianod commented 4 years ago

Please read the subject of this issue. Stf is another issue opened. Please don't mix issues. The problem is to do with vario sound, not STF. Andrey just mentioned it because it might be ralated.

iglesiasmarianod commented 4 years ago

Just tried this quick and dirty:

    static char buffer[2001];
    const char delimiter[]=",*";
    char *ptr=NULL;

    unsigned int len = strlen(message);
    //if (len >= sizeof(buffer)) return; // sentence longer than expected

    //if (!verify_nmea_checksum(message)) return; // checksum error

    // now it's checksum clean we don't want checksum to be mistaken as value

1) Increased buffer from 100 to 2001. The suspected commit reduces it from 2001 to 100. 2) Commented both lines checking string length and checksum that were added by this commit.

Vario Sound works again. What's left to find out is if the buffer is too small or the checksum verification is failing. Too late here to go on. I'll take a look tomorrow night.

bomilkar commented 4 years ago

Both parse_NMEA_sensor() and parse_NMEA_command() expect one sentence at a time, as it is clearly stated. If strlen(message) is more than 100, it's a clear indication that it is more than one sentence.

Maybe you should look at variod.c to see if the parse functions are really called one sentence at a time. You probably removed commit https://github.com/Openvario/variod/pull/10/commits/985bd69415912af302f854facdafdaefef27b6c2

bomilkar commented 4 years ago

I can reproduce the problem. I think the vario data doesn't propagate to the "sound machine". I see the vario needle moving, which means the data comes from sensord to variod and is passed on to XCSoar. variod parses the NMEA sentences correctly. But the sound doesn't change. I don't understand how the audio is being synthesized and how the vario value is supposed to be delivered to the sound synthesizer.

I've done a "repo sync" on Sunday or Saturday. No changes over this. The image has the number 20131.

linuxianer99 commented 4 years ago

Think there is also a mistake with strlen() in nmea_parser.c line 34:

unsigned int len = strlen(message);
if (len >= sizeof(buffer)) return; // sentence longer than expected

if (!verify_nmea_checksum(message)) return; // checksum error

// now it's checksum clean we don't want checksum to be mistaken as value
nmea_chop_checksum(message);

// copy string and initialize strtok function
strncpy(buffer, message, len);`

strlen(message) just returns the lenght of message without "\n". So termination character is not copied by strncpy(buffer, message, len); Later on (line 50): if (*ptr == '\n' || *ptr == '\r') return; // end of sentence the termination character is checked ...

@bomilkar, @kedder : What do you think ?

kedder commented 4 years ago

Hm, I think strlen should count all chars until the \0 terminator, including \n, if I'm reading this right: https://en.cppreference.com/w/c/string/byte/strlen.

linuxianer99 commented 4 years ago

Hmm ... This here says no termination character included: https://www.tutorialspoint.com/c_standard_library/c_function_strlen.htm

kedder commented 4 years ago

I believe \n is not considered a "termination character". In C strings are null-terminated.

bomilkar commented 4 years ago

Yes, it's a mistake. It should say:

        unsigned int len = strlen(message) + 1;
        if (len > sizeof(buffer)) return; // sentence longer than expected

Same mistake in line 82. This will take care of strncpy(). But this isn't a problem since chopping off the checksum reduced the actual length by 3 or 4 chars.

It should never hit line 50 because ptr will be NULL if it hits the end of the chopped sentence. ptr = strtok(NULL, delimiter);

BTW: have you figured out how the actual data (IAS and vario) are passed on to the sound synthesizer? I believe this might have been lost in all the changes. And that's the reason for "sound not responding to vario changes".

kedder commented 4 years ago

@bomilkar I don't think this has anything to do with passing data it to sound synthesizer. At least there is nothing about it in a commit that introduces the regression (5d294f63f1c7196b86feb39197eee2b68b06d857).

bomilkar commented 4 years ago

You should integrate https://github.com/Openvario/variod/pull/13 with this https://github.com/Openvario/variod/pull/13/commits/955541622e5767c4a6d7c8b2d0a860faa22d8cec The point of function multi_sentence_clean() is to make each sentence '\0' terminated. No '\n', no '\r' or multiples of it. Each sentence has the same form when it's handed to the parser function.

bomilkar commented 4 years ago

@kedder , this https://github.com/Openvario/variod/commit/5d294f63f1c7196b86feb39197eee2b68b06d857 has been overwritten by https://github.com/Openvario/variod/pull/4/commits/b80e9f4c41f690a33e54ee89715c2569110e6c12 and this https://github.com/Openvario/variod/pull/4/commits/bf5675d2767e59e2d78c3350dd83ff0ea89531ae

Either you remove all 3 commits or none of them.

iglesiasmarianod commented 4 years ago

I believe the problem is parse_NMEA_sensor() returning before passing the value to sensor.e. It has nothing to do with the synthesizer. I think parse_NMEA_sensor() exits here:

if (len >= sizeof(buffer)) return; // sentence longer than expected

if (!verify_nmea_checksum(message)) return; // checksum error

I'll log len and verify_nmea_checksum() to confirm.

bomilkar commented 4 years ago

I'm watching the values coming from the parser at variod.c line 341:

                                //get the TE value from the message
                                ias = getIAS(sensors.q);
                                printf ("IAS: %f, TE: %f\n",ias,sensors.e);

This looks reasonable: sensors.q and sensors.e are updated from the parser. And these values are passed to set_audio_val() . Hmmm ....

bomilkar commented 4 years ago

I think I have a trace: Looking at the logfile /variod.log I'm missing these 2 lines

Connecting to pulseaudio...
Connection to pulseaudio established.

This lines used to be there. Why are they missing?

bomilkar commented 4 years ago

And when I run it on Debian (with sensord_mock) it all runs fine. Has something changed in the OS infrastructure recently?

iglesiasmarianod commented 4 years ago

Checked the error.

len is 2000, buffer is 100, thus exiting there.

Didn't want to mention this here because it might be the subject of another issue, but the same buffer reduction was applied to parse_NMEA_command and this isn't working either.

I propose increasing the buffer size to the value it had before the commit. I think solves the if (len >= sizeof(buffer)) return; // sentence longer than expected problem.

If a reduction of this buffer is desired I think we should see why we still get 2000 bytes. Anyway, I would say this is an enhancement. I would solve the problem first and then improve.

If it has to do with #13, I would include this reduction there.

iglesiasmarianod commented 4 years ago

After increasing buffer to 2001 if (!verify_nmea_checksum(message)) return; // checksum error fails. So, both checks are failing. Next task is finding out why checksum is not working.

bomilkar commented 4 years ago

Here is another point which supports the assumption, that the OS infrastructure might be the fault.

I loaded the 20115 image from https://github.com/Openvario/meta-openvario/releases

On this platform any recent version of variod works and produces the sound as expected. Next I will look into audiovario.c to see how the connection with pulseaudio is established. Does anybody have a prototype on how to connect to pulseaudio? Just as a reference.

bomilkar commented 4 years ago

I did a "repo syn" this morning to make sure I get the latest. And double checked that variod-testing_git.bb has no local patches. From that I copied the variod executable to /opt/bin of 20115. And the sound is just fine!

So what has changed since 20115? I'm lead to believe that the pulseaudio daemon is working right on newer images. How can we narrow this down?

bomilkar commented 4 years ago

Just an other observation: Looking at the Ethernet socket of Cubie board I see the amber light flashing as soon as sound is active. I don't understand why sound creates traffic on the network. Shouldn't it stay local?

linuxianer99 commented 4 years ago

Just one comment to "repo sync": This pulls only the layers of the openembedded build system. Not the repos from the recipes which are built during "bitbake openvario-image" ! The recipes pull their source any time the recipe is build.

This means: If you want to build an bleeding edge image, the way should be:

  1. bitbake variod-testing -c cleanall
  2. bitbake variod-testing
  3. bitbake openvario-image-testing

Btw. Image 20115 has variod-testing in the same version as variod-0.3.1 Maybe we include the git tag in the version option from variod to avoid confusion ...

kedder commented 4 years ago

So what has changed since 20115?

@bomilkar, 5d294f6 is what changed since 20115. Before that commit sound worked, after that commit sound stopped working properly (according to @iglesiasmarianod, didn't check myself). This is a clear indication that that your commit contains a regression. It is clearly has nothing to do with OS infrastructure, sound synthesis or ethernet socket. The bug is introduced in these few changed lines that are visible if you click on this link: 5d294f6.

I'm pretty sure this can be fixed easily, there is not that much that was changed in that commit. If we can't figure it out though, I guess what is left is to revert the whole #4 that brought that commit, but that would be a pity.

bomilkar commented 4 years ago

@kedder

  1. I was following a wrong path: pulseaudio does work, also in what I have as the latest master. I take back what I speculated about OS infrastructure.

  2. please look at https://github.com/Openvario/variod/commit/5d294f63f1c7196b86feb39197eee2b68b06d857 this is stale. For instance the file NMEAchecksum.h doesn't even exist anymore. It's been renamed and the indentation style was changed. Remember? You can't pick and choose commits in a single PR. All I can do is to close #4 and create a new PR. But this new one is probably #13 (see next item).

  3. in the meantime I have working: what did the trick is in this commit: https://github.com/Openvario/variod/commit/dbbd4c73caed789065675af393cda65d5f2b2f9e . It is the 2nd commit in https://github.com/Openvario/variod/pull/13 . There was a confusion on the termination of the sentences: XCSoar wants a '\n' but no '\0', parseNMEA...() wants a '\0' but not '\n'.

bomilkar commented 4 years ago

@iglesiasmarianod : if parse_NMEA_sensor() needs 2000+ chars as buffer, this is a clear indication that it gets more than one NMEA sentence. This will not work. It expects one sentence at a time. And they are never longer than 100 chars. And verify_nmea_checksum() will fail if the sentence has a '\n' at the end. The checksum field must end immediately before the '\0'. Look at the code line 44: if (endptr == checksum_string || *endptr != 0 || read_checsum2 >= 0x100)

bomilkar commented 4 years ago

@linuxianer99 How can I "un-merge" https://github.com/Openvario/variod/pull/4 and merge #13 ? I want to test the result locally. They both are based on master and try to achieve the same thing.

kedder commented 4 years ago

@bomilkar please make a new PR that fixes this particular issue, I suggest not merging #13 before we fix this, because it is a completely different issue, it is substantially large and can introduce other bugs, that could be difficult to attribute.

kedder commented 4 years ago

please look at 5d294f6 this is stale. For instance the file NMEAchecksum.h doesn't even exist anymore. It's been renamed and the indentation style was changed. Remember?

There is no stale commits. This commit is a part of the git history. Applying 5d294f6 and a2ed08f on known-good version introduces the bug described here.

You can't pick and choose commits in a single PR.

Yes you can. This is why commits need to be self-contained and well scoped. I do not propose to revert this particular commit here though. I propose to leave it as it is, but use it to spot the problem (since it was introduces by it) and fix it with another PR.

iglesiasmarianod commented 4 years ago

@bomilkar these are my thoughts about the points you mention:

1) Pulseaudio problem: XCSoar sound gets to the speaker OK, vario sound gets to the speaker OK. You can hear sound coming from XCSoar over variod output. It seems pulseadio is not the problem.

2) What changed since 20115 (apr 24th image): sensord changed. This issue became evident after we fixed sensord disconnections. If I go back to the disconnecting sensord, sound seems to work. Don't know how the disconnections affected variod sound.

3) Why don't we record a sensor reading sesion and start working with actual sensord using recorded sensor input? I mean, instead of emulating the deamon, emulating the sensors and testing both, variod and sensord, at the same time.

4) Commenting out the checks and increasing the buffer makes sound work again. That action is equivalent to reverting this particular patch. Remember I first tested with ALL commits applied and it still fails. Including commits one by one, in the order the repo lists them, led me to find the one that fails first.

bomilkar commented 4 years ago

@kedder I can't reproduce the problem. Here is what I did:

  1. in github I restored my branch fix-variod-XCSoar-protocol
  2. then locally: git clone git://github.com/bomilkar/variod cd variod/ git checkout fix-variod-XCSoar-protocol git pull make ./variod -f -d2 -c /opt/conf/variod.conf

The above is all on my debian. Does this look OK? Is this what you did to see the issues?

bomilkar commented 4 years ago

@iglesiasmarianod

Why don't we record a sensor reading sesion and start working with actual sensord using recorded sensor input?

YES PLEASE! I've been asking this for a while. Ideally we should be able to test it on a debian desktop machine before we test it on the target.

bomilkar commented 4 years ago

@linuxianer99 This means: If you want to build an bleeding edge image, the way should be:

bitbake variod-testing -c cleanall
bitbake variod-testing
bitbake openvario-image-testing

Understand, but how can I go back after https://github.com/Openvario/variod/pull/4 has merged? That should be identical to the status of my branch fix-variod-XCSoar-protocol . Right?

linuxianer99 commented 4 years ago

@kedder , @bomilkar @iglesiasmarianod :

As variod version 0.3.1 is working well, i will go back to this commit with the master branch. I will save the actual code in a different branch.

After we have tested 0.3.1 and came to a positive result, we can isolate the different features and merge them step by step after every feature is tested well.

We should avoid a lot of changes for different features at the same time in the future.

All agree ?

iglesiasmarianod commented 4 years ago

I Agree.

kedder commented 4 years ago

IMO revert back to 0.3.1 is a bit of a brutal measure :) I still believe this can be fixed easily if reproduced. After all, @iglesiasmarianod suggested the solution that works (restore the size of the buffer back to 2000), we can just apply it.

@bomilkar, do you think that would make sense?

linuxianer99 commented 4 years ago

I agree it is a brutal measure, but i think we should have highest priority on stable software instead of quick features. Specially variod has lot of changes on different functions/bugs which are difficult to trace ... I suggest to do smaller commits for each function/fix. This make things more easy to review. Also i suggest to do more testing before we merge things in the main branch.

linuxianer99 commented 4 years ago

By going back to 0.3.1 we do not loose the code, but we can cut the changes in smaller pieces and review and test them in a more structured way ...

bomilkar commented 4 years ago

For me the biggest issue is that I can't reproduce what you are seeing. So finding a common base from where to start is key. We can quickly introduce the features on that new start point because by now, we know quite well which features are needed. An other key point in my mind: more testing. That includes testbench, testvectors, and if at all possible testing on a workstation not on the target. The code must be portable maybe except small parts around the sensors. I know Coverity tools used to be integrated with github. Is this still the case? If yes, we should use them and make it a requirement for a PR to be "Coverity clean". It might be quite a stretch to make the new starting point "Coverity clean", but it would have caught a lot of things like "read before write". Communication is another issue: email would be much easier to discuss specific sub-topics. I understand why it's important to bring everybody on the same page, but there are too many parallel thoughts in this Issue here to use a simple linear tool. I agree with whatever you guys decide.

kedder commented 4 years ago

Okay, I'm fine with resetting the branch then. And I agree, we need to invest some time in unit testing and testing tools.

linuxianer99 commented 4 years ago

Okay. master is at version 0.3.1 The original code is in master-backup.

@bomilkar, @kedder : Maybe we can make small and clear commits to get function by function back to the master branch. Suggest we do tests for every new feature to be sure, that all is working fine.

bomilkar commented 4 years ago

@linuxianer99 : Can't we use a solid linting tool (like Coverity integrated in github) and clean the existing code base before any functional changes? variod's code base is very small. I'm sure we can clean it up in a few days. All new code must be clean before it's merged. That should eliminate some of the issues we've seen.

@kedder : do you want to write a Python testbench?

kedder commented 4 years ago

do you want to write a Python testbench?

I can try to. Shouldn't be terribly difficult.