Closed iglesiasmarianod closed 4 years ago
@iglesiasmarianod can you find the commit that introduced the regression?
Hi @kedder, yes sure. I'll start with the last release (0.3.1) and test from there.
Hi @kedder, variod starts failing after this two commits, the first needs files introduced in the second.
5d294f63f1c7196b86feb39197eee2b68b06d857 a2ed08fb98b33bb27b95096ff91b2041ed8cd95e
variod starts failing after this two commits, the first needs files introduced in the second.
5d294f6 a2ed08f
@bomilkar, do you have an idea how these commits could have caused this issue?
The filename NMEAchecksum was changed to nmea_checksum later among other changes. I don't know what happens if you revert commits which have been overridden later on. @kedder you probably recall: you suggested to follow naming conventions and coding style. I'd like to see the status of all files, namely variod.c
Is STF sound failing, or is the sound failing altogether?
I didn't revert commits, I started applying them in the order they were applied in the repo history from the last working version. Sound detaches from vario needle when applying those commits.
I believe set_audio_val(sensors.e);
is not getting (sensors.e);
or not getting the value correctly,
parse_NMEA_sensor()
gets these values and is the change introduced by these commits.
If it only affects STF sound then it might be the polar coming from XCSoar. It seems as if stf.c is expecting the polar in units of km/h (for the horizontal speed). XCSoar uses m/s. Please try to apply the attached patch which converts from m/s to km/h
When you compare the default values in stf.h with what you seen in /variod.log when changing the polar, then the units are definitely incompatible. Set the polar in XCSoar to ASW-24 and you get pretty close to these values:
Please read the subject of this issue. Stf is another issue opened. Please don't mix issues. The problem is to do with vario sound, not STF. Andrey just mentioned it because it might be ralated.
Just tried this quick and dirty:
static char buffer[2001];
const char delimiter[]=",*";
char *ptr=NULL;
unsigned int len = strlen(message);
//if (len >= sizeof(buffer)) return; // sentence longer than expected
//if (!verify_nmea_checksum(message)) return; // checksum error
// now it's checksum clean we don't want checksum to be mistaken as value
1) Increased buffer from 100 to 2001. The suspected commit reduces it from 2001 to 100. 2) Commented both lines checking string length and checksum that were added by this commit.
Vario Sound works again. What's left to find out is if the buffer is too small or the checksum verification is failing. Too late here to go on. I'll take a look tomorrow night.
Both parse_NMEA_sensor()
and parse_NMEA_command()
expect one sentence at a time, as it is clearly stated. If strlen(message) is more than 100, it's a clear indication that it is more than one sentence.
Maybe you should look at variod.c to see if the parse functions are really called one sentence at a time. You probably removed commit https://github.com/Openvario/variod/pull/10/commits/985bd69415912af302f854facdafdaefef27b6c2
I can reproduce the problem. I think the vario data doesn't propagate to the "sound machine". I see the vario needle moving, which means the data comes from sensord to variod and is passed on to XCSoar. variod parses the NMEA sentences correctly. But the sound doesn't change. I don't understand how the audio is being synthesized and how the vario value is supposed to be delivered to the sound synthesizer.
I've done a "repo sync" on Sunday or Saturday. No changes over this. The image has the number 20131.
Think there is also a mistake with strlen()
in nmea_parser.c line 34:
unsigned int len = strlen(message);
if (len >= sizeof(buffer)) return; // sentence longer than expected
if (!verify_nmea_checksum(message)) return; // checksum error
// now it's checksum clean we don't want checksum to be mistaken as value
nmea_chop_checksum(message);
// copy string and initialize strtok function
strncpy(buffer, message, len);`
strlen(message) just returns the lenght of message without "\n". So termination character is not copied by strncpy(buffer, message, len);
Later on (line 50):
if (*ptr == '\n' || *ptr == '\r') return; // end of sentence
the termination character is checked ...
@bomilkar, @kedder : What do you think ?
Hm, I think strlen
should count all chars until the \0
terminator, including \n
, if I'm reading this right: https://en.cppreference.com/w/c/string/byte/strlen.
Hmm ... This here says no termination character included: https://www.tutorialspoint.com/c_standard_library/c_function_strlen.htm
I believe \n
is not considered a "termination character". In C strings are null-terminated.
Yes, it's a mistake. It should say:
unsigned int len = strlen(message) + 1;
if (len > sizeof(buffer)) return; // sentence longer than expected
Same mistake in line 82. This will take care of strncpy(). But this isn't a problem since chopping off the checksum reduced the actual length by 3 or 4 chars.
It should never hit line 50 because ptr will be NULL if it hits the end of the chopped sentence.
ptr = strtok(NULL, delimiter);
BTW: have you figured out how the actual data (IAS and vario) are passed on to the sound synthesizer? I believe this might have been lost in all the changes. And that's the reason for "sound not responding to vario changes".
@bomilkar I don't think this has anything to do with passing data it to sound synthesizer. At least there is nothing about it in a commit that introduces the regression (5d294f63f1c7196b86feb39197eee2b68b06d857).
You should integrate https://github.com/Openvario/variod/pull/13
with this https://github.com/Openvario/variod/pull/13/commits/955541622e5767c4a6d7c8b2d0a860faa22d8cec
The point of function multi_sentence_clean()
is to make each sentence '\0' terminated. No '\n', no '\r' or multiples of it. Each sentence has the same form when it's handed to the parser function.
@kedder , this https://github.com/Openvario/variod/commit/5d294f63f1c7196b86feb39197eee2b68b06d857 has been overwritten by https://github.com/Openvario/variod/pull/4/commits/b80e9f4c41f690a33e54ee89715c2569110e6c12 and this https://github.com/Openvario/variod/pull/4/commits/bf5675d2767e59e2d78c3350dd83ff0ea89531ae
Either you remove all 3 commits or none of them.
I believe the problem is parse_NMEA_sensor() returning before passing the value to sensor.e.
It has nothing to do with the synthesizer.
I think parse_NMEA_sensor()
exits here:
if (len >= sizeof(buffer)) return; // sentence longer than expected
if (!verify_nmea_checksum(message)) return; // checksum error
I'll log len
and verify_nmea_checksum()
to confirm.
I'm watching the values coming from the parser at variod.c line 341:
//get the TE value from the message
ias = getIAS(sensors.q);
printf ("IAS: %f, TE: %f\n",ias,sensors.e);
This looks reasonable: sensors.q and sensors.e are updated from the parser. And these values are passed to set_audio_val() . Hmmm ....
I think I have a trace: Looking at the logfile /variod.log I'm missing these 2 lines
Connecting to pulseaudio...
Connection to pulseaudio established.
This lines used to be there. Why are they missing?
And when I run it on Debian (with sensord_mock) it all runs fine. Has something changed in the OS infrastructure recently?
Checked the error.
len is 2000, buffer is 100, thus exiting there.
Didn't want to mention this here because it might be the subject of another issue, but the same buffer reduction was applied to parse_NMEA_command and this isn't working either.
I propose increasing the buffer size to the value it had before the commit.
I think solves the if (len >= sizeof(buffer)) return; // sentence longer than expected
problem.
If a reduction of this buffer is desired I think we should see why we still get 2000 bytes. Anyway, I would say this is an enhancement. I would solve the problem first and then improve.
If it has to do with #13, I would include this reduction there.
After increasing buffer to 2001 if (!verify_nmea_checksum(message)) return; // checksum error
fails.
So, both checks are failing.
Next task is finding out why checksum is not working.
Here is another point which supports the assumption, that the OS infrastructure might be the fault.
I loaded the 20115 image from https://github.com/Openvario/meta-openvario/releases
On this platform any recent version of variod works and produces the sound as expected. Next I will look into audiovario.c to see how the connection with pulseaudio is established. Does anybody have a prototype on how to connect to pulseaudio? Just as a reference.
I did a "repo syn" this morning to make sure I get the latest. And double checked that variod-testing_git.bb
has no local patches.
From that I copied the variod executable to /opt/bin of 20115.
And the sound is just fine!
So what has changed since 20115? I'm lead to believe that the pulseaudio daemon is working right on newer images. How can we narrow this down?
Just an other observation: Looking at the Ethernet socket of Cubie board I see the amber light flashing as soon as sound is active. I don't understand why sound creates traffic on the network. Shouldn't it stay local?
Just one comment to "repo sync": This pulls only the layers of the openembedded build system. Not the repos from the recipes which are built during "bitbake openvario-image" ! The recipes pull their source any time the recipe is build.
This means: If you want to build an bleeding edge image, the way should be:
Btw. Image 20115 has variod-testing in the same version as variod-0.3.1 Maybe we include the git tag in the version option from variod to avoid confusion ...
So what has changed since 20115?
@bomilkar, 5d294f6 is what changed since 20115. Before that commit sound worked, after that commit sound stopped working properly (according to @iglesiasmarianod, didn't check myself). This is a clear indication that that your commit contains a regression. It is clearly has nothing to do with OS infrastructure, sound synthesis or ethernet socket. The bug is introduced in these few changed lines that are visible if you click on this link: 5d294f6.
I'm pretty sure this can be fixed easily, there is not that much that was changed in that commit. If we can't figure it out though, I guess what is left is to revert the whole #4 that brought that commit, but that would be a pity.
@kedder
I was following a wrong path: pulseaudio does work, also in what I have as the latest master. I take back what I speculated about OS infrastructure.
please look at https://github.com/Openvario/variod/commit/5d294f63f1c7196b86feb39197eee2b68b06d857 this is stale. For instance the file NMEAchecksum.h doesn't even exist anymore. It's been renamed and the indentation style was changed. Remember? You can't pick and choose commits in a single PR. All I can do is to close #4 and create a new PR. But this new one is probably #13 (see next item).
in the meantime I have working: what did the trick is in this commit: https://github.com/Openvario/variod/commit/dbbd4c73caed789065675af393cda65d5f2b2f9e . It is the 2nd commit in https://github.com/Openvario/variod/pull/13 . There was a confusion on the termination of the sentences: XCSoar wants a '\n' but no '\0', parseNMEA...() wants a '\0' but not '\n'.
@iglesiasmarianod : if parse_NMEA_sensor() needs 2000+ chars as buffer, this is a clear indication that it gets more than one NMEA sentence. This will not work. It expects one sentence at a time. And they are never longer than 100 chars. And verify_nmea_checksum() will fail if the sentence has a '\n' at the end. The checksum field must end immediately before the '\0'. Look at the code line 44: if (endptr == checksum_string || *endptr != 0 || read_checsum2 >= 0x100)
@linuxianer99 How can I "un-merge" https://github.com/Openvario/variod/pull/4 and merge #13 ? I want to test the result locally. They both are based on master and try to achieve the same thing.
@bomilkar please make a new PR that fixes this particular issue, I suggest not merging #13 before we fix this, because it is a completely different issue, it is substantially large and can introduce other bugs, that could be difficult to attribute.
please look at 5d294f6 this is stale. For instance the file NMEAchecksum.h doesn't even exist anymore. It's been renamed and the indentation style was changed. Remember?
There is no stale commits. This commit is a part of the git history. Applying 5d294f6 and a2ed08f on known-good version introduces the bug described here.
You can't pick and choose commits in a single PR.
Yes you can. This is why commits need to be self-contained and well scoped. I do not propose to revert this particular commit here though. I propose to leave it as it is, but use it to spot the problem (since it was introduces by it) and fix it with another PR.
@bomilkar these are my thoughts about the points you mention:
1) Pulseaudio problem: XCSoar sound gets to the speaker OK, vario sound gets to the speaker OK. You can hear sound coming from XCSoar over variod output. It seems pulseadio is not the problem.
2) What changed since 20115 (apr 24th image): sensord changed. This issue became evident after we fixed sensord disconnections. If I go back to the disconnecting sensord, sound seems to work. Don't know how the disconnections affected variod sound.
3) Why don't we record a sensor reading sesion and start working with actual sensord using recorded sensor input? I mean, instead of emulating the deamon, emulating the sensors and testing both, variod and sensord, at the same time.
4) Commenting out the checks and increasing the buffer makes sound work again. That action is equivalent to reverting this particular patch. Remember I first tested with ALL commits applied and it still fails. Including commits one by one, in the order the repo lists them, led me to find the one that fails first.
@kedder I can't reproduce the problem. Here is what I did:
The above is all on my debian. Does this look OK? Is this what you did to see the issues?
@iglesiasmarianod
Why don't we record a sensor reading sesion and start working with actual sensord using recorded sensor input?
YES PLEASE! I've been asking this for a while. Ideally we should be able to test it on a debian desktop machine before we test it on the target.
@linuxianer99 This means: If you want to build an bleeding edge image, the way should be:
bitbake variod-testing -c cleanall
bitbake variod-testing
bitbake openvario-image-testing
Understand, but how can I go back after https://github.com/Openvario/variod/pull/4 has merged? That should be identical to the status of my branch fix-variod-XCSoar-protocol . Right?
@kedder , @bomilkar @iglesiasmarianod :
As variod version 0.3.1 is working well, i will go back to this commit with the master branch. I will save the actual code in a different branch.
After we have tested 0.3.1 and came to a positive result, we can isolate the different features and merge them step by step after every feature is tested well.
We should avoid a lot of changes for different features at the same time in the future.
All agree ?
I Agree.
IMO revert back to 0.3.1 is a bit of a brutal measure :) I still believe this can be fixed easily if reproduced. After all, @iglesiasmarianod suggested the solution that works (restore the size of the buffer
back to 2000), we can just apply it.
@bomilkar, do you think that would make sense?
I agree it is a brutal measure, but i think we should have highest priority on stable software instead of quick features. Specially variod has lot of changes on different functions/bugs which are difficult to trace ... I suggest to do smaller commits for each function/fix. This make things more easy to review. Also i suggest to do more testing before we merge things in the main branch.
By going back to 0.3.1 we do not loose the code, but we can cut the changes in smaller pieces and review and test them in a more structured way ...
For me the biggest issue is that I can't reproduce what you are seeing. So finding a common base from where to start is key. We can quickly introduce the features on that new start point because by now, we know quite well which features are needed. An other key point in my mind: more testing. That includes testbench, testvectors, and if at all possible testing on a workstation not on the target. The code must be portable maybe except small parts around the sensors. I know Coverity tools used to be integrated with github. Is this still the case? If yes, we should use them and make it a requirement for a PR to be "Coverity clean". It might be quite a stretch to make the new starting point "Coverity clean", but it would have caught a lot of things like "read before write". Communication is another issue: email would be much easier to discuss specific sub-topics. I understand why it's important to bring everybody on the same page, but there are too many parallel thoughts in this Issue here to use a simple linear tool. I agree with whatever you guys decide.
Okay, I'm fine with resetting the branch then. And I agree, we need to invest some time in unit testing and testing tools.
Okay. master is at version 0.3.1 The original code is in master-backup.
@bomilkar, @kedder : Maybe we can make small and clear commits to get function by function back to the master branch. Suggest we do tests for every new feature to be sure, that all is working fine.
@linuxianer99 : Can't we use a solid linting tool (like Coverity integrated in github) and clean the existing code base before any functional changes? variod's code base is very small. I'm sure we can clean it up in a few days. All new code must be clean before it's merged. That should eliminate some of the issues we've seen.
@kedder : do you want to write a Python testbench?
do you want to write a Python testbench?
I can try to. Shouldn't be terribly difficult.
Vario sound stays beeping at a fixed lift value while vario values change. Warrior image compiled including all latest commits.