AndyTaylorTweet / Pi-Star_v4_Binaries_Bin

Repository for Pi-Star Binaries
http://www.pistar.uk
24 stars 16 forks source link

mmdvmhost crashing 1min after started when dstar mode enabled #6

Closed narspt closed 3 years ago

narspt commented 3 years ago

Since your update today, mmdvmhost is crashing (11/SEGV) always 1min after started when dstar mode is enabled. If I enable just DMR and/or YSF modes apparently no problem, it seems to happen only if DStar mode is enabled (alone or along other modes also enabled). I have a DVMega Single Band hotspot, not sure if the issue is specific with my hotspot.

narspt commented 3 years ago

Probably same issue as this guy reports: https://forum.pistar.uk/viewtopic.php?f=3&t=3663

MW0MWZ commented 3 years ago

Thank you - I'm looking into this now...

MW0MWZ commented 3 years ago

So far, I am not able to replicate this - my D-Star hotspot runs fine, adding DMR to it, it's still good. My DMR hotspot is also fine, adding D-Star to that one is also working well.

I have mostly duplex boards here, I do have a DVMega here too tho, I'll fire that up in a sec and see what that does.

MW0MWZ commented 3 years ago

Found the issue - it is possible for the system to end up with "Type=Direct" in the MMDVMHost DMR Network section, when using DMRGateway it should be "Type=Gateway" instead - config page fixed up, I may also fix up the update process to deal with it too.

narspt commented 3 years ago

Sorry but the problem I reproduce must be something else, I'm not using DMRGateway... and as I said it happens only really when I enable DStar mode, then mmdvmhost always crashs exactly 1min after started (with 11/SEGV), then Pi-Star watchdog restarts it, then it crashs again after 1 min, again and again, non-stop...

Also did you read my post at https://forum.pistar.uk/viewtopic.php?p=18945#p18945 ?

I can't understand why my compiled mmdvmhost works fine and yours doesn't? can you share if you apply some specific changes on your build, Makefile, etc, and how do you compile it exactly? I did compile mine on Pi-Star itself on RPi, curiously my binary is 650KB after 'strip' (compiled with stock Makefile, without any display support enabled) and yours is 500KB with display(s) support enabled?

Did you test with the DVMega? Btw, I have a RPi 2+, not really sure if it makes any difference.

MW0MWZ commented 3 years ago

I have only just been able to reproduce the issue with D-Star, its quite specific... Only seems to happen with the DV-Mega and D-Star - its not happening on other configs... I add in the screen stuff and compile with optimise for size flags (not O2 or O3)

MW0MWZ commented 3 years ago

Looks like -Os was indeed a step too far for MMDVMHost, not sure why - it might be due to having to keep supporting 4.0 builds, I have to produce the bins against and old version of Raspbian, and I am sure that wont be helping. Revised binary released with -O3, that seems to work fine, auto-build process adjusted to -O3 for MMDVMHost.

narspt commented 3 years ago

Could you try to compile without appending "_Pi-Star_v4" on VERSION? I didn't confirm it but I'm suspecting it may be a buffer overflow caused by that, see this: https://github.com/g4klx/MMDVMHost/blob/9106fd69d2722e47bf2ecc5f91671364a8c68e48/DStarNetwork.cpp#L193 the string is even longer for DVMega... overflow at writePoll() buffer ?

Actually the buffer overflow may always happen but optimize for size flags make difference as things probably get more tight on memory and it may easily overwrite something it shouldn't?

narspt commented 3 years ago

I think it makes sense, "linux_mmdvm-dvmega-20210604_Pi-Star_v4" is 38 chars len, +6 = 44 and buffer len is 40 at writePoll()

MW0MWZ commented 3 years ago

I'd never even considered that this could be a problem...

narspt commented 3 years ago

I did find it while looking at code for something (timer) that would happen after/every 60 seconds, found that part of code, then seeing bigger string for dvmega and knowing you usually change version string... fired the alarm :)

Yet, concerning "optimize for size"... actually I'm not sure that is a good thing, as you know most hotspots run on lower cpu performance hardware like even RPiZero... not sure if it's a nice thing to eventually sacrifice performance to save a few KB's? apparently ~200KB on your mmdvmhost binaries. I understand you want to lower bw transfers, etc but it's already a very huge improvement by striping debug symbols, comparing with your old 7MB files...

And obviously, independently of optimize flags, I think you should (when you have time) look at a proper fix for that buffer overflow, even if it (apparently) doesn't cause problems with -O3, it's not surely nice and we never know what these 4 extra bytes may overwrite... you will need to think on how to fix it properly, surely you could just increase that buffer size but not sure if the extra bytes would then cause problems on other programs/servers that "poll" packet will be sent to... or you could maybe just change it to "_PiStar" and it would exactly fit the 40 bytes buffer... hehe

MW0MWZ commented 3 years ago

Its a great catch - so thanks for bringing it up - yep I am fixing it, just been through the build system to replace all the vanity stamps with "PS4" rather than "Pi-Star_v4" - it doesn't matter much what it is, it just helps me to know they are bins I made, or that a system is a "Pi-Star" system for the networks to know who to shout at if its doing something it should not.

Looking at what you found, "PS4" should fit in the current buffer size, I don't want to patch around the buffer length, because I don't think straying un-nessesarily from the public code is a good idea - it doesn't help Jonathan help me if something does go sideways - equally it doesn't help me... I see this very much as a two way street.

back to the flags, -Os should actually be a benefit on small systems, it takes longer at compile time but should save memory at run time, such is the theory. Cutting the size is also a benefit for sure, and yes stripping the bins was (I think) a good move, I don't think anyone was debugging against my versions.

Thanks again for pointing this out, you are dead right about the overflow being a silent killer, it could have been doing all kinds of things for all I know, or nothing of course.

narspt commented 3 years ago

ram is cheap and abundant nowadays even on small systems, modern smartphones have more ram than my desktop pc, hehe! well, seriously, Pi-Star on my RPi seems to use less than 100MB of ram, and most RPi's have 512MB or 1GB... then personally I think I would always prefer to bet on "optimize for speed" and make sure cpu resources usage is as low as possible, even if it means a bit more ram usage, but... after some research it seems that depending of cpu and code there are cases that -Os is actually similar or even faster than -O2 or -O3, then things may not be like they are supposed to be...

Anyway, all this was good so we find the hidden overflow :) and sure "PS4" will be safe.

narspt commented 3 years ago

Just to confirm: no issues at all with your new binaries released today. Thanks for your work ;)

MW0MWZ commented 3 years ago

There are potentially issues with D-Star using XLX - I've reverted MMDVMHost and ircDDBGateway (in the last few mins) to compile with -O3 again to see if that cures the issue. Seemingly radios hear the EOT marker and emit the Icom beep during reception - its very weird, and I'm not totally sure its me this time...

narspt commented 3 years ago

Really weird :S I can't test that now, but maybe I test it later with some more time. Thanks for letting me know.

narspt commented 3 years ago

Can't reproduce it even with your "last but one" binaries... tried the different DStar protocols, but nope... where is it reported, just for curiosity?

MW0MWZ commented 3 years ago

XLX 508, testing on Module J. This one really is like catching smoke, its not exactly easy to re-produce, apparently the new bins started this happening. Equally this is happening on other XLX hosts.

I connected to XLX 508 using DPlus/DExtra/DCS and the issue was apparent with all of them, I even compiled a vanilla MMDVMHost to test - and the same issue was apparent both directions. Re-compiled ircDDBGateway too, without the -Os compile flag, apparently the same.

I'll do some more testing soon.

narspt commented 3 years ago

Maybe a good idea to mix older MMDVMHost and ircDDBGateway, one at a time to at least identify the culprit... How often it beeps?

MW0MWZ commented 3 years ago

can be every few secs, can be 40 secs+ - its not exact by any means.

narspt commented 3 years ago

The other day I said above that I couldn't reproduce it I did just test it myself between bluedv and the hotspot, as I have only one D-Star radio. Today I did some real tests with another colleague and I confirm, I can also reproduce the beeps on my RX with all updated on Pi-Star, logs show one single transmission from him but the beeps are there. Also I can tell that the other colleague also tested using a non-updated Pi-Star and could listen me ok, then maybe no problem on my TX. Will try to do some more tests soon. I'm surprised that no one else is reporting this on forum, etc...

MW0MWZ commented 3 years ago

Just had the one report so far... its... Odd.

narspt commented 3 years ago

I did some more testing today with a friend and I was able to successfully record the stream of some of his transmissions exhibiting the problem, I was then able to always reproduce the issue easily when replaying these recorded streams, this allowed me to do more tests. I'm now sure that the problem is on MMDVMHost and is specifically caused by some change on this PR: https://github.com/g4klx/MMDVMHost/pull/667 If I checkout 787393e4c275d54e9f4d62cc74fb365d0faece66 (this is just before merging the PR) and compile it then I have NOT the problem, but if I checkout 4897313fc05afb5f74974bb5b36c4aeed017f4e9 (this is just when the PR is merged) and compile it then I have the problem, exactly like with your updated binary. I will post on PR page reporting the issue, and include my stream recordings that allow to reproduce it easily, hoping the guys involved on these changes my look at it.

MW0MWZ commented 3 years ago

Excellent work - thank you. We also have another problem, with ircDDBGateway no longer reacting to DTMF, that has also been traced to changes in MMDVMHost at approximately the same time (late november 2020). This commit works: https://github.com/g4klx/MMDVMHost/commit/8656aaedaa17f99838f7922e029876494a85d096 and this one does not: https://github.com/g4klx/MMDVMHost/commit/efe9b3d45956873da57e6a43832ddcf4edb26f74 (27th and 29th of November 2020 respectivly.

Clearly there was a lot of work going into D-Star at this time - I certainly dont want to have Jonathan back it all out, but some of it is just not working as intended.

narspt commented 3 years ago

Then the commit causing dtmf problems also belongs to same PR #667, later with more time I may also try to identify the exact commit causing the beeps problem. Hope the guys involved can sort these issues soon... for now, for normal usage I'm using a version checkout at 787393e4c275d54e9f4d62cc74fb365d0faece66 (this is from Dec 13 just before merging the PR that includes all these commits with dstar changes).

narspt commented 3 years ago

Well, it was not that hard to find the exact commit, done. Apparently same one as you pointed as culprit for DTMF problem.