cyoung / stratux

Aviation weather and traffic receiver based on RTL-SDR.
BSD 3-Clause "New" or "Revised" License

SDRs work, then messages go to zero. #388

Closed patbrennan closed 8 years ago

patbrennan commented 8 years ago
  1. Stratux version: v0.8r2
  2. Stratux config:

    SDR - nano 2: [ ] single [x] dual

    GPS: [x] yes [ ] no, type: Adafruit Ultimate

    AHRS: [ ] yes [x] no

    power source: 5.1 V, 2.1 A USB wall and/or car charger outlet

    usb cable:

  3. EFB app and version: ForeFlight 7.6.1

    EFB platform:

    EFB hardware: iPad mini 4 wifi

  4. Description of your issue:

I fire up the unit and go to the status page. GPS registers normally within 30 seconds, and I start to receive messages on 1090 right away. UAT takes a minute, but it will connect to multiple towers. I've connected to as many as 4 before, and messages start rolling in (although not many, maybe 20-300 max). Then, in ForeFlight, traffic hardly ever shows up (only momentarily, and not all traffic types). I can't get any ADS-B weather or radar images. After maybe 2, sometimes 10 minutes, the messages go to zero and never come back. This happens every time. I'm using a 3D-printed case, the nano 2 SDRs, and the hardware below for antennas and cable:

http://www.amazon.com/coaxial-cable-assembly-female-right/dp/B00CP17WMG?ie=UTF8&psc=1&redirect=true&ref_=oh_aui_detailpage_o04_s00

http://www.digikey.com/product-detail/en/linx-technologies-inc/ANT-916-CW-HWR-SMA/ANT-916-CW-HWR-SMA-ND/1139580

Restarting the unit seems to get the antennas working again, then the same thing happens. I haven't yet tried stock antennas, but it seems to be a different issue, since I have reception & then it mysteriously goes to nothing.

stratux_log.txt

p.s. how do I include a copy of all gzip log files from the /logs/stratux/ page? Stratux txt log included here.

Thanks in advance!

skypuppy commented 8 years ago

The first two things to always check (maybe this should be in the FAQ?) are:

  1. power to the Pi (no pun intended): ensure the red LED on the Pi is always lit,
  2. thermal considerations: the radios get really hot. If they're inside the case, they may overheat to the point of not working. If that happens, cooling methods are required. Same for the CPU.

Yours certainly sounds like a thermal issue, but could also be a couple other identified problems.

Side note: re antennas, stock antennas, strangely, are second best at reception, bested only by DMurray's newest offering. In all cases, reception is greatly enhanced (should you want or need it) by using an adequate ground plane.

patbrennan commented 8 years ago

Is there anything in the log files that might tell you if they're overheating?

It's not #1 - I've ensured power is always on.

If it were the antennas, wouldn't I be getting zero to weak reception? Just today I powered it up & in minutes had 4,000 + messages on the 1090 channel, sitting in my hotel. What do you mean by "adequate ground plane"?

I will look into adding a fan to the case, but the CPU temps hover around 130 Fahrenheit, and the heat sinks are installed.

Ergonomicmike commented 8 years ago

I've seen this behavior in reverse: start Stratux, no signals for a minute or two until the SDRs warm up, literally. My guess is that your SDR is drifting off frequency as it heats up. Do the kal dance after 20 minutes to get a PPM.
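To put a PPM figure in perspective, here is a minimal sketch of the arithmetic only: the tuning error in Hz is the tuned frequency times ppm divided by one million. The function name and the 50 ppm value are hypothetical and were not measured by anyone in this thread.

```python
def ppm_to_offset_hz(center_freq_hz: float, ppm: float) -> float:
    """Frequency error in Hz for a tuner whose crystal is off by `ppm` parts per million."""
    return center_freq_hz * ppm / 1_000_000


# The two bands a dual-SDR Stratux listens on.
BANDS_HZ = {"1090ES": 1_090_000_000, "UAT 978": 978_000_000}

if __name__ == "__main__":
    example_ppm = 50  # hypothetical value, for illustration only
    for name, freq_hz in BANDS_HZ.items():
        offset_khz = ppm_to_offset_hz(freq_hz, example_ppm) / 1000
        print(f"{name}: {offset_khz:.1f} kHz off at {example_ppm} ppm")
```

Whether an offset of that size is enough to stop decoding depends on the receiver and decoder, which this sketch does not model.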

egid commented 8 years ago

I saw this yesterday - the SDR count went from 2 to 1 to 0, upon which I rebooted and all was fine for the remainder of the flight.

cyoung commented 8 years ago

@patbrennan - What case design are you using? Have you tried running it opened up (for ventilation)?

patbrennan commented 8 years ago

This is the case design I have: http://www.thingiverse.com/thing:1182619

I have not put in the fan yet, however. I did switch the USB ports the radios were in to see if it would make a difference. This is a fairly popular case design, too. A vast majority of other cases don't have fans, so I'm not really sure overheating would be an issue, would it?

I will test & let you guys know. Let me know if you see anything weird in the log files. Thanks!

Ergonomicmike commented 8 years ago

I had a similar issue yesterday. I've been starting the Stratux about 10 minutes before iFly. When I connect with iFly, it's like 1090 had stalled. A reboot, and everything okay for rest of flight.

Ergonomicmike commented 8 years ago

Had to reboot on a flight just now to restart 1090. Running 4964.sh.

skypuppy commented 8 years ago

Were you also using 978? Did it stop working as well? This problem is baffling. Hardware or software issue? If it's not power or heat... We need a USB logic analyzer in the loop if it's not a heat issue.

Ergonomicmike commented 8 years ago

978 was on also, and I was getting messages on it. I suppose my problem could be a bad SDR, 'cause it's the same one that I thought was bad last fall.

peepsnet commented 8 years ago

Does the SDR count go to 1? The pressure the antenna cable put on the SDR killed 2 nano2+ for me!!

Ergonomicmike commented 8 years ago

I didn't check the Count, but will the next time it happens.

skypuppy commented 8 years ago

What does that mean, @peepsnet ?

peepsnet commented 8 years ago

The system (Stratux code) keeps a real-time count of the SDRs.

> What does that mean?

If you are referring to where to find the info, it is on the home page of the Stratux web UI (192.168.10.1), upper right, above the message count: "SDR devices:".

If you are referring to the implications of the data, it is a count of the total number of SDRs found by Stratux. This updates in real time (give or take 5-10 seconds).


Maybe there could be some more visual alert to an SDR selected but not working... @AvSquirrel You're up!!

Maybe the main webui page could show a green check/red X(same as the GPS) for each SDR

:red_circle: UAT : |==========================================|

:white_check_mark: 1090es: |==========================================|

This example meaning the 978 and 1090 SDRs are selected but the 978 SDR is not functioning.

This is the same as #276
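The check/X idea above could be sketched roughly as follows. This is not Stratux code. The function names are hypothetical, and treating "selected but zero messages in the last minute" as "not functioning" is an assumption made for illustration.

```python
from typing import Dict, Tuple


def sdr_indicator(selected: bool, messages_last_minute: int) -> str:
    """Classify one radio: 'off' if not selected, 'stalled' if selected but silent, else 'ok'."""
    if not selected:
        return "off"
    return "ok" if messages_last_minute > 0 else "stalled"


def render_status(radios: Dict[str, Tuple[bool, int]]) -> str:
    """Build one text line per radio from (selected, messages_last_minute) pairs."""
    return "\n".join(
        f"{sdr_indicator(selected, rate):<8}{name}"
        for name, (selected, rate) in radios.items()
    )


if __name__ == "__main__":
    # Hypothetical readings: both radios selected, UAT silent.
    print(render_status({"UAT": (True, 0), "1090ES": (True, 4000)}))
```

A silent radio is not always a broken one, since zero messages is also what you would see with no towers or traffic in range, so an indicator like this is a hint rather than a diagnosis.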

Ergonomicmike commented 8 years ago

1090 stalled again for me. The UI showed 2 SDRs. A reboot in flight fixed it.

MSGTFE commented 8 years ago

I noticed that without a ground plane I would get interference from the SDRs and GPS. For me, when I had the case disassembled but the components installed, I would get really good performance. However, when I put everything together and fired it up, I would get the same symptoms that you describe. A good test is to split your case and set the two sides a few inches away from each other, preferably outside, and see how everything works. If you are not getting the same symptoms, then you are probably getting interference when it is all put together. I would recommend putting in a ground plane and trying to isolate the GPS somehow, with Kapton tape or grounding.

Axtel4 commented 8 years ago

Does this primarily occur in the presence of high power L-Band emitters?

patbrennan commented 8 years ago

OK so I tested this out a bit... The short story is: It did the same thing, and messages are reading, then in 1-2 minutes go to 0, and don't come back. Admin panel shows 2 SDRs the whole time. Reset seems to make messages come back, but it does the same thing.

First test: Case open, split apart, gps away from SDRs (to test interference from GPS causing SDRs to drop).

Second test: Case open, fan on SDRs (to test if cooling was an issue). The SDRs were actually cool to the touch, and CPU temp was ~ 30-40 degrees F cooler than normal.

Third test: Case open, fan on SDRs, GPS unplugged.

Again, all tests resulted in the same result. I've had the same results in the air, on the ground, close to ground stations, etc. Did anyone peek at the files I uploaded originally to see if there was anything off?

Pic of my build: file_000
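For anyone trying to pin down when the drop happens, one option is to jot down the status page values every so often and look for the point where messages go to zero and stay there. The sketch below assumes you have recorded such samples by hand or by script. The `Sample` fields and the example numbers are hypothetical, not taken from the attached logs.

```python
from typing import NamedTuple, Optional, Sequence


class Sample(NamedTuple):
    seconds: float         # time since power-up
    sdr_count: int         # "SDR devices" value shown on the status page
    msgs_last_minute: int  # messages received in the last minute


def find_stall(samples: Sequence[Sample]) -> Optional[Sample]:
    """Return the first sample of a trailing run of zero-message samples
    that follows at least one sample with traffic, or None if there is none."""
    seen_traffic = False
    stall_start = None
    for sample in samples:
        if sample.msgs_last_minute > 0:
            seen_traffic = True
            stall_start = None  # traffic came back, so any earlier stall ended
        elif seen_traffic and stall_start is None:
            stall_start = sample
    return stall_start


if __name__ == "__main__":
    # Hypothetical readings shaped like the symptom described above.
    readings = [
        Sample(0, 2, 0),
        Sample(30, 2, 120),
        Sample(60, 2, 300),
        Sample(90, 2, 0),
        Sample(120, 2, 0),
    ]
    stall = find_stall(readings)
    if stall is not None:
        print(f"stalled at {stall.seconds:.0f} s with {stall.sdr_count} SDRs still listed")
```

A stall with the SDR count unchanged, as reported here, points away from the radio dropping off the USB bus, though it does not by itself say what the cause is.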

Ergonomicmike commented 8 years ago

Flew 2.6 hours today. 1090 kept dropping off, this time faster than before. Initially, I rebooted to fix it. Then I thought about disabling the SDRs from the web UI and running only 1090. That worked for a short time. (Anonymous dongles.)

Eventually rebooting didn't work anymore and the Stratux quit on me. (Although still a strong Wi-Fi on Ch 1.) I wasn't able to reach the power cable in flight. (Bad decision on my part. Sleeping bag prevented me from reaching around to the back.)

Here's the link to my stratux.log. (All other logging turned off, in order to eliminate that variable.)

https://www.sendspace.com/file/lfb7d5

The flight in question is 4/25. (The last entry should be later this evening, when I transferred the log file to my tablet.)

Axtel4 commented 8 years ago

Do you have the radios assigned to a specific band? If so you might swap the bands to see if the problem follows.

cyoung commented 8 years ago

> Eventually rebooting didn't work anymore and the Stratux quit on me. (Although still a strong Wi-Fi on Ch 1.)

Quit = nothing in the EFB, or nothing on the WebUI, or both?

Ergonomicmike commented 8 years ago

@Axtel4 The radios are anonymous.

@cyoung Quit = nothing in the EFB and the WebUI. I shut down the browser (which clears its cache) and got the "unable to connect" internet message.

Ergonomicmike commented 8 years ago

@cyoung More info: I looked at the free space on the 2nd partition of the SD card. It has 1.78 GB used out of 1.79 GB total. 16 MB remaining. This is with logging off. (All the gps, uat, etc. logs are 10 bytes.)

I tried to use find and ds to find the largest files, but it appears that those commands are to be run inside the SD card, and I had the card hooked up to a Linux box via a USB adapter.

Corrected 10 MB to 10 bytes. (sorry - distracted taking care of someone back from surgery.)

Ergonomicmike commented 8 years ago

I ran gparted on my spare SD card (which I could not get into the Stratux in flight - again poor planning on my part) and it shows a 1.79 GB partition, with 298 MB free. So, from my previous comment, something appears to be writing to the card in flight and filling up my SD card. Is there a utility in Linux equivalent to SpaceMonger for Windows where I can get a graphical display of file size on my SD card and find what these files are?
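If a graphical tool is not handy, a short script can list the largest files once the card's second partition is mounted on another Linux machine. This is a minimal sketch. The mount point `/mnt/stratux` is an assumption, so substitute wherever the partition is actually mounted.

```python
import heapq
import os
from typing import List, Tuple


def largest_files(root: str, count: int = 10) -> List[Tuple[int, str]]:
    """Return up to `count` (size_in_bytes, path) pairs under `root`, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue  # skip symlinks so targets are not counted twice
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # unreadable or vanished file; ignore it
    return heapq.nlargest(count, sizes)


if __name__ == "__main__":
    mount_point = "/mnt/stratux"  # hypothetical mount point for the SD card partition
    for size, path in largest_files(mount_point):
        print(f"{size / 1_000_000:10.1f} MB  {path}")
```

Reading a partition mounted from a Pi image usually needs root, so the script may have to be run with sudo to see every file.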

skypuppy commented 8 years ago

You might try baobab (Disk Usage Analyzer). However, it does require the graphical interface -- and may only work if you're using gnome. I would imagine it will work on the Pi itself if you have the full OS loaded (not the version from stratux.me.)

And you might have to get a 64 or 128 GB SD card. :) :)

Axtel4 commented 8 years ago

So, is this a matter of Stratux running out of storage space?

> I looked at the free space on the 2nd partition of the SD card. It has 1.78 GB used out of 1.79 GB total. 16 MB remaining. This is with logging off. (All the gps, uat, etc. logs are 10 bytes.)

Ergonomicmike commented 8 years ago

@cyoung Okay, I figured it out. (Hacked my way to mounting the SD and then some arcane find string.)

The two largest files on my card are stratux.sqlite at 153 MB. And the swap, at 100 MB. Apparently that maxes out the card. Should we be using the raspi-config with the Jessie build to use the whole card? And/or does the sqlite file grow without bound?
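To see what is actually taking up space inside a SQLite file like this one, a rough approach is to count the rows in each table. The sketch below makes no assumption about which tables Stratux creates and simply lists whatever it finds. The file name in the usage section is a hypothetical copy of the database, since it is safer to inspect a copy than the live file.

```python
import sqlite3
from typing import Dict


def table_row_counts(db_path: str) -> Dict[str, int]:
    """Return {table_name: row_count} for every user table in a SQLite database."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier safely
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical path: a copy of the log database pulled off the SD card.
    for table, rows in sorted(table_row_counts("stratux-copy.sqlite").items()):
        print(f"{rows:>10}  {table}")
```

Row counts are only a proxy for size on disk, but a table with far more rows than the rest is usually the one to look at first.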

skypuppy commented 8 years ago

One should always be using the whole card. It only takes a moment for an SD card, whether using gparted or raspi-config, but raspi-config is much easier.

Ergonomicmike commented 8 years ago

Perhaps so, but that's not SOP for the typical user. And why is the sqlite log being written to when I have logging disabled?

skypuppy commented 8 years ago

The typical user must be trained, or else put up with e-x-t-r-e-m-e-l-y long download times, costing someone more money in bandwidth. I don't know if there is a way to force raspi-config with an "expand" parameter on boot and then reboot, but even with that, it would scare some users because it doesn't look normal.

Somewhere else I read that the sql logs are independent of the logs referred to by the webui. I assume they are for debugging purposes for now, but don't know if the plan is to leave them implemented long term.

cyoung commented 8 years ago

@Ergonomicmike - looking. You had replay logs disabled and stratux.sqlite was that large? Could you send it over?

ghost commented 8 years ago

@Ergonomicmike -- you said you were "Running 4964.sh" . That's commit acee114964.

As of that commit, the sqlite log was still pulling in dump1090 console messages regardless of "Replay Log" setting. That was fixed a couple days later, and all builds since 4/13 should only be pushing timestamps and twice-a-minute status updates to the sqlite log.

Ergonomicmike commented 8 years ago

@cyoung @AvSquirrel Yes, I've had logs disabled ever since I had reimaged to troubleshoot the 1090 disconnect issue. So, yes, the sqlite log was that large with logging disabled. Unfortunately, I deleted it yesterday when I sh'd up to yesterday's latest build. Fortunately, AvSquirrel says the logfile problem is fixed now. Unfortunately, I won't have an opportunity to fly for a while to test things out.

skypuppy commented 8 years ago

1090 should build just sitting at home as stratux still sees a lot of it, at least at my house. If you let it run for 3 or 4 days, that might be the same as a flight, for 1090 data.

Ergonomicmike commented 8 years ago

Okay, I did what skypuppy suggested (albeit not for 3 or 4 days) and ran stratux f0fb for about 5 hours last night. Max 1090 msg rate was something like 12000. There are two sqlite files: stratux.sqlite is 320 KB, and stratux.sqlite-wal is 1026 KB. So not as bad as before.

Next - I hope I'm not speaking out of turn, but speaking philosophically, I've lost track of why the project is concentrating on sqlite logging so close to a v1.0 release.

IIRC, the original purpose of logging was to help find and squash bugs, which it's done. (The Malcolm-Robb 1090 thing, for example.) But now it seems that the new logging is geared more toward using the project as a science experiment (to gather new, interesting data) as opposed to squashing bugs.

Are there any more 978 or 1090 bugs to squash? I've never been a Product Manager, but if there are no more major bugs, then is there a need for better logging? (Which would be moot anyway if the final release is RO.) I politely submit that the project might consider holding on extended features and try to get to a stable v1.0 release for the masses first. Then, after that, it could fork (or whatever) into extended features, like extended logging, FLARM, etc. for those still interested.

By way of example, Steve is holding his Flight Box at ab76 (from March 21) with RO and seems to be doing fine. Could his version be made a v1.0 release (or perhaps a v.99b) for Stratux to "get 'er done"?

Update: Or, after looking thru the commits on v0.8r2 and the few remaining bugs in the github, perhaps a 0.9b?

cyoung commented 8 years ago

You're right, it has been somewhat of a distraction. I think it is important for a "v1.0" release because the previous logging system was a bit disorganized and cumbersome for users to pull up and send. Requesting a single logfile that has everything stored in an organized and standard way makes it easier for us to figure out problems. It's also going to be impossible to work further on AHRS or GPS attitude without fixing up the replay system (not that this should go in "v1.0").

v0.8r2-db130aab76 is doing fine indeed, that's why it's still the latest release and what is recommended on the website.
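Build names in this thread pair a tag with a 10-character commit prefix, and people refer to builds by the last four characters (4964, f0fb, ab76). A small sketch of splitting such a name and matching the short form is below. The pattern is inferred from the names quoted in this thread only, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Build(NamedTuple):
    tag: str     # e.g. "v0.8r2"
    commit: str  # 10-character commit prefix


# Optional "update-stratux-" prefix and ".sh" suffix, as seen on update scripts.
_BUILD_RE = re.compile(
    r"(?:update-stratux-)?(v[0-9][0-9A-Za-z.]*)-([0-9a-f]{10})(?:\.sh)?$"
)


def parse_build(name: str) -> Optional[Build]:
    """Split a build name into tag and commit prefix, or return None if it does not fit."""
    match = _BUILD_RE.match(name)
    return Build(match.group(1), match.group(2)) if match else None


def matches_short_id(commit: str, short_id: str) -> bool:
    """True if `short_id` (e.g. the '4964' in '4964.sh') is the tail of `commit`."""
    return commit.endswith(short_id)


if __name__ == "__main__":
    build = parse_build("v0.8r2-db130aab76")
    if build is not None:
        print(build.tag, build.commit, matches_short_id(build.commit, "ab76"))
```

Two builds can share a tag and still differ in commit, so the commit part, not the tag, is what identifies the build.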

"v1.0" is a somewhat arbitrary line to draw. The things we're breaking here aren't being sent out to production, and we need the freedom to break things to make progress. Maybe what is confusing is how the most current builds use the most recent tag, so you're seeing it "update-stratux-v0.8r2-bd2402f0fb" and thinking this is a release quality update. It's not.

Ergonomicmike commented 8 years ago

Thanks for the explanation. Not sure what you mean when you said that a v1.0 is a somewhat arbitrary line to draw. I presume you have a milestone in mind, like where it would be safe to make that first flight in an Experimental.

For me, in addition to being safe for flight, a v1.0 is psychological, in that it says "We're done for now and it works acceptably well for the standard user." (Unless, of course, Microsoft says it.)

jzeevi commented 8 years ago

V1.0 is the release that has the defined set of features agreed to by the interested parties. In this case (Free Software) the owner gets to decide that. I think it’s fairly clear that Chris is benevolent in that sense and has been taking suggestions about what should/shouldn’t be in the product.

The quality of the release is a separate item. Many companies make lots of money releasing what we might consider beta code to the marketplace and then scrambling to fix things as they come up. Others do it as right as possible the first time. This is why many companies/development efforts go to the trouble to define requirements and test requirements that ensure the product is functionally what it says it is. The challenge is that there’s always some user that finds some scenario that causes the product to fail. Sometimes that’s easy to manage, other times it’s catastrophic.

It’s not always a fun thing, but if someone could come up with the required set of features that v1.0 MUST have, we could (as a community) target our testing/usage toward that. If it’s JUST WX, then everything else is just a feature under development. If it’s WX+Traffic, then that’s the set. If it’s “support every EFB in the World” then that’s what we’d target for testing.

The Open Source community has often been hard-pressed to do this all that well until someone/some company takes “control” and drives. A perfect example is Ubuntu Linux. Linux is just an OS with a lot of extensions and capability. Canonical turned it into a distributable product.

I don’t know that we want/need that for Stratux, but for Chris’s development, I kinda think he owns it. For any other branch (think Mint or Redhat) someone else would own that. Maybe they’d intertwine and merge, maybe not. For now, it’s up to Chris, I’d think.

From: Ergonomicmike [mailto:notifications@github.com] Sent: Thursday, April 28, 2016 12:14 PM To: cyoung/stratux Subject: Re: [cyoung/stratux] SDRs work, then messages go to zero. (#388)


Ergonomicmike commented 8 years ago

Thanks for another nice explanation. (No question that this is Chris's baby and he can do whatever he wants with it.) I like the part about setting a goal. I've noticed in my own life that I often skip the important task of writing an outline before I start writing something, or I launch into a Glasair project without having written down a plan. (Thinking that I'm producing quicker.) I'm not saying that Chris has done that. Just reinforcing that having clear goals is important.

skypuppy commented 8 years ago

It does appear that stratux development software is ready for that next step. And, in my little opinion, we must hold a higher standard than most software because it is for the aviation (and aviation fans) world. Does anyone have experience in setting these goals/outlines and willing to participate in this loose committee? :) I have but mine was all for the (stilted) gubmnt and nuclear power worlds, so my contribution would be billions and billions of words that mean nothing at all. :)

patbrennan commented 8 years ago

My SDRs are still dropping out. I'm not a super user. Any ideas what I might do to make it easier to troubleshoot the issue? I've tested and eliminated the obvious stuff: overheating, clear reception plane, interference.

Has anyone looked at the log files? I know I'm not the only one with the issue. Thanks!


skypuppy commented 8 years ago

If we had a way to log a pstat continuously when this behavior begins... Grasping at straws here, but either we're hitting some maximum entropy/saturation with the radios or something is happening to the stratux task totally unforeseen. With so many dump1090's in existence, maybe they are saturating either IPC or the receiving code in gen_gdl90? Stack exhaustion?

Pat, do you remember anything out of the ordinary when that occurs? Is it only 1090 and never 978? Were there huge amounts of simultaneous 1090 messages?

Ergonomicmike commented 8 years ago

@patbrennan Since there are hundreds of users who don't have this issue, I gotta believe it's something unique to your hardware. (That Adafruit GPS is non-standard.) I see from your profile that you're in Vegas. If you find yourself near Phx, let me know in advance and perhaps I can help troubleshoot with my Stratux. (I might be at IGM lightning chasing Sat. evening if you wanted to fly/drive there.) PM me about this.

cyoung commented 8 years ago

@patbrennan - try rolling back to a minimal hardware set, especially with just recommended hardware, and if the problem persists then please re-open.