bemasher / rtlamr

An rtl-sdr receiver for Itron ERT compatible smart meters operating in the 900MHz ISM band.
GNU Affero General Public License v3.0

crash #188

Closed fat-tire closed 2 years ago

fat-tire commented 3 years ago

Dunno why this happened, but I ran rtlamr on my Pi 4 and came back to this:

14:55:14.565716 main.go:332: read tcp 127.0.0.1:40616->127.0.0.1:1234: read: connection reset by peer
io.ReadFull
main.(*Receiver).Run.func1
    /home/user/go/src/github.com/bemasher/rtlamr/main.go:174
runtime.goexit
    /usr/lib/go-1.11/src/runtime/asm_arm64.s:1114

Ran it again a few minutes later, just plain "rtlamr" without any arguments, and got the same error (only this time the port was 42012 instead of 40616).

bemasher commented 3 years ago

Sounds like rtl_tcp died, what output is rtl_tcp producing when this happens?

fat-tire commented 3 years ago

Hey sorry didn't see this until now...

So quitting rtlamr and restarting it leads to this from rtl_tcp:

worker socket bye
Signal caught, exiting!
comm recv bye
Signal caught, exiting!
all threads dead..
listening...
Use the device argument 'rtl_tcp=127.0.0.1:1234' in OsmoSDR (gr-osmosdr) source
to receive samples in GRC and control rtl_tcp parameters (frequency, gain, ...).
client accepted! localhost 57140
Allocating 15 zero-copy buffers
ll+, now 1
ll+, now 2
ll+, now 3
ll+, now 4
ll+, now 5
ll+, now 6
ll+, now 7
ll+, now 8
ll+, now 9
ll+, now 10
ll+, now 11
ll+, now 12
ll+, now 13
ll+, now 14
ll+, now 15
ll+, now 16
ll+, now 17
ll+, now 18
ll+, now 19
ll+, now 20
ll+, now 21
ll+, now 22
ll+, now 23
ll+, now 24
ll+, now 25
ll+, now 26
ll+, now 27
ll+, now 28
ll+, now 29

This count continues to 502.

Restarting rtlamr results in no output at all.

Then force-quitting rtl_tcp and restarting as root gives:

usb_claim_interface error -6
Failed to open rtlsdr device #0.

Looks like I have to reboot to get the dongle to be re-recognized.
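
For anyone decoding that number: the -6 in "usb_claim_interface error -6" is a libusb return code, and in libusb's error enum -6 is LIBUSB_ERROR_BUSY. That usually points at something else (a leftover rtl_tcp process or a kernel driver) still holding the interface. Below is a small lookup sketch in Python; the helper name describe_libusb_error is made up for illustration, and the table only covers libusb's own enum values, not kernel errno values.

```python
# Return codes from libusb's `enum libusb_error` (libusb.h).
LIBUSB_ERRORS = {
    0: "LIBUSB_SUCCESS",
    -1: "LIBUSB_ERROR_IO",
    -2: "LIBUSB_ERROR_INVALID_PARAM",
    -3: "LIBUSB_ERROR_ACCESS",
    -4: "LIBUSB_ERROR_NO_DEVICE",
    -5: "LIBUSB_ERROR_NOT_FOUND",
    -6: "LIBUSB_ERROR_BUSY",
    -7: "LIBUSB_ERROR_TIMEOUT",
    -8: "LIBUSB_ERROR_OVERFLOW",
    -9: "LIBUSB_ERROR_PIPE",
    -10: "LIBUSB_ERROR_INTERRUPTED",
    -11: "LIBUSB_ERROR_NO_MEM",
    -12: "LIBUSB_ERROR_NOT_SUPPORTED",
    -99: "LIBUSB_ERROR_OTHER",
}


def describe_libusb_error(code: int) -> str:
    """Map a libusb return code to its symbolic name (hypothetical helper)."""
    return LIBUSB_ERRORS.get(code, f"unknown libusb code {code}")


print(describe_libusb_error(-6))
```

If the code is BUSY, it may be worth checking for a stale rtl_tcp process before rebooting, though in this case a reboot was the only thing that helped.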

Legomaniac commented 3 years ago

I experience this issue as well, though on a Core i5 NUC running Fedora 32. I call rtlamr from a Python daemon about once every 2 minutes. I start the daemon shortly after boot, and things are fine for anywhere from 3 days to 2 weeks, but eventually my daemon errors out when it receives this instead of the expected JSON:

io.ReadFull
main.(*Receiver).Run.func1
        /root/go/src/github.com/bemasher/rtlamr/main.go:174
runtime.goexit
        /usr/lib/golang/src/runtime/asm_amd64.s:1357

I can try to restart rtl_tcp, but I just get the same error from rtlsdr until a reboot:

Jun 04 22:14:00 diginuc rtl_tcp[1490190]:   0:  Realtek, RTL2838UHIDIR, SN: 00000001
Jun 04 22:14:00 diginuc rtl_tcp[1490190]: Using device 0: Generic RTL2832U OEM
Jun 04 22:14:00 diginuc rtl_tcp[1490190]: Found Rafael Micro R820T tuner
Jun 04 22:14:00 diginuc rtl_tcp[1490190]: [R82XX] PLL not locked!
Jun 04 22:14:00 diginuc rtl_tcp[1490190]: Tuned to 100000000 Hz.
Jun 04 22:14:14 diginuc rtl_tcp[1490190]: Allocating 15 zero-copy buffers
Jun 04 22:14:19 diginuc rtl_tcp[1490190]: Signal caught, exiting!
Jun 04 22:14:19 diginuc rtl_tcp[1490190]: Signal caught, exiting!

As reported by @fat-tire I can't really do anything until the system reboots. Then everything seems fine again... sometimes for 2 or 3 days, sometimes for weeks.

For what it's worth, I'm using systemd to run rtl_tcp. It's in a unit file so that it starts with the host. I don't think that matters, as it seems to run fine for quite some time, but I thought it worth mentioning.

Perhaps there is a buffer somewhere that is overflowing? It never seems to happen until it has been running for over a day.

I understand that this probably isn't the best forum to discuss this, as it seems to be an error from rtl_tcp and not your code specifically. Do you know where the best place to reach out might be?

Thanks bemasher, I have been using this software for years and I really do appreciate the effort you put into this.
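
A daemon like the one described above can at least avoid crashing on the stack trace by skipping any line that does not decode as a JSON object. This is a minimal Python sketch, not the actual daemon: iter_json_messages is a hypothetical helper, and the sample record is made up for illustration rather than copied from real rtlamr output.

```python
import json
from typing import Iterable, Iterator


def iter_json_messages(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON objects, skipping lines that are not JSON objects."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # e.g. "io.ReadFull" or a stack-frame path
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or garbled line
        if isinstance(msg, dict):
            yield msg


# Made-up sample: one JSON record followed by the trace lines from this issue.
sample = [
    '{"Time": "t0", "Type": "SCM", "Message": {"ID": 1, "Consumption": 42}}',
    "io.ReadFull",
    "main.(*Receiver).Run.func1",
    "        /root/go/src/github.com/bemasher/rtlamr/main.go:174",
]
messages = list(iter_json_messages(sample))
print(len(messages))
```

Skipping the trace only keeps the daemon alive; it does not fix the underlying rtl_tcp failure, so the daemon should probably still treat a run with no decoded messages as an error worth logging.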

bemasher commented 3 years ago

This is likely a problem with your particular version of rtl_tcp and/or your dongle. Failing at io.ReadFull means there is an issue with the connection to rtl_tcp. You might try building rtl_tcp from source, since distro packages can be quite out of date and missing important fixes.
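
To illustrate why a failure at io.ReadFull points at the connection: a "read exactly n bytes" call can only fail if the peer stops sending, closes, or resets the socket. The Python sketch below is a rough analogue of that behavior, not rtlamr's code; read_exact is a hypothetical helper, and a local socket pair stands in for the rtlamr to rtl_tcp link.

```python
import socket


def read_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise, roughly what Go's io.ReadFull does."""
    buf = bytearray()
    while len(buf) < n:
        # recv may itself raise ConnectionResetError or a timeout error.
        chunk = sock.recv(n - len(buf))
        if not chunk:  # peer closed the connection before n bytes arrived
            raise EOFError(f"connection closed after {len(buf)} of {n} bytes")
        buf.extend(chunk)
    return bytes(buf)


# Demo: the "server" sends half a block and then goes away.
server, client = socket.socketpair()
server.sendall(b"\x00" * 8)
server.close()  # stands in for rtl_tcp dying mid-stream
try:
    read_exact(client, 16)
    outcome = "ok"
except EOFError as exc:
    outcome = f"failed: {exc}"
finally:
    client.close()
print(outcome)
```

In both traces in this thread the reader side was healthy and simply stopped receiving data, which is consistent with rtl_tcp or the dongle being the part that failed.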

bemasher commented 2 years ago

Closing, stale.

pashdown commented 2 years ago

Tested with a freshly pulled rtl_tcp and rtlamr on a NooElec NESDR Nano 2 Plus. I haven't had any other issues with rtl_tcp clients, but admittedly, my usage has been minimal.

13:07:54.250147 decode.go:45: CenterFreq: 912600155
13:07:54.250211 decode.go:46: SampleRate: 2359296
13:07:54.250216 decode.go:47: DataRate: 32768
13:07:54.250219 decode.go:48: ChipLength: 72
13:07:54.250223 decode.go:49: PreambleSymbols: 21
13:07:54.250227 decode.go:50: PreambleLength: 3024
13:07:54.250230 decode.go:51: PacketSymbols: 96
13:07:54.250233 decode.go:52: PacketLength: 13824
13:07:54.250236 decode.go:59: Protocols: scm
13:07:54.250240 decode.go:60: Preambles: 111110010101001100000
13:07:54.250244 main.go:126: GainCount: 29
13:53:19.896605 main.go:345: Receiver context cancelled.
13:53:19.897124 main.go:322: read tcp 127.0.0.1:38690->127.0.0.1:1433: i/o timeout
io.ReadFull
main.(*Receiver).Run.func2
        /home/pashdown/go/src/github.com/bemasher/rtlamr/main.go:183
runtime.goexit
        /usr/lib/go-1.13/src/runtime/asm_amd64.s:1357

pashdown commented 2 years ago

Looks like this was due to a bad dongle. Don't buy the NooElec NESDR Nano 2 Plus; I've had two of them fail.

fat-tire commented 2 years ago

Can confirm it is a NooElec NESDR Mini 2+ and still having this issue :(

fat-tire commented 2 years ago

Update: I got another RTL device and had the exact same problem with the 8 GB Raspberry Pi 4. After a LOT of experimenting, I can report that a few things made a big difference.

  1. Used a powered USB hub. I had noticed some power issues in dmesg, and the hub seemed to make everything on the Pi a little more stable, even though I only had 3 USB devices. The Pi also had to power a USB SSD, and perhaps the two together were too much for it. I even pulled the Ethernet cable, thinking it and the LEDs might be drawing too much power, but only the powered USB hub seemed to fix things.
  2. Even with the powered hub, I still got "device descriptor read/64, error -71", "rtlsdr_demod_write_reg failed with -4", "rtlsdr_demod_read_reg failed with -4", and "device descriptor read/64, error -32" errors, as if the USB device was being reset or the radio could not be detected... until I got rid of the rather lengthy (and thick) USB cable and plugged the dongle directly into the powered hub. I used a USB 2.0 port, as I saw suggested in a forum somewhere.
  3. Switched from the "rabbit ears"-style telescoping antenna to a smaller one with a "spring" section. No idea if this made any difference, though.
  4. I had seen some "io.ReadFull" errors from main.go too, so I also added -n 1024 when starting rtl_tcp. Again, I don't know if this fixed anything, but it didn't hurt.

The result: for about 3 hours now it's been steadily reporting data, whereas before it was getting errors and hadn't reported usage for weeks. YMMV.

I'm posting this update in case anyone runs into the same issue and googles these errors... If it starts to mess up again significantly, I'll post an update. Hope this isn't a "spoke too soon" situation 😬
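
One way to notice early if things start to mess up again is to probe rtl_tcp before launching rtlamr. When a client connects, rtl_tcp sends a 12-byte greeting: the magic "RTL0" followed by the tuner type and gain count as big-endian 32-bit integers (the gain count is the GainCount value rtlamr logs at startup). The Python sketch below reads and checks that greeting. probe_rtl_tcp and the fake server are hypothetical names for illustration, and the values 5 and 29 sent by the fake server are stand-ins, not measurements.

```python
import socket
import struct
import threading


def probe_rtl_tcp(host: str = "127.0.0.1", port: int = 1234, timeout: float = 5.0):
    """Return (tuner_type, gain_count) from rtl_tcp's greeting, or raise."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        header = b""
        while len(header) < 12:
            chunk = sock.recv(12 - len(header))  # raises on timeout or reset
            if not chunk:
                raise ConnectionError("rtl_tcp closed the connection early")
            header += chunk
    magic, tuner_type, gain_count = struct.unpack(">4sII", header)
    if magic != b"RTL0":
        raise ValueError(f"unexpected magic {magic!r}")
    return tuner_type, gain_count


def _fake_rtl_tcp(listener: socket.socket) -> None:
    """Stand-in server for the demo: send one greeting, then hang up."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(struct.pack(">4sII", b"RTL0", 5, 29))


# Demo against the fake server on an ephemeral port.
listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]
worker = threading.Thread(target=_fake_rtl_tcp, args=(listener,), daemon=True)
worker.start()
result = probe_rtl_tcp(port=port)
worker.join()
listener.close()
print(result)
```

As far as I know rtl_tcp serves one client at a time, so a probe like this should only run while rtlamr is not connected; otherwise it will most likely just time out. A timeout or early close here would suggest the dongle or rtl_tcp is wedged before rtlamr ever gets involved.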