Guenael / rtlsdr-wsprd

WSPR daemon for RTL receivers
GNU General Public License v3.0

Does not run on 2 minute cycles #59

Closed mkarliner closed 2 years ago

mkarliner commented 2 years ago

(screenshot attached)

It seems to be running a cycle of 3 or 4 minutes.

sm3ulc commented 2 years ago

Seems to run ok for me, but no spots:

    ./rtlsdr_wsprd -f 40m -d 2 -c x -l x -i 1 -n 4
    Found 2 device(s):
      0: Realtek, RTL2838UHIDIR, SN: 06001049
      1: Realtek, RTL2838UHIDIR, SN: 00000001

    Using device 1: Generic RTL2832U OEM
    Found Rafael Micro R820T tuner
    Enabled direct sampling mode, input 2

    Starting rtlsdr-wsprd (2021-12-07, 19:35z) -- Version 0.3
      Callsign : x
      Locator : x
      Dial freq. : 7038600 Hz
      Real freq. : 7038600 Hz
      PPM factor : 0
      Gain : 29 dB
    Wait for time sync (start in 52 sec)

          Date  Time(z)    SNR     DT       Freq Dr    Call    Loc Pwr

    Allocating 15 zero-copy buffers
    No spot 2021-12-07 19:37z
    No spot 2021-12-07 19:39z
    No spot 2021-12-07 19:41z

Guenael commented 2 years ago

Well, I guess I broke something during the refactor or some update...

The next step is to include some testing in the build pipeline :) For example, a baseband IQ file to test the decoded output (but it's a very big file...)

@mkarliner @sm3ulc Thanks for these reports, I will investigate.

Guenael commented 2 years ago

Note to my future self, diff on the back-end side:

https://github.com/Guenael/rtlsdr-wsprd/blob/main/wsprd/wsprsim_utils.c#L123
  <         call6=strtok(NULL," ");
  <         *n=pack_call(call6);
  ---
  >         *n = pack_call(strtok(NULL, " "));

https://github.com/Guenael/rtlsdr-wsprd/blob/main/wsprd/wsprsim_utils.c#L300
  <     unsigned int nencoded=162;
  ---
  >     unsigned int nencoded = (nbytes * 2 * 8);  // This is how much encode() writes

https://github.com/Guenael/rtlsdr-wsprd/blob/main/wsprd/wsprd.c#L477
  <         } else {
  <             fhash = fopen("hashtable.txt", "w+");
  ---
  >             fclose(fhash);

Guenael commented 2 years ago

@sm3ulc @mkarliner I rolled back a change in the backend. Could you double-check, and also try with the -S and -Q options? Thanks!

sm3ulc commented 2 years ago

Note: -Q needs an argument.

    ./rtlsdr_wsprd -f 40m -d 2 -c x -l x -i 1 -n 4 -S -Q q
    Found 2 device(s):
      0: Realtek, RTL2838UHIDIR, SN: 06001049
      1: Realtek, RTL2838UHIDIR, SN: 00000001

    Using device 1: Generic RTL2832U OEM
    Found Rafael Micro R820T tuner
    Enabled direct sampling mode, input 2

    Starting rtlsdr-wsprd (2021-12-08, 07:25z) -- Version 0.3
      Callsign : x
      Locator : x
      Dial freq. : 7038600 Hz
      Real freq. : 7038600 Hz
      PPM factor : 0
      Gain : 29 dB
    Wait for time sync (start in 47 sec)

          Date  Time(z)    SNR     DT       Freq Dr    Call    Loc Pwr

    Allocating 15 zero-copy buffers
    No spot 2021-12-08 07:27z
    No spot 2021-12-08 07:29z
    No spot 2021-12-08 07:31z

mkarliner commented 2 years ago

I'm afraid that didn't help. Also, it appears to be decoding at 1 second past the minute instead of 50 seconds past. (screenshot attached)

mkarliner commented 2 years ago

Also, it does not terminate after -n 4. (screenshot attached)

Guenael commented 2 years ago

Thanks for your reports. I will write a self-test function; that will make this easier.

Guenael commented 2 years ago

I made an update, with two new functions:

The decoder works fine, so I guess it's an issue with the rtl interface code. I rolled back the compiler (to gcc), but I don't think that's the issue.

Anyway, additional help appreciated to test this version (I don't have any radio setup for now).

mkarliner commented 2 years ago

I've tried -t, which says it works. I've also tried -w, but it's not obvious how to use it, as I have no way to specify a file. The file appears to be rx_options.filename, but it's not clear where it is set. Could you please tell me what test you would like run? Also, it doesn't seem to me that the issue is the decodes, so much as the samples being taken at the wrong time (odd minutes) and at 4-minute intervals.

Guenael commented 2 years ago

I hope this issue is solved now.

Thanks!!

mkarliner commented 2 years ago

Debian results: the time cycle is all over the place... 1 minute, sometimes 2. (screenshot attached)

mkarliner commented 2 years ago

I couldn't get a recording to work on the RPi, so here is one from Debian (no spots): foobar.f_2021-12-11_12-55-59.zip

Github won't take the .i suffix, so I've renamed this to .zip - hope this works.

Also, on Debian, I'm getting no decodes despite my WSPR beacon being at the other end of the garden ...

Guenael commented 2 years ago

@mkarliner OK, I took a look at this sample, and there are no visible signals (only white Gaussian noise). But I also double-checked the command you started in your previous screenshot, and I'm not sure what you are trying to do. Quick questions:

Thanks

Guenael commented 2 years ago

For any further test, please use a Linux x86 machine, and not a Raspberry Pi. I'm not sure what's wrong with the RPi, but there are:

mkarliner commented 2 years ago

Yes, you are right about me missing out the -d 2 on my last test, my bad. Testing on x86 is tricky: I don't have an x86 in the garden, and where I do, there is too much QRM. In the last test I did, I used rtl_tcp on the Pi to feed GQRX and WSJT-X on my Mac and got WSPR decodes. I then ran rtlsdr_wsprd and got no spots, but at 6-minute (even) intervals.

I assume that every two minutes it should report either no spots or a list of spots.

I'll see what I can do to get a decent antenna feed close to the x86 box.
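The expectation above (one report every two minutes, aligned to even UTC minutes) can be sketched in C. This is only an illustration: `seconds_to_next_slot` and `SLOT_SECONDS` are hypothetical names, not functions from rtlsdr-wsprd, and the sketch ignores the small start offset inside a slot.

```c
#include <assert.h>
#include <time.h>

/* Slots are assumed to start on even UTC minutes, i.e. every 120 s. */
#define SLOT_SECONDS 120

/* Seconds from `now` until the next even-minute boundary (1..120).
 * Unix time ignores leap seconds, so multiples of 120 s fall exactly
 * on even UTC minutes. Hypothetical helper, not the project's code. */
int seconds_to_next_slot(time_t now)
{
    assert(now >= 0);
    return SLOT_SECONDS - (int)(now % SLOT_SECONDS);
}
```

For a clock reading of 19:35:08z this gives 52 seconds, the same kind of figure as the "Wait for time sync (start in 52 sec)" line in the logs earlier in the thread. A receiver that sleeps this long before each capture should report at a steady 2-minute cadence as long as it can keep up with the sample stream.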

Guenael commented 2 years ago

I conducted some other tests on an RPi 2 & 3, with a legacy version (0.2) and a new one.

For both, rtlsdr-wsprd exceeded the acquisition time because it was not able to get/process all the samples (a fixed amount of data is required, so it takes longer).

CPU usage is through the roof for both the rtlsdr_wsprd & rtlsdr_ft8d apps:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                        
 1517 pi        20   0   44216   6836   5960 S 100.0   0.7   0:04.01 rtlsdr_ft8d   

I'm confused: nothing changed, it's my legacy app, and this app worked fine on a Raspberry Pi 1!! I guess something goes wrong between my code and a newer version of pthread, or something like that.

I will reinstall an old version of RPi OS (2015/2017, not sure) and double-check whether there is any improvement.

mkarliner commented 2 years ago

I tried exactly the same thing yesterday! This explains the timing issues: 100% CPU on an RPi 3A+. I suspected that the threads might be running on multiple cores and interfering with each other in ways that didn't happen five years ago, so I put maxcpus=1 in /boot/cmdline.txt to run only one core, and the result was the same. So it's not a multicore issue. I also tried a fork of your code from 5 years ago; the result was the same. I have RPi 1s here (I have everything :-), so if I get time, I may try one of them. My best guess is that faster Pis show up a timing fault that allows threads to free-run. I haven't really dug into the code, but it seems that you are using shared variables to sync the threads; mutexes or a queue might be better. My apologies in advance if I'm talking rubbish or stating the obvious; I have limited time here to put in enough effort to make better comments.
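For reference, the kind of mutex-based synchronisation suggested above usually looks like the sketch below. It makes no claim about what rtlsdr-wsprd actually does; `struct handoff` and its functions are hypothetical names used only to show the pattern, in which the waiting thread sleeps on a condition variable instead of polling a shared flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical hand-off object between an acquisition thread and a
 * decoder thread; the names are illustrative, not the project's. */
struct handoff {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;    /* true once a full capture is available */
    int             slot_id;  /* which capture slot the data belongs to */
};

void handoff_init(struct handoff *h)
{
    pthread_mutex_init(&h->lock, NULL);
    pthread_cond_init(&h->cond, NULL);
    h->ready = false;
    h->slot_id = -1;
}

/* Producer side: publish a finished capture and wake the decoder. */
void handoff_publish(struct handoff *h, int slot_id)
{
    assert(slot_id >= 0);
    pthread_mutex_lock(&h->lock);
    h->slot_id = slot_id;
    h->ready = true;
    pthread_cond_signal(&h->cond);
    pthread_mutex_unlock(&h->lock);
}

/* Consumer side: block (no busy loop) until a capture is published. */
int handoff_wait(struct handoff *h)
{
    pthread_mutex_lock(&h->lock);
    while (!h->ready)   /* the loop guards against spurious wakeups */
        pthread_cond_wait(&h->cond, &h->lock);
    h->ready = false;
    int slot_id = h->slot_id;
    pthread_mutex_unlock(&h->lock);
    return slot_id;
}
```

A thread blocked in `handoff_wait` uses no CPU while it waits, so a pattern like this is one way to rule out a free-running loop as the source of a 100% CPU reading. Build with `-pthread`.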

Guenael commented 2 years ago

@mkarliner Some hope!! I finally found the issue :tada: I tested an old version of Raspbian, and the acquisition time was fine (with 14% CPU on a RPi3). The exit was also fine with a CTRL-C.

I will continue my tests to try to identify the differences between this old version of the OS and the new one.

FYI, I used this image: https://downloads.raspberrypi.org/raspbian_lite/images/raspbian_lite-2019-04-09/

And if you take a look at the official web page, there are two tracks:

I guess the raspios oldstable will work, but I will test it. Anyway, I will publish a release and refactor the acquisition strategy. I did a better job on rtlsdr-ft8d, and I have to back-port some updates.

I will also add a section about the supported OS versions to the README page, definitely useful :)

dforsi commented 2 years ago

The decoder works fine with .c2 files saved by wsjt-x if we skip the first 26 bytes, e.g. with dd:

    dd if=~/.local/share/WSJT-X/save/211212_2144.c2 skip=26 ibs=1 of=211212_2144.raw

I suggest that you make the read and write options of rtlsdr_wsprd use the same format as wsjt-x for easier testing.

See writec2file() that writes a header with filename, duration in minutes (2 or 15 I think) and dial frequency. Here is the code:

https://sourceforge.net/p/wsjt/wsjtx/ci/master/tree/lib/wsprd/wsprd.c#l657

unsigned long nwrite = fwrite(c2filename,sizeof(char),14,fp);
nwrite = fwrite(&trmin, sizeof(int), 1, fp);
nwrite = fwrite(&freq, sizeof(double), 1, fp);
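Reading that header back is the mirror image of the three fwrite() calls. The sketch below is a minimal illustration, assuming a 4-byte int and an 8-byte double (which gives the 26 bytes skipped with dd above) and the same byte order as the machine that wrote the file. `struct c2_header` and `read_c2_header` are hypothetical names, not part of wsjt-x or rtlsdr-wsprd.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Header layout as written by the fwrite() calls quoted above:
 * 14 filename bytes, one int, one double. */
struct c2_header {
    char   name[15];  /* 14 bytes from the file plus a terminating NUL */
    int    trmin;     /* duration in minutes */
    double freq;      /* dial frequency, in whatever unit the writer used */
};

/* Returns 0 on success, -1 on a short read. Hypothetical helper. */
int read_c2_header(FILE *fp, struct c2_header *hdr)
{
    assert(fp != NULL && hdr != NULL);
    memset(hdr, 0, sizeof *hdr);
    if (fread(hdr->name, sizeof(char), 14, fp) != 14) return -1;
    if (fread(&hdr->trmin, sizeof(int), 1, fp) != 1) return -1;
    if (fread(&hdr->freq, sizeof(double), 1, fp) != 1) return -1;
    return 0;
}
```

After a successful call the stream is positioned at the first byte of sample data, so the decoder could read the rest of the file without a separate dd step.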

In addition, the static array for the filename in rtlsdr_wsprd.h is too small to hold the 41 bytes of ~/.local/share/WSJT-X/save/211212_2144.c2, and I suggest that you store the value of the optarg pointer instead.
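A minimal sketch of the optarg suggestion, assuming a single file option: the `-r` flag, `struct file_options` and `parse_file_options` are hypothetical and only illustrate the idea; they are not taken from rtlsdr_wsprd.

```c
#define _POSIX_C_SOURCE 200809L  /* for getopt() under strict -std=c11 */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical options struct: keep a pointer into argv instead of
 * copying the path into a fixed-size char array. The argv strings live
 * for the whole run of the program, so the pointer stays valid. */
struct file_options {
    const char *read_path;  /* NULL when no input file was given */
};

/* Parses a hypothetical "-r <file>" option; returns 0 on success. */
int parse_file_options(int argc, char **argv, struct file_options *opts)
{
    assert(opts != NULL);
    opts->read_path = NULL;
    int opt;
    while ((opt = getopt(argc, argv, "r:")) != -1) {
        switch (opt) {
        case 'r':
            opts->read_path = optarg;  /* no copy, so no length limit */
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

Because nothing is copied, a 41-byte path is handled the same way as a short one.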

Guenael commented 2 years ago

@dforsi Yep, good idea on both points. I'll add this to my to-do list :) And thanks for the confirmation about the decoding.

Guenael commented 2 years ago

Some updates: I ran a couple of tests and found something interesting. With a fresh image of buster, my apps worked fine, and after the update, I got the 100% CPU issue (possibly an issue with pthread).

Before this update, the OS booted with kernel 4.19.50-v7+, and after it, with 5.10.63-v7+. I don't understand what is going on, but this is a clue :)

Details below:

> `2019-06-20-raspbian-buster-lite.img`
> WITHOUT update -- rtlsdr_wsprd works fine, 15% CPU

    $ file /lib/arm-linux-gnueabihf/ld-2.28.so
    /lib/arm-linux-gnueabihf/ld-2.28.so: ELF 32-bit LSB pie executable, ARM, EABI5 version 1 (SYSV), dynamically linked, BuildID[sha1]=fb85e699c11db06c7b24f74de2cdada3146442a8, stripped

    $ file /lib/arm-linux-gnueabihf/libpthread-2.28.so
    /lib/arm-linux-gnueabihf/libpthread-2.28.so: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=7958164ddcdf86b06e4a06700f80a4655a80c40e, for GNU/Linux 3.2.0, not stripped

    $ uname -a
    Linux raspberrypi 4.19.50-v7+ #896 SMP Thu Jun 20 16:11:44 BST 2019 armv7l GNU/Linux

    $ cat /etc/os-release
    PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
    NAME="Raspbian GNU/Linux"
    VERSION_ID="10"
    VERSION="10 (buster)"
    VERSION_CODENAME=buster
    ID=raspbian
    ID_LIKE=debian
    HOME_URL="http://www.raspbian.org/"
    SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
    BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"


> `2019-06-20-raspbian-buster-lite.img`
> AFTER update -- rtlsdr_wsprd fails and misses some samples, 100% CPU...
```bash
$ file /lib/arm-linux-gnueabihf/ld-2.28.so 
/lib/arm-linux-gnueabihf/ld-2.28.so: ELF 32-bit LSB pie executable, ARM, EABI5 version 1 (SYSV), dynamically linked, BuildID[sha1]=7ad37bc5cbf163f01d8877a20da3b9d660c2dc31, stripped

$ file /lib/arm-linux-gnueabihf/libpthread-2.28.so
/lib/arm-linux-gnueabihf/libpthread-2.28.so: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=befc81c22bb8a7e6d9899ebec62ecf501bec8678, for GNU/Linux 3.2.0, not stripped

$ uname -a
Linux raspberrypi 5.10.63-v7+ #1496 SMP Wed Dec 1 15:58:11 GMT 2021 armv7l GNU/Linux

$ cat /etc/os-release 
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
```

If you just want to see rtlsdr-wsprd running, you can take the image below and update it; the kernel will remain a 4.x version.
https://downloads.raspberrypi.org/raspbian_lite/images/raspbian_lite-2019-04-09/

On my side, I will use this image to finish my refactor and add some features. I will deal with this issue afterwards (I'm done with it for now, and I didn't expect this kind of issue when I started rtlsdr-ft8d to help a friend :) ). Anyway.

Guenael commented 2 years ago

@IZ7BOJ @dforsi @sm3ulc @mkarliner : Alfredo (IZ7BOJ) found the root cause of this issue and provided a solution. The issue comes from the librtlsdr-dev package. In a recent update, I added this library directly to the list of packages to install with APT, because it removes a step. I guess this package was not compiled with the same options, and building and installing the library manually solves the issue!

Based on Alfredo's notes on our Slack channel (BTW, I can invite you to my Slack if you want, it's all about radio, programming stuff & developing some hardware for my next beacon project), you can do this on your current distro to fix the issue:

  1. Uninstall the package:

    sudo apt-get purge --auto-remove librtlsdr-dev
  2. Build & install rtl-sdr manually:

    sudo apt-get install cmake
    git clone https://github.com/osmocom/rtl-sdr
    cd rtl-sdr
    mkdir -p make
    cd make
    cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr -DDETACH_KERNEL_DRIVER=ON -Wno-dev ..
    make
    sudo make install

That's it.

Thanks again Alfredo!! 73

Guenael commented 2 years ago

Hi all!

I updated the doc, and made two releases:

Let me know if it works for you. Thanks!!

mkarliner commented 2 years ago

Hooray!