Tlf / tlf

TLF - a console based ham radio contest logger
https://tlf.github.io
GNU General Public License v2.0
66 stars 32 forks source link

Test failure on MIPS and other RISC archs: test_cabrillo.c:159: error: Failure! #210

Closed df7cb closed 3 years ago

df7cb commented 3 years ago

tlf 1.4.1 fails on mips64el:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=979377 https://buildd.debian.org/status/fetch.php?pkg=tlf&arch=mips64el&ver=1.4.1-2&stamp=1609846479&raw=0

FAIL: run_cabrillo
==================

[==========] Running 12 test(s).
[ RUN      ] test_starts_with_succeed
[       OK ] test_starts_with_succeed
[ RUN      ] test_starts_with_fails
[       OK ] test_starts_with_fails
[ RUN      ] test_translateItem
[       OK ] test_translateItem
[ RUN      ] test_freeCabfmt
[       OK ] test_freeCabfmt
[ RUN      ] test_parseLine
[       OK ] test_parseLine
[ RUN      ] test_readCabrilloFormatUniversal
[       OK ] test_readCabrilloFormatUniversal
[ RUN      ] test_readCabrilloFormatWAE
[       OK ] test_readCabrilloFormatWAE
[ RUN      ] test_readCabrilloFileNotFound
[       OK ] test_readCabrilloFileNotFound
[ RUN      ] test_readCabrilloFormatNotFound
[       OK ] test_readCabrilloFormatNotFound
[ RUN      ] test_cabToTlf_ParseQSO
[  ERROR   ] --- 0 != 0x3
[   LINE   ] --- test_cabrillo.c:159: error: Failure!
[  FAILED  ] test_cabToTlf_ParseQSO
[ RUN      ] test_cabToTlf_ParseXQSO
[  ERROR   ] --- 0 != 0x3
[   LINE   ] --- test_cabrillo.c:179: error: Failure!
[  FAILED  ] test_cabToTlf_ParseXQSO
[ RUN      ] test_cabToTlf_ParseQTC
[       OK ] test_cabToTlf_ParseQTC
[==========] 12 test(s) run.
[  PASSED  ] 10 test(s).
[  FAILED  ] 2 test(s), listed below:
[  FAILED  ] test_cabToTlf_ParseQSO
[  FAILED  ] test_cabToTlf_ParseXQSO

 2 FAILED TEST(S)
FAIL run_cabrillo (exit status: 2)

... and similarly on other architectures: https://buildd.debian.org/status/logs.php?pkg=tlf&ver=1.4.1-2

dl1jbe commented 3 years ago

Thanks for the report. We will look into it.

zcsahok commented 3 years ago

@df7cb to get the full picture: was all OK before? if yes, what has been changed? (gcc? hamlib? ...) Thanks.

airween commented 3 years ago

@zcsahok,

here are the full list of logs of buildd with Tlf 1.4.1-1 (before the hamlib4).

The Sid version of Debian is continuously upgraded, so it may be the cause of GCC - but I don't think that's a Hamlib issue.

zcsahok commented 3 years ago

FYI trying to reproduce this but it's not possible to install current [20201202] sid mips64el/mipsel using QEMU 1:3.1+dfsg-8+deb10u8 due to kernel panic. (previous installer [20200314] does boot, however) Now installing buster and will try to upgrade.

df7cb commented 3 years ago

1.4.1-1 built on 19 May 2020 with gcc-9_9.3.0-12 was fine, while 1.4.1-2 built on 05 Jan 2021 with gcc-10_10.2.1-3 was failing. But that was surely not the only toolchain change... Thanks for looking!

df7cb commented 3 years ago

With #211 fixed I retried building git head on mipsel, and the tests are all passing now.

Could you tag a new release? That would be the easiest way forward I think.

zcsahok commented 3 years ago

Kind of magic, as the changes were just at unrelated places.

df7cb commented 3 years ago

That might mean the bug is still there... :( Still, a release would fix the current problem that tlf doesn't build on all Debian release architectures.

airween commented 3 years ago

Or we can make a patch from the upstream for the 1.4.1. I'll take a look it soon.

dl1jbe commented 3 years ago

@df7cb What is the difference between 1.4.1-1 and 1.4.1-2?

Thanks to the lot of changes in the last year a new release is really due. But does it make sense to have a new release with the unresolved problem of a potential failing test.

Best would be to compile on mipsel64 with -g and run the test (run_cabrillo) separate in a gdb session to narrow down the problem.

@zcsahok if you could manage to set up a system in qemu that would be great.

df7cb commented 3 years ago

Code-wise there is no difference, but it's now built with hamlib 4, and the toolchain changed a lot.

I can do more debugging on the mipsel/mips64el machine tomorrow.

AdrianBunk commented 3 years ago

Building with -fsanitize=address on amd64 gives:

==12985==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55fc9b0c4afc at pc 0x55fc9b0b2ec5 bp 0x7ffc26e11a80 sp 0x7ffc26e11a78
WRITE of size 4 at 0x55fc9b0c4afc thread T0
    #0 0x55fc9b0b2ec4 in write_log_fm_cabr src/readcabrillo.c:91
    #1 0x55fc9b0b39ee in cab_qso_to_tlf src/readcabrillo.c:363
    #2 0x55fc9b0b14af in test_cabToTlf_ParseQSO test/test_cabrillo.c:157
    #3 0x7fc9cde8e238  (/lib/x86_64-linux-gnu/libcmocka.so.0+0x5238)
    #4 0x7fc9cde8ebd8 in _cmocka_run_group_tests (/lib/x86_64-linux-gnu/libcmocka.so.0+0x5bd8)
    #5 0x55fc9b0b0601 in main test/run_cabrillo.c:37
    #6 0x7fc9cdcead09 in __libc_start_main ../csu/libc-start.c:308
    #7 0x55fc9b0b0829 in _start (/tmp/tlf-1.4.1/test/run_cabrillo+0xf829)

0x55fc9b0c4afc is located 4 bytes to the left of global variable 'qsoflags_for_qtc' defined in 'test_cabrillo.c:15:5' (0x55fc9b0c4b00) of size 80000
0x55fc9b0c4afc is located 56 bytes to the right of global variable 'bandinx_spy' defined in 'test_cabrillo.c:26:5' (0x55fc9b0c4ac0) of size 4
SUMMARY: AddressSanitizer: global-buffer-overflow src/readcabrillo.c:91 in write_log_fm_cabr

The bug is in https://sources.debian.org/src/tlf/1.4.1-2/src/readcabrillo.c/#L91 When nr_qsos is 0 you are overwriting something outside the array, and on some architectures this might just be the value of bandinx_spy (on amd64 there seems to be some space between the variables so the write might be harmless there). Random changes of the code might just change what gets wrongly overwritten.

I've confirmed that an if(nr_qsos) before that line fixes the test on MIPS, I do not know whether this is the correct fix.

dl1jbe commented 3 years ago

Looks promising.

A quick check of the function in question shows that two lines up store_qso() is called. In that function nr_qsos gets incremented. So normally the variable is at least one.

But in the testcode (test_cabrillo.c) we stub out store_qso() with an empty definition. We should modify it as follow:

void store_qso() { nr_qsos++; }

That should fix it.

AdrianBunk commented 3 years ago
void store_qso() { nr_qsos++; }

That should fix it.

Thanks, works.

df7cb commented 3 years ago

This fixes the test problem on mipsel (32) as well. Thanks! I'll upload that as hotfix so it's no longer blocking the hamlib4 transition in Debian.

dl1jbe commented 3 years ago

Fixed in PR #212. If no negative comment I will pick it up in next two days.

@AdrianBunk '-fsanitize=address' showed some more problems. Will be fixed by above mentioned commits. Thanks for pointing in that direction.

zcsahok commented 3 years ago

Fix e95db78 for Cabrillo test confirmed on QEMU mips64el system using

user@dmips64el:~/tlf$ gcc --version
gcc (Debian 10.2.1-3) 10.2.1 20201224
user@dmips64el:~/tlf$ dpkg -l | grep hamlib
ii  libhamlib-dev:mips64el                   4.0-3                           mips64el     Development library to control radio transceivers and receivers
ii  libhamlib4:mips64el                      4.0-3                           mips64el     Run-time library to control radio transceivers and receivers

It just takes ages to install party due to broken kernel (had to go the buster->bullseye->sid way). And a full Tlf build takes well over an hour.

dl1jbe commented 3 years ago

Ok. Will merge PR #212 now.