epics-base / epics-base

The C/C++ core of the EPICS Base control system toolkit
https://epics-controls.org/

[7.0.8] printfTest failure on Debian #466

Open picca opened 6 months ago

picca commented 6 months ago

Hello,

I uploaded 7.0.8 to Debian, and I got a bug report about a failing test:

not ok 70 - dbGetField("test_printf_rec.VAL", 0) -> "Format test string c1" == "Format test string 0"

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1065724

This failure does not happen on all architectures.

Is it a flaky test, or do you think there is something special about this test?

Thanks for your help.

Fred

anjohnson commented 6 months ago

We're going to need more information about where you're seeing the failure above to be able to help, the output from the Debian build isn't showing it failing.

That test is from the routine test_hh_flag() in modules/database/test/std/rec/printfTest.c. It checks that using a printf record to print the value 0xffc0c1 with a "%hhx" format string outputs only the last byte, hence c1 is expected. Some architectures may have bugs in their vsnprintf() routines: the record's code calls epicsSnprintf(), which on Debian uses the implementation in modules/libcom/src/osi/os/posix/osdStdio.c, and that calls vsnprintf().

picca commented 6 months ago

Hello, sorry: I requested a new build and the build log URL was pointing to the wrong log file. I have added information about the failure here:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1065724#12

Do you think the unsupported architectures should be added to the EpicsHostArch.pl file?

Thanks

anjohnson commented 6 months ago

Hi, thanks for the additional information. The failures from amd64 are both trying to start a PV Access server. The message "Parse errors: No plan found in TAP output" implies that the test programs could be crashing before they can even run the first statement in main(). There may be more information about the fault displayed earlier in the build output from when the tests were run, or even earlier when they were compiled or linked. The i386 build had the same failures, plus the printf record issue above.

Nobody in the EPICS community has reported trying to build or test EPICS on those unsupported architectures. Until someone actually needs them and agrees to test and maintain EPICS on them we won't be willing to add the necessary files to our build system (it's not just EpicsHostArch.pl that would need updating for them to build).

If you have the ability to compile and run code on an i386 architecture Debian system, you could try compiling and running this program to check the behavior of printf():

#include <stdio.h>

int main(void) {
    /* %hhx converts the argument to unsigned char, so only c1 is printed */
    printf("%hhx\n", 0xffc0c1);
    return 0;
}

It should output c1 when run. If it doesn't, file a bug against Debian's glibc.

picca commented 6 months ago

> Hi, thanks for the additional information. The failures from amd64 are both trying to start a PV Access server. The message "Parse errors: No plan found in TAP output" implies that the test programs could be crashing before they can even run the first statement in main(). There may be more information about the fault displayed earlier in the build output from when the tests were run, or even earlier when they were compiled or linked.

The full amd64 build log is here.

https://buildd.debian.org/status/fetch.php?pkg=epics-base&arch=amd64&ver=7.0.8%2Bdfsg1-1&stamp=1710186702&raw=0

If you find anything interesting in it, do not hesitate to tell me.

> The i386 had the same failures, plus the printf record issue above.

> Nobody in the EPICS community has reported trying to build or test EPICS on those unsupported architectures. Until someone actually needs them and agrees to test and maintain EPICS on them we won't be willing to add the necessary files to our build system (it's not just EpicsHostArch.pl that would need updating for them to build).

OK, I will not investigate further.

> If you have the ability to compile and run code on an i386 architecture Debian system, you could try compiling and running this program to check the behavior of printf():
>
> #include <stdio.h>
>
> int main () {
>     printf("%hhx\n", 0xffc0c1);
> }

> It should output c1 when run. If it doesn't, file a bug against Debian's glibc.

I will try to find a way to test this on our porter box.

Thanks

picca commented 6 months ago

I built it on another runner and it now seems to be OK on amd64.

I think the problem is related to a dual-stack (IPv4/IPv6) versus an IPv6-only host.

Have you tried running the tests on an IPv6-only computer?

Cheers

Fred

https://buildd.debian.org/status/fetch.php?pkg=epics-base&arch=amd64&ver=7.0.8%2Bdfsg1-1&stamp=1710327948&raw=0

ralphlange commented 6 months ago

To point out the obvious: You are aware of https://github.com/epicsdeb?

picca commented 6 months ago

> You are aware of https://github.com/epicsdeb?

I do not personally have enough time to take care of all the EPICS packages.

But you could try to upload these packages to Debian via the mentors.debian.net platform. We use salsa.debian.org to prepare these packages with the Debian CI pipeline :).

Hope that helps.

Cheers

Frederic

ralphlange commented 6 months ago

Nah, I didn't want to suggest that.

I'm mostly referring to the epics-debhelper, which makes the packaging of the other modules (including base) a lot easier. Achieving compliance with the Debian packaging guidelines is hard.

picca commented 6 months ago

In that case maybe this should be the first package to prepare and upload into Debian.

What about discussing this here:

https://lists.debian.org/debian-mentors/

It is always great to have other DDs' opinions.

picca commented 6 months ago

Hello, I have extracted the network-related output of osiSockTest from two builds.

The first one is not OK:

# udpSockTest()
ok  2 - epicsSocketCreate INET, DGRAM, 0
ok  3 - setsockopt BROADCAST := 1
ok  4 - getsockopt BROADCAST => 1
ok  5 - setsockopt BROADCAST := 0
ok  6 - getsockopt BROADCAST => 0
ok  7 - setsockopt MULTICAST_LOOP := 1
ok  8 - getsockopt MULTICAST_LOOP => 1
ok  9 - setsockopt MULTICAST_LOOP := 0
ok 10 - getsockopt MULTICAST_LOOP => 0
ok 11 - setsockopt IP_MULTICAST_TTL := 2
ok 12 - getsockopt IP_MULTICAST_TTL => 2
ok 13 - setsockopt IP_MULTICAST_TTL := 1
ok 14 - getsockopt IP_MULTICAST_TTL => 1
# udpSockFanoutBindTest()
# First test if epicsSocketEnableAddressUseForDatagramFanout() is necessary
ok 15 - bind() to port 53112
ok 16 - bind() to 53112 error -1, 98
# Now the real test
ok 17 - bind() to port 53112
ok 18 - bind() to port 53112
# udpSockFanoutTest()
not ok 19 - Found non-loopback interface # TODO Known failure on Debian buildd infra
not ok 20 - Successes 0 # TODO Known failure on Debian buildd infra
# tcpSockReuseBindTest(0)
ok 21 - bind() to port 56197
ok 22 - bind() to 56197 error -1, 98
# tcpSockReuseBindTest(1)
# epicsSocketEnableAddressReuseDuringTimeWaitState
ok 23 - bind() to port 39311
ok 24 - bind() to 39311 error -1, 98
ok

The second one is OK:

osiSockTest.t ................. 
1..24
ok  1 - osiSockAttach
# udpSockTest()
ok  2 - epicsSocketCreate INET, DGRAM, 0
ok  3 - setsockopt BROADCAST := 1
ok  4 - getsockopt BROADCAST => 1
ok  5 - setsockopt BROADCAST := 0
ok  6 - getsockopt BROADCAST => 0
ok  7 - setsockopt MULTICAST_LOOP := 1
ok  8 - getsockopt MULTICAST_LOOP => 1
ok  9 - setsockopt MULTICAST_LOOP := 0
ok 10 - getsockopt MULTICAST_LOOP => 0
ok 11 - setsockopt IP_MULTICAST_TTL := 2
ok 12 - getsockopt IP_MULTICAST_TTL => 2
ok 13 - setsockopt IP_MULTICAST_TTL := 1
ok 14 - getsockopt IP_MULTICAST_TTL => 1
# udpSockFanoutBindTest()
# First test if epicsSocketEnableAddressUseForDatagramFanout() is necessary
ok 15 - bind() to port 60648
ok 16 - bind() to 60648 error -1, 98
# Now the real test
ok 17 - bind() to port 60648
ok 18 - bind() to port 60648
# udpSockFanoutTest()
# Interface 209.87.16.255:5064
# Not LO
# RX1 start
# RX2 start
# recvfrom error (11)
# recvfrom error (11)
# RX1 end
# RX2 end
# Result: RX1 0:0 RX2 0:0
ok 19 - Found non-loopback interface # TODO Known failure on Debian buildd infra
not ok 20 - Successes 0 # TODO Known failure on Debian buildd infra
# tcpSockReuseBindTest(0)
ok 21 - bind() to port 47713
ok 22 - bind() to 47713 error -1, 98
# tcpSockReuseBindTest(1)
# epicsSocketEnableAddressReuseDuringTimeWaitState
ok 23 - bind() to port 37961
ok 24 - bind() to 37961 error -1, 98
ok

The difference between the two is that during the first build it did not find a non-loopback interface, i.e. there was no usable network interface. (I suspect an IPv6-only build machine.)

So my question is: how can we solve this issue? If I remember correctly, we had the same sort of issue with Tango:

https://gitlab.com/tango-controls/pytango/-/issues/450

Thanks for considering this.

Frédéric

anjohnson commented 6 months ago

Core Developers: Please rename this issue to summarize the current failures you're seeing, or maybe close this issue and open another one for those build failures. In general we don't support systems that don't have IPv4, because the CA network protocol doesn't support IPv6 yet (there is an implementation, but we haven't merged it yet and might never do so).