explanation for these test failures (likely package dependencies?) #6164
Open MarcusGDaniels opened 1 month ago
This test should support -d -v options. If you rerun with those, it may give us more details of the failure. (Same goes for all tests under t/*.t.)
Here is one:
expecting success:
flux archive create --mmap ./testfile &&
flux kvs get --raw archive.main | jq
./sharness.sh: line 338: 1641400 Segmentation fault (core dumped) flux archive create --mmap ./testfile
not ok 8 - map test file
expecting success:
It appears a corefile was produced. If so, would you be able (and willing) to provide a backtrace?
Was libarchive one of the packages you had to build yourself and side-install? If so, what version did you build, and was it the same as the version installed on the system (if there is one)?
It is crashing in hex_encode (when appending a zero), called from hashtostr, called from blobref_hash.
I built libarchive-3.7.2.
Unfortunately that's not quite enough to go on, and I can't reproduce on any el8-based system to which I have access. I wonder if I could reproduce your environment and get to the bottom of some of these issues. I think you said you're on Rocky Linux el8? Could you dump a list of installed RPMs somewhere (a gist or here), and then the set of packages you had to build by hand?
I tracked it down to SHA1_{Init,Update,Final} being resolved from the system’s libcrypto.so instead of the Flux-included files. I added a prefix “local” to each of these symbols in the flux-core code base (and its tests) and this crash is avoided.
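The rename described above can be sketched as a text filter. This is a hypothetical helper (prefix_sha1), not the actual patch: the exact spelling of the prefix isn't shown, so local_ is an assumption, and the \b word boundaries require GNU sed. It touches only SHA1_Init, SHA1_Update and SHA1_Final, and leaves names that already carry the prefix alone.

```shell
# Hypothetical helper: prefix the bundled SHA1 entry points so they can no
# longer bind to same-named symbols exported by libcrypto.so.
# Assumes GNU sed (for \b); the "local_" prefix is illustrative only.
prefix_sha1() {
    sed -E 's/\bSHA1_(Init|Update|Final)\b/local_SHA1_\1/g'
}
```

Because it reads stdin and writes stdout, a file can be previewed before anything is rewritten in place, e.g. prefix_sha1 < src/common/libutil/sha1.c | diff src/common/libutil/sha1.c -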
> I tracked it down to SHA1_{Init,Update,Final} being resolved from the system’s libcrypto.so instead of the Flux-included files.
I'm trying to understand what would cause this to happen. The symbols in libutil are linked as libtool convenience libraries, which should mean they are linked statically. We also have openssl-libs installed on our system, but have never seen the wrong symbols used.
If you run nm -D on the flux executable (in src/cmd/.libs), do you see undefined SHA symbols? Maybe this is some kind of build error we haven't seen before.
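Alongside nm, glibc's dynamic linker can report which shared object actually satisfies each symbol at run time when LD_DEBUG=bindings is set; it writes one line per binding to stderr. The filter below is a sketch (sha1_bindings is a hypothetical name) that assumes glibc's usual line shape, binding file A [n] to B [n]: normal symbol `NAME' [VERSION], and prints each SHA1_* symbol next to the library it was bound to.

```shell
# Hypothetical filter for glibc LD_DEBUG=bindings output (read from stdin).
# Prints "<symbol> <library>" for every SHA1_* binding the dynamic linker logs.
sha1_bindings() {
    awk '/binding file/ && /symbol `SHA1_/ {
        lib = ""; sym = ""
        for (i = 1; i < NF; i++) {
            if ($i == "to") lib = $(i + 1)        # object providing the symbol
            if ($i == "symbol") sym = $(i + 1)    # quoted symbol name
        }
        gsub(/[^A-Za-z0-9_]/, "", sym)           # strip the surrounding quotes
        print sym, lib
    }'
}
```

For example, LD_DEBUG=bindings flux archive create --mmap ./testfile 2>&1 | sha1_bindings; a libcrypto path in the second column would indicate the system library won. With the default lazy binding, only functions the command actually calls will appear.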
I do see undefined symbols.
$ nm ~/packages/flux/bin/flux | grep SHA1
U SHA1_Final@@OPENSSL_1_1_0
U SHA1_Init@@OPENSSL_1_1_0
U SHA1_Update@@OPENSSL_1_1_0
$ nm -D ~/packages/flux/bin/flux | grep SHA1
U SHA1_Final
U SHA1_Init
U SHA1_Update
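When comparing builds, the nm check can be reduced to a filter that lists only undefined SHA1_* references along with any symbol version. A version suffix such as @@OPENSSL_1_1_0 on an undefined reference suggests the static linker bound it to a shared library exporting that version at link time. undefined_sha1 is a hypothetical helper and assumes nm's default BSD-style output, where an undefined symbol appears as U followed by its name.

```shell
# Hypothetical filter for nm output (stdin): print each undefined SHA1_*
# symbol and its version, or "-" when nm shows no version suffix.
undefined_sha1() {
    awk '$1 == "U" && $2 ~ /^SHA1_/ {
        name = $2; ver = "-"
        if (match(name, /@+/)) {                 # split NAME@@VERSION
            ver = substr(name, RSTART + RLENGTH)
            name = substr(name, 1, RSTART - 1)
        }
        print name, ver
    }'
}
```

Usage: nm ~/packages/flux/bin/flux | undefined_sha1. An empty result means no SHA1 references are left undefined in that binary.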
Ok, this is definitely some kind of build snafu we haven't seen before. Those symbols are from an internal convenience library and should be built statically into the flux executable. Let me poke around a bit.
What is actually wrong with using the system libcrypto.so?
Flux doesn't use OpenSSL/libcrypto directly; the SHA1 implementation was pulled in to avoid using the czmq zdigest implementation. In any event, I'm still looking into the linking issue here.
If you have a chance, can you try:
$ readelf -s src/common/libutil/.libs/libutil.a | grep -i SHA1
$ readelf -s src/common/.libs/libflux-internal.a | grep SHA1
I cannot reproduce the missing SHA1 symbols on my el8 container or an el8-based real system. I wonder if the symbols are missing from either of the two convenience libraries they are supposed to be in.
$ readelf -s src/common/libutil/.libs/libutil.a | grep -i SHA1
File: src/common/libutil/.libs/libutil.a(sha1.o)
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS sha1.c
22: 0000000000000000 4960 FUNC GLOBAL DEFAULT 1 SHA1_Transform
23: 0000000000001360 30 FUNC GLOBAL DEFAULT 1 SHA1_Init
24: 0000000000001380 369 FUNC GLOBAL DEFAULT 1 SHA1_Update
27: 0000000000001500 272 FUNC GLOBAL DEFAULT 1 SHA1_Final
10: 0000000000000100 101 FUNC LOCAL DEFAULT 1 sha1_hash
36: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND SHA1_Init
37: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND SHA1_Update
38: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND SHA1_Final
$ readelf -s src/common/.libs/libflux-internal.a | grep SHA1
36: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND SHA1_Init
37: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND SHA1_Update
38: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND SHA1_Final
22: 0000000000000000 4960 FUNC GLOBAL DEFAULT 1 SHA1_Transform
23: 0000000000001360 30 FUNC GLOBAL DEFAULT 1 SHA1_Init
24: 0000000000001380 369 FUNC GLOBAL DEFAULT 1 SHA1_Update
27: 0000000000001500 272 FUNC GLOBAL DEFAULT 1 SHA1_Final
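The readelf listings mix definitions and references of the same names. A filter like the hypothetical sha1_members below makes the split explicit per archive member: an entry whose Ndx column is a section number is a definition, and UND marks an unresolved reference. It assumes readelf's usual eight-column symbol table layout and relies on the File: header lines, so it should read the unfiltered readelf -s output (a case-sensitive grep SHA1 drops those headers).

```shell
# Hypothetical filter for "readelf -s" output on a static archive (stdin).
# Prints "<member> <symbol> defined|undefined" for every SHA1_* symbol.
sha1_members() {
    awk '$1 == "File:" { member = $2; next }    # start of a new archive member
         $8 ~ /^SHA1_/ {
             state = ($7 == "UND") ? "undefined" : "defined"
             print member, $8, state
         }'
}
```

Usage: readelf -s src/common/libutil/.libs/libutil.a | sha1_members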
Thanks. So the symbols are in the convenience libraries as expected, but are not included by the linker in the final executable. I'm confused.