flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

flux archive segfault due to improper linkage to libcrypto.so SHA1_Init() et al #6164

Open MarcusGDaniels opened 1 month ago

MarcusGDaniels commented 1 month ago
[4553vn@hpcvdi108 t]$ ./t0021-archive-cmd.t 
ok 1 - flux archive create --badopt prints unrecognized option
ok 2 - flux archive create with no PATHs fails
ok 3 - flux archive create with bad FLUX_URI fails
ok 4 - flux archive create fails with --overwrite --append
ok 5 - flux archive create fails with -C baddir bad 
ok 6 - flux archive create fails with FIFO in input
ok 7 - flux archive remove fails if archive doesnt exist
ok 8 - but it works with -f
ok 9 - flux archive create works (small file)
ok 10 - archive.main contains a base64-encoded file
ok 11 - flux archive create fails if archive exists
not ok 12 - flux archive create --overwrite works (large file)
#   
#       randbytes 2048 >testfile2 &&
#       flux archive create --overwrite --preserve \
#           --small-file-threshold=1K -v testfile2 &&
#       flux kvs get archive.main >archive2.out
#   
not ok 13 - and archive.main contains a blobvec-encoded file
#   
#       jq -e -r <archive2.out ".[0].encoding == \"blobvec\""

grondo commented 1 month ago

This test should support -d -v options. If you rerun with that it may give us more details of the failure. (Same goes for all tests under t/*.t)

MarcusGDaniels commented 1 month ago

Here is one:

expecting success:
             flux archive create --mmap ./testfile &&
             flux kvs get --raw archive.main | jq
./sharness.sh: line 338: 1641400 Segmentation fault (core dumped) flux archive create --mmap ./testfile
not ok 8 - map test file
             flux archive create --mmap ./testfile &&
             flux kvs get --raw archive.main | jq
expecting success:

grondo commented 1 month ago

It appears a corefile was produced. If so, would you be able (and willing) to provide a backtrace?

Was libarchive one of the packages you had to build yourself and side-install? If so, what version did you build, and was it the same as the version installed on the system (if there is one)?

MarcusGDaniels commented 1 month ago

It is crashing when appending a zero to hex_encode from hashtostr from blobref_hash.

MarcusGDaniels commented 1 month ago

I built libarchive-3.7.2.

grondo commented 1 month ago

Unfortunately that's not quite enough to go on, and I can't reproduce on any el8 based system to which I have access. I wonder if I could reproduce your environment and get to the bottom of some of these issues. I think you said you're on rocky linux el8? Could you dump a list of installed RPMs somewhere (a gist or here) and then the set of packages you had to build by hand?

MarcusGDaniels commented 1 month ago

I tracked it down to SHA1_{Init,Update,Final} being resolved from the system’s libcrypto.so instead of the Flux-included files. I added a prefix “local” to each of these symbols in the flux-core code base (and its tests) and this crash is avoided.
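
One way to cross-check a diagnosis like this is to list the shared libraries the executable declares as dependencies and look for libcrypto. The following is a minimal sketch, assuming `readelf -d` output is piped in on stdin; `needed_libs` is a hypothetical helper name (not part of flux-core), and the path in the usage comment is only an example.

```shell
#!/bin/sh
# needed_libs: read `readelf -d` output on stdin and print the name of
# every shared library listed in a (NEEDED) dynamic entry, one per line.
# Hypothetical helper for illustration only.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\([^]]*\)].*/\1/p'
}

# Usage (the binary path is an assumption; adjust to your install):
#   readelf -d ~/packages/flux/bin/flux | needed_libs | grep crypto
```

If libcrypto shows up in the list, it is at least a candidate for supplying `SHA1_*` at run time; this alone does not prove which definition a given call binds to.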

grondo commented 1 month ago

> I tracked it down to SHA1_{Init,Update,Final} being resolved from the system’s libcrypto.so instead of the Flux-included files.

I'm trying to understand what would cause this to happen. The symbols in libutil are linked as libtool convenience libraries, which should be static linking. We also have openssl-libs installed on our system, but have never seen the wrong symbols used.

If you run nm -D on the flux executable (in src/cmd/.libs), do you see undefined SHA symbols? Maybe this is some kind of build error we haven't seen before.
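
The check described above can be scripted. This is a minimal sketch, assuming `nm` (or `nm -D`) output is piped in on stdin; the function name `undefined_sha1` is hypothetical and not part of flux-core.

```shell
#!/bin/sh
# undefined_sha1: read `nm` output on stdin and print each SHA1_* symbol
# that is marked undefined ("U"), with any @VERSION suffix stripped.
# Hypothetical helper for illustration only.
undefined_sha1() {
    awk '$1 == "U" && $2 ~ /^SHA1_/ { sub(/@.*/, "", $2); print $2 }' | sort -u
}

# Usage (path as suggested in the thread; adjust to your build tree):
#   nm -D src/cmd/.libs/flux | undefined_sha1
```

Empty output means the executable carries no undefined `SHA1_*` references, which is the expected result when the internal implementation is linked in statically.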

MarcusGDaniels commented 1 month ago

I do see undefined symbols.

@.*** flux-core]$ nm ~/packages/flux/bin/flux | grep SHA1
             U SHA1_Final@@OPENSSL_1_1_0
             U SHA1_Init@@OPENSSL_1_1_0
             U SHA1_Update@@OPENSSL_1_1_0
@.*** flux-core]$ nm -D ~/packages/flux/bin/flux | grep SHA1
             U SHA1_Final
             U SHA1_Init
             U SHA1_Update

grondo commented 1 month ago

Ok, this is definitely some kind of build snafu we haven't seen before. Those symbols are from an internal convenience library and should be built statically into the flux executable. Let me poke around a bit.

MarcusGDaniels commented 1 month ago

What is actually wrong with using the system libcrypto.so?

grondo commented 1 month ago

Flux doesn't use openssl/libcrypto directly; the SHA1 implementation was pulled in to avoid using the czmq zdigest implementation. In any event, I'm still looking into the linking issue here.

grondo commented 1 month ago

If you have a chance, can you try:

$ readelf -s src/common/libutil/.libs/libutil.a | grep -i SHA1
$ readelf -s src/common/.libs/libflux-internal.a | grep SHA1

I cannot reproduce the missing SHA1 symbols on my el8 container or an el8-based real system. I wonder if the symbols are missing from either of the two convenience libraries they are supposed to be in.
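
To make the result of those two commands easier to read, the `readelf -s` output can be summarized per symbol. This is a rough sketch, assuming the usual eight-column `readelf -s` layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name); `sha1_symbol_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# sha1_symbol_summary: read `readelf -s` output on stdin and, for each
# SHA1_* symbol, count how many entries define it (Ndx is a section
# number) and how many only reference it (Ndx is UND).
# Hypothetical helper for illustration only.
sha1_symbol_summary() {
    awk '$8 ~ /^SHA1_/ {
        if ($7 == "UND") und[$8]++; else def[$8]++
        seen[$8] = 1
    }
    END {
        for (s in seen)
            printf "%s defined=%d undefined=%d\n", s, def[s], und[s]
    }' | sort
}

# Usage:
#   readelf -s src/common/libutil/.libs/libutil.a | sha1_symbol_summary
#   readelf -s src/common/.libs/libflux-internal.a | sha1_symbol_summary
```

A symbol reported with `defined=0` would mean the implementation is missing from that archive. A nonzero `undefined` count on its own only reflects other objects in the archive that call the function.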

MarcusGDaniels commented 1 month ago

@.*** flux-core]$ readelf -s src/common/libutil/.libs/libutil.a | grep -i SHA1
File: src/common/libutil/.libs/libutil.a(sha1.o)
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS sha1.c
    22: 0000000000000000  4960 FUNC    GLOBAL DEFAULT    1 SHA1_Transform
    23: 0000000000001360    30 FUNC    GLOBAL DEFAULT    1 SHA1_Init
    24: 0000000000001380   369 FUNC    GLOBAL DEFAULT    1 SHA1_Update
    27: 0000000000001500   272 FUNC    GLOBAL DEFAULT    1 SHA1_Final
    10: 0000000000000100   101 FUNC    LOCAL  DEFAULT    1 sha1_hash
    36: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND SHA1_Init
    37: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND SHA1_Update
    38: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND SHA1_Final
@.*** flux-core]$ readelf -s src/common/.libs/libflux-internal.a | grep SHA1
    36: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND SHA1_Init
    37: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND SHA1_Update
    38: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND SHA1_Final
    22: 0000000000000000  4960 FUNC    GLOBAL DEFAULT    1 SHA1_Transform
    23: 0000000000001360    30 FUNC    GLOBAL DEFAULT    1 SHA1_Init
    24: 0000000000001380   369 FUNC    GLOBAL DEFAULT    1 SHA1_Update
    27: 0000000000001500   272 FUNC    GLOBAL DEFAULT    1 SHA1_Final
@.*** flux-core]$

grondo commented 1 month ago

Thanks. So the symbols are in the convenience libraries as expected, but are not included by the linker in the final executable. I'm confused :confounded: