bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.
GNU General Public License v3.0
1.07k stars 182 forks source link

Testsuite crashes with "Bus error" on sparc64 due to bad alignment #835

Open glaubitz opened 1 month ago

glaubitz commented 1 month ago

The testsuite crashes on sparc64 with a "Bus error" which is usually an indicator for unaligned access:

diamond v2.1.9.163 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 1
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: /tmp/tmp.Zz6d99diuH
#Target sequences to report alignments for: 25
Opening the database... Bus error

This is most likely related to the discussion in https://github.com/bbuchfink/diamond/discussions/778 and I assume that fixing the crash on sparc64 will also fix the issue reported by @tamdao216.

glaubitz commented 1 month ago

Here is the backtrace:

#0  Deserializer::read<unsigned long> (this=<optimized out>, x=<optimized out>) at ./src/lib/../util/io/../algo/../system/endianness.h:20
20              return psnip_endian_le64(x);
(gdb) bt
#0  Deserializer::read<unsigned long> (this=<optimized out>, x=<optimized out>) at ./src/lib/../util/io/../algo/../system/endianness.h:20
#1  Deserializer::operator>> (this=0x10000637d88, x=<optimized out>) at ./src/lib/../util/io/deserializer.h:100
#2  operator>> (file=..., h=...) at ./src/data/dmnd/dmnd.cpp:91
#3  0x00000100003275b4 in DatabaseFile::read_header (stream=..., header=...) at ./src/data/dmnd/dmnd.cpp:195
#4  DatabaseFile::init (this=0x10000637bf0, flags=SequenceFile::Flags::NEED_LETTER_COUNT) at ./src/data/dmnd/dmnd.cpp:126
#5  0x000001000032a7b0 in DatabaseFile::DatabaseFile (this=0x10000637bf0, input_file="test/C.faa.diamond.dmnd", metadata=0, flags=SequenceFile::Flags::NEED_LETTER_COUNT, value_traits=...) at ./src/data/dmnd/dmnd.cpp:144
#6  0x000001000033b4e4 in SequenceFile::auto_create (path=std::vector of length 1, capacity 1 = {...}, flags=SequenceFile::Flags::NEED_LETTER_COUNT, metadata=0, value_traits=...) at ./src/data/sequence_file.cpp:393
#7  0x000001000012ae2c in Search::run (db=..., query=std::shared_ptr<SequenceFile> (empty) = {...}, out=..., db_filter=...) at ./src/run/double_indexed.cpp:796
#8  0x00000100000937a8 in main (ac=15, av=0x7feff9292c8) at ./src/run/main.cpp:116
glaubitz commented 1 month ago

It looks like the code is using custom functions to convert between big end little endianness.

I suggest switching to the endianness functions provided by the glibc.

glaubitz commented 1 month ago

OK, I replaced the custom byte-swap functions with the ones provided by glibc and the crash persists which means that the parameter itself is already misaligned:

#0  Deserializer::read<unsigned long> (this=<optimized out>, x=<optimized out>) at /usr/include/sparc64-linux-gnu/bits/byteswap.h:73
73        return __builtin_bswap64 (__bsx);
(gdb) bt
#0  Deserializer::read<unsigned long> (this=<optimized out>, x=<optimized out>) at /usr/include/sparc64-linux-gnu/bits/byteswap.h:73
#1  Deserializer::operator>> (this=0x10000637d88, x=<optimized out>) at ./src/lib/../util/io/deserializer.h:100
#2  operator>> (file=..., h=...) at ./src/data/dmnd/dmnd.cpp:91
#3  0x00000100003275b4 in DatabaseFile::read_header (stream=..., header=...) at ./src/data/dmnd/dmnd.cpp:195
#4  DatabaseFile::init (this=0x10000637bf0, flags=SequenceFile::Flags::NEED_LETTER_COUNT) at ./src/data/dmnd/dmnd.cpp:126
#5  0x000001000032a7b0 in DatabaseFile::DatabaseFile (this=0x10000637bf0, input_file="test/C.faa.diamond.dmnd", metadata=0, flags=SequenceFile::Flags::NEED_LETTER_COUNT, 
    value_traits=...) at ./src/data/dmnd/dmnd.cpp:144
#6  0x000001000033b4e4 in SequenceFile::auto_create (path=std::vector of length 1, capacity 1 = {...}, flags=SequenceFile::Flags::NEED_LETTER_COUNT, metadata=0, 
    value_traits=...) at ./src/data/sequence_file.cpp:393
#7  0x000001000012ae2c in Search::run (db=..., query=std::shared_ptr<SequenceFile> (empty) = {...}, out=..., db_filter=...) at ./src/run/double_indexed.cpp:796
#8  0x00000100000937a8 in main (ac=15, av=0x7feffe503e8) at ./src/run/main.cpp:116
glaubitz commented 1 month ago

I tried the following change in the hope that read() contains unaligned access, but the problem still persists:

--- diamond-aligner-2.1.9.orig/src/util/binary_buffer.h
+++ diamond-aligner-2.1.9/src/util/binary_buffer.h
@@ -41,7 +41,8 @@ struct BinaryBuffer : public std::vector
                void read(T &x)
                {
                        check(sizeof(T));
-                       x = *(T*)(&*ptr_);
+                       //x = *(T*)(&*ptr_);
+                       memcpy(&x, &*ptr_, sizeof(T));
                        ptr_ += sizeof(T);
                }
                template<typename T>

Currently out of ideas.

bbuchfink commented 1 month ago

I'll try to look into it but I don't have access to a sparc64 system so it may not be that easy.

glaubitz commented 1 month ago

On Oct 20, 2024, at 2:26 PM, Benjamin Buchfink @.***> wrote:

 I'll try to look into it but I don't have access to a sparc64 system so it may not be that easy.

There is a sparc64 machine running Solaris 11.4 in the GCC Compile Farm which can be accessed by anyone who requests an account for the GCC Compile Farm.

I can also help provide access to Linux on sparc64.

bbuchfink commented 1 week ago

Your idea was right but the problem also occurs in Deserializer::read. Plenty of other alignment-related issues in the code. I believe I tracked all of them down and will include this in the next release.

glaubitz commented 1 week ago

Your idea was right but the problem also occurs in Deserializer::read. Plenty of other alignment-related issues in the code. I believe I tracked all of them down and will include this in the next release.

Okay, thanks a lot for looking into this. Much appreciated!