bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.
GNU General Public License v3.0

Build failure on several architectures #348

Open tillea opened 4 years ago

tillea commented 4 years ago

Hi, I was hoping that #347 would have been solved with the release of diamond version 0.9.32. However, nearly all architectures now have build failures (example for i386). The error message in the build log is:

In file included from /<<PKGBUILDDIR>>/src/search/stage2.cpp:31:
/<<PKGBUILDDIR>>/src/search/left_most.h: In function ‘bool Search::left_most_filter(const sequence&, const Letter*, int, int, const Search::Context&, bool)’:
/<<PKGBUILDDIR>>/src/search/left_most.h:35:22: error: ‘seed_mask’ was not declared in this scope; did you mean ‘fd_mask’?
   35 |   query_seed_mask = ~seed_mask(q, window);
      |                      ^~~~~~~~~
      |                      fd_mask

Kind regards, Andreas.

bbuchfink commented 4 years ago

Could you try again using f5daf92d3ece50ebb0c0a5f9b4133769347ec55b?

From what I see in the log, there's also still the missing pthreads problem. I did try to compile using the new GCC 9.3.0 release which worked fine for me, so I still don't know how to reproduce that.

tillea commented 4 years ago

Hi Benjamin, thanks a lot for your quick reply.

On Wed, Apr 29, 2020 at 09:17:14AM -0700, Benjamin Buchfink wrote:

Could you try again using f5daf92d3ece50ebb0c0a5f9b4133769347ec55b?

I now get:

/usr/bin/c++  -DLEFTMOST_SEED_FILTER -DMAX_SHAPE_LEN=17 -DSEQ_MASK -DSTRICT_BAND -I/build/diamond-aligner-0.9.32/src -I/build/diamond-aligner-0.9.32/src/lib  -g -O2 -fdebug-prefix-map=/b
/build/diamond-aligner-0.9.32/src/search/stage2.cpp: In function ‘void Search::ARCH_GENERIC::stage2(const Packed_loc*, const Packed_loc*, const std::vector<Stage1_hit>&, Statistics&, Asy
/build/diamond-aligner-0.9.32/src/search/stage2.cpp:193:88: error: too many arguments to function ‘void Search::ARCH_GENERIC::search_query_offset(Loc, const Packed_loc*, std::vector<Stag
  193 |    search_query_offset(q[i.begin()->q], s, i.begin(), i.end(), stats, out, sid, context);
      |                                                                                        ^
/build/diamond-aligner-0.9.32/src/search/stage2.cpp:132:6: note: declared here
  132 | void search_query_offset(Loc q,
      |      ^~~~~~~~~~~~~~~~~~~

From what I see in the log, there's also still the missing pthreads problem. I did try to compile using the new GCC 9.3.0 release which worked fine for me, so I still don't know how to reproduce that.

I'll ask for help meanwhile. Kind regards, Andreas.

tillea commented 4 years ago

On Wed, Apr 29, 2020 at 09:17:14AM -0700, Benjamin Buchfink wrote:

From what I see in the log, there's also still the missing pthreads problem. I did try to compile using the new GCC 9.3.0 release which worked fine for me, so I still don't know how to reproduce that.

In the log I read

-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Configuring done

So what exactly is your problem if "Found Threads: TRUE"?

bbuchfink commented 4 years ago

Ok, I guess this is settled then. Will take care of the compiler errors.

bbuchfink commented 4 years ago

I hope this compiles now: 63b3e686f41272851a87c632441ba8c6484a2fe2

tillea commented 4 years ago

Hi again, I've now excluded 32-bit architectures from the build, but there are test failures on big-endian architectures (s390x/ppc64/sparc64), which you can see in the ppc64 build log. Do you intend to track this down, or do you think I should exclude these architectures as well? Kind regards, Andreas.

bbuchfink commented 4 years ago

Hi Andreas, I'll try to track down the errors, but I may need a couple of days for this.

bbuchfink commented 4 years ago

Status: I'm in the process of acquiring access to an IBM POWER system so I can track down this error.

bbuchfink commented 4 years ago

I fixed a big-endian related bug and added the other architectures to the Travis CI, including unit tests: https://travis-ci.org/github/bbuchfink/diamond/builds/687760195

Clang on powerpc fails due to some problem with AltiVec, and the s390x builds fail the unit tests for unknown reasons; other than that, things seem to work.

tillea commented 4 years ago

On Sat, May 16, 2020 at 06:28:57AM -0700, Benjamin Buchfink wrote:

I fixed a big-endian related bug and added the other architectures to the Travis CI including unit tests. https://travis-ci.org/github/bbuchfink/diamond/builds/687760195

Clang on powerpc fails due to some problem with altivec and the s390x builds fail the unit tests due to reasons unknown, other than that things seem to work.

Maybe you could try a new release with this status and we ignore the two failing architectures? Kind regards, Andreas.

bbuchfink commented 4 years ago

I fixed the remaining powerpc issue. Not sure why the s390x tests are failing, the output appears to be correct as far as I can tell. Could you tell me if things work on your end before I post a new release?

tillea commented 4 years ago

On Sat, May 16, 2020 at 09:19:45AM -0700, Benjamin Buchfink wrote:

I fixed the remaining powerpc issue. Not sure why the s390x tests are failing, the output appears to be correct as far as I can tell. Could you tell me if things work on your end before I post a new release?

Feel free to do a new release and I'll upload to the autobuilders. Kind regards, Andreas.

AdrianBunk commented 4 years ago

I fixed the remaining powerpc issue. Not sure why the s390x tests are failing, the output appears to be correct as far as I can tell. Could you tell me if things work on your end before I post a new release?

Doesn't seem to work, with the code at commit 7ad411fb on s390x:

$ ../../obj-s390x-linux-gnu/diamond blastp --threads 1 --db 'test/C.faa.diamond' --query 'test/E.faa' --out E.faa.vs.C.faa.diamond -e 1e-05 --outfmt 6  --sensitive
diamond v0.9.32.133 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org

#CPU threads: 1
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: 
Opening the database... Database file is not a DIAMOND database, treating as FASTA.
Database input file: test/C.faa.diamond.dmnd
Opening the database file...  [0s]
Loading sequences... Warning: Failed to delete file diamond-tmp-tu4lhy
 [0s]
 [0s]
Error: Error reading input stream at line 1: FASTA format error: Missing '>' at record start.
$

bbuchfink commented 4 years ago

Could you try this again with the latest commit?

AdrianBunk commented 4 years ago

This does not help, but I debugged this a bit further.

In src/data/reference.cpp I can fix the format detection by using le64toh at the two places where magic_number is used, like

magic_number == le64toh(ReferenceHeader::MAGIC_NUMBER)

This gets me to

Error: Database was built with a newer version of Diamond and is incompatible.

Likely a similar problem, I assume you know best where to fix endian conversion on file reading.
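Both symptoms above are consistent with multi-byte header fields being read in host byte order. The following is a minimal sketch of that idea, assuming a header whose fields are stored little-endian on disk. The constant `kMagic`, the 32-bit version field, and all function names are hypothetical and are not Diamond's actual layout. Decoding byte by byte yields the same value on any host, whereas a natively read (byte-swapped) version number looks like a much "newer" one.

```cpp
#include <cstdint>

// Hypothetical magic value for illustration only; not Diamond's real constant.
constexpr std::uint64_t kMagic = 0x0123456789ABCDEFULL;

// Decode a 64-bit little-endian value from raw file bytes.
// This gives the same result on little- and big-endian hosts because it
// combines individual bytes instead of reinterpreting memory.
inline std::uint64_t load_le64(const unsigned char* p) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

// Same idea for a 32-bit field, e.g. an assumed format version number.
inline std::uint32_t load_le32(const unsigned char* p) {
    std::uint32_t v = 0;
    for (int i = 3; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

// What a big-endian host effectively computes when it reads the same
// four bytes in its native order without any conversion.
inline std::uint32_t load_be32(const unsigned char* p) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v = (v << 8) | p[i];
    return v;
}
```

With the bytes `03 00 00 00`, `load_le32` returns 3 while `load_be32` returns 0x03000000, which would plausibly trip a "built with a newer version" check. On glibc systems, `le64toh` from `<endian.h>` performs the equivalent conversion on an already-loaded integer.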

bbuchfink commented 4 years ago

I see, you are using a database file created on a little-endian system. I'll try to get the formats compatible.

bbuchfink commented 4 years ago

I added endianness conversion to everything now except for some of the taxonomy features.
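As an illustration of what such a conversion layer can look like (a sketch under assumptions, not the code that was committed), the helpers below convert between host order and a fixed little-endian file order by swapping bytes only on big-endian hosts. The names `host_to_le32`, `write_le32`, and `read_le32` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Detect the host byte order at run time by inspecting the first byte
// of a known 16-bit value.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reverse the byte order of a 32-bit value.
inline std::uint32_t byteswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Convert between host order and little-endian file order.
// The operation is its own inverse, so it serves both directions.
inline std::uint32_t host_to_le32(std::uint32_t v) {
    return host_is_little_endian() ? v : byteswap32(v);
}

// Store a value into a buffer in little-endian file order.
inline void write_le32(unsigned char* dst, std::uint32_t v) {
    const std::uint32_t le = host_to_le32(v);
    std::memcpy(dst, &le, sizeof le);
}

// Load a little-endian value from a buffer into host order.
inline std::uint32_t read_le32(const unsigned char* src) {
    std::uint32_t le;
    std::memcpy(&le, src, sizeof le);
    return host_to_le32(le);
}
```

A file written through `write_le32` on one host can then be read through `read_le32` on any other, regardless of either machine's byte order.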

tillea commented 4 years ago

On Tue, May 19, 2020 at 03:57:51AM -0700, Benjamin Buchfink wrote:

I added endianness conversion to everything now except for some of the taxonomy features.

Cool. Thanks a lot for your effort. Please ping me once you do a new release with these features. Kind regards, Andreas.

bbuchfink commented 4 years ago

@tillea The latest release now should have endianness conversion for everything, and should work at least according to the travis ci: https://travis-ci.org/github/bbuchfink/diamond/builds/693404907

tillea commented 4 years ago

Hi Benjamin, On Mon, Jun 01, 2020 at 06:55:35AM -0700, Benjamin Buchfink wrote:

@tillea The latest release now should have endianness conversion for everything, and should work at least according to the travis ci: https://travis-ci.org/github/bbuchfink/diamond/builds/693404907

Thanks a lot for keeping me up to date. I've uploaded the latest version. Unfortunately, there are some remaining issues on some architectures, as you can see here: https://buildd.debian.org/status/package.php?p=diamond-aligner (I hope that link is sufficient; please let me know if you need more information, which I would happily provide).

Kind regards, Andreas.

bbuchfink commented 4 years ago

Hi Andreas, unfortunately there is no error message from Diamond in these logs, only this:

+ diamond blastp --threads 1 --db test/C.faa.diamond --query test/E.faa --out /tmp/tmp.dUhbhJeGjW/E.faa.vs.C.faa.diamond -e 1e-05 --outfmt 6 --quiet --sensitive
make[1]: *** [debian/rules:33: override_dh_auto_test] Error 1

Not sure why; even with --quiet, Diamond should write error messages to stderr.

It still seems to fail on big endian systems, but since I don't have access to such a system and things seem to work on the travis ci I'm not quite sure how to address this. Do you have any suggestions?

tillea commented 4 years ago

Hi Benjamin,

On Wed, Jun 03, 2020 at 08:50:42AM -0700, Benjamin Buchfink wrote:

+ diamond blastp --threads 1 --db test/C.faa.diamond --query test/E.faa --out /tmp/tmp.dUhbhJeGjW/E.faa.vs.C.faa.diamond -e 1e-05 --outfmt 6 --quiet --sensitive
make[1]: *** [debian/rules:33: override_dh_auto_test] Error 1

Not sure why, even when using --quiet Diamond should write error messages to stderr.

For the moment I have removed the --quiet option in Git (not uploaded yet).

It still seems to fail on big endian systems, but since I don't have access to such a system and things seem to work on the travis ci I'm not quite sure how to address this. Do you have any suggestions?

I'll try to ask the s390x porters for help. I have no better idea for the moment.

Kind regards, Andreas.

bbuchfink commented 4 years ago

Hi @tillea, I was able to track down the issue using the Travis CI. Will post a new release in the next few days.

tillea commented 4 years ago

On Mon, Jun 15, 2020 at 01:17:52AM -0700, Benjamin Buchfink wrote:

Hi @tillea , I was able to track down the issue using the travis ci. Will post a new release in the next days.

Thanks a lot for keeping me updated. I'll try to upload quickly once you ping me about the new release. Andreas.

bbuchfink commented 4 years ago

Hi @tillea my new attempt: https://github.com/bbuchfink/diamond/releases/tag/v0.9.35

tillea commented 4 years ago

Hi Benjamin, thanks for pinging me. On Sat, Jun 20, 2020 at 04:58:24AM -0700, Benjamin Buchfink wrote:

Hi @tillea my new attempt: https://github.com/bbuchfink/diamond/releases/tag/v0.9.35

I do not remember which architecture remained problematic, but mips64el does not build due to test suite issues:

https://buildd.debian.org/status/fetch.php?pkg=diamond-aligner&arch=mips64el&ver=0.9.35-1&stamp=1592665903&raw=0

There are also non-release architectures; see the full matrix here:

https://buildd.debian.org/status/package.php?p=diamond-aligner

Kind regards, Andreas.

bbuchfink commented 4 years ago

Last time the big-endian architectures failed; they are working now, so it's not related to endianness any more. I'll see if I can do something about the mips64el error.

AdrianBunk commented 4 years ago

Details for the mips64el error:

...
Processing query block 0, reference block 0, shape 15, index chunk 0.
Building reference seed array...  [0.006s]
Building query seed array...  [0.007s]
Computing hash join...  [0.001s]
Building seed filter...  [0.015s]
Searching alignments...  [0s]
Processing query block 0, reference block 0, shape 15, index chunk 1.
Building reference seed array...  [0.006s]
Building query seed array...  [0.007s]
Computing hash join...  [0.001s]
Building seed filter...  [0.015s]
Searching alignments...  [0s]
Processing query block 0, reference block 0, shape 15, index chunk 2.
Building reference seed array...  [0.006s]
Building query seed array...  [0.007s]
Computing hash join...  [0.001s]
Building seed filter...  [0.015s]
Searching alignments...  [0s]
Processing query block 0, reference block 0, shape 15, index chunk 3.
Building reference seed array...  [0.006s]
Building query seed array...  [0.007s]
Computing hash join...  [0.001s]
Building seed filter...  [0.015s]
Searching alignments...  [0s]
Deallocating buffers...  [0s]
Computing alignments... Segmentation fault (core dumped)

Backtrace:

#0  parse_lsda_header (context=0xffeaa7c140, 
    p=0x233558 <error: Cannot access memory at address 0x233558>, 
    info=0xffeaa7b410)
    at ../../../../src/libstdc++-v3/libsupc++/eh_personality.cc:58
#1  0x000000fff6e383dc in __cxxabiv1::__gxx_personality_v0 (
    version=<optimized out>, actions=<optimized out>, 
    exception_class=<optimized out>, ue_header=0xffe4080f00, 
    context=0xffeaa7c140)
    at ../../../../src/libstdc++-v3/libsupc++/eh_personality.cc:454
#2  0x000000fff6ca33cc in _Unwind_RaiseException ()
   from /lib/mips64el-linux-gnuabi64/libgcc_s.so.1
#3  0x000000fff6e395e4 in __cxxabiv1::__cxa_throw (obj=0xffe4080f20, 
    tinfo=0xaaab90cfa8 <typeinfo for EndOfStream>, dest=<optimized out>)
    at ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc:90
#4  0x000000aaab74e918 in Deserializer::read<unsigned int> (x=<optimized out>, 
    this=<optimized out>) at ./src/data/../util/io/exceptions.h:55
#5  read_varint<Deserializer> (buf=..., dst=@0xffeaa7d0cc: 2)
    at ./src/data/../util/io/../algo/varint.h:78
#6  0x000000aaab7acd70 in Deserializer::operator>> (x=@0xffeaa7d0cc: 2, 
    this=0xffeaa7d100) at ./src/align/../search/../util/io/deserializer.h:56
#7  hit::read<std::back_insert_iterator<std::vector<hit, std::allocator<hit> > > > (it=..., s=...) at ./src/align/../search/trace_pt_buffer.h:123
#8  Async_buffer<hit>::load_bin (this=0xaab2f3f7b0, 
    out=std::vector of length 83, capacity 780 = {...}, bin=<optimized out>)
    at ./src/align/../search/../util/async_buffer.h:162
#9  0x000000aaab7ad43c in Async_buffer<hit>::load(unsigned long)::{lambda(unsigned long)#1}::operator()(unsigned long) const (this=<optimized out>, end=16)
    at ./src/align/../search/../util/async_buffer.h:115
#10 std::__invoke_impl<void, Async_buffer<hit>::load(unsigned long)::{lambda(unsigned long)#1}, unsigned long>(std::__invoke_other, Async_buffer<hit>::load(unsigned long)::{lambda(unsigned long)#1}&&, unsigned long&&) (__f=...)
    at /usr/include/c++/9/bits/invoke.h:60
#11 std::__invoke<Async_buffer<hit>::load(unsigned long)::{lambda(unsigned long)#1}, unsigned long>(std::__invoke_result&&, (Async_buffer<hit>::load(unsigned long)::{lambda(unsigned long)#1}&&)...) (__fn=...)
    at /usr/include/c++/9/bits/invoke.h:95
#12 std::thread::_Invoker<std::tuple<Async_buffer<hit>::load(unsigned long)::{lambda(unsigned long)#1}, unsigned long> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (this=0xaab2f32b98) at /usr/include/c++/9/thread:244
#13 std::thread::_Invoker<std::tuple<Async_buffer<hit>::load(unsigned long)::{lambda(unsigned long)#1}, unsigned long> >::operator()() (this=0xaab2f32b98)
    at /usr/include/c++/9/thread:251
#14 std::thread::_State_impl<std::thread::_Invoker<std::tuple<Async_buffer<hit>::load(unsigned long)::{lambda(unsigned long)#1}, unsigned long> > >::_M_run() (
    this=0xaab2f32b90) at /usr/include/c++/9/thread:195
#15 0x000000fff6e70a6c in std::execute_native_thread_routine (__p=0xaab2f32b90)
    at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:80
#16 0x000000fff6fcb6ec in start_thread ()
   from /lib/mips64el-linux-gnuabi64/libpthread.so.0

bbuchfink commented 4 years ago

Thanks for posting the backtrace, but unfortunately it's not clear to me why this crash happens. It does not occur on other platforms, including when compiled with an address sanitizer. I have tried getting Debian mips64el to run using qemu but failed so far. There's a number of developer machines listed at https://wiki.debian.org/MIPSPort#Installation, one of which is supposed to have "public" access (eller). So I was wondering if it would be possible to get access there.
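For readers following the backtrace: the segfault occurs inside the C++ runtime's unwinder (`parse_lsda_header`) while an `EndOfStream` exception thrown from `Deserializer::read` is propagating, so the throw itself may be expected and the fault may lie in unwinding on this platform. Below is a minimal sketch of such a pattern, assuming the loader relies on the exception to detect the end of a buffer. `ByteReader`, `read_u32`, and `load_all` are hypothetical names, and the real code reads varints rather than fixed-width integers.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Exception used to signal that the input is exhausted.
struct EndOfStream : std::runtime_error {
    EndOfStream() : std::runtime_error("Unexpected end of input.") {}
};

// Hypothetical reader over an in-memory buffer.
class ByteReader {
public:
    explicit ByteReader(const std::vector<unsigned char>& data)
        : data_(data), pos_(0) {}

    // Read one little-endian 32-bit value or throw if fewer than
    // four bytes remain.
    std::uint32_t read_u32() {
        if (data_.size() - pos_ < 4)
            throw EndOfStream();
        std::uint32_t v = 0;
        for (int i = 3; i >= 0; --i)
            v = (v << 8) | data_[pos_ + i];
        pos_ += 4;
        return v;
    }

private:
    const std::vector<unsigned char>& data_;
    std::size_t pos_;
};

// Read records until the reader reports the end of the stream.
// Every successful load therefore ends with one thrown exception,
// and any trailing partial record is dropped.
inline std::vector<std::uint32_t> load_all(const std::vector<unsigned char>& data) {
    ByteReader reader(data);
    std::vector<std::uint32_t> out;
    try {
        for (;;)
            out.push_back(reader.read_u32());
    } catch (const EndOfStream&) {
        // Normal termination of the loop.
    }
    return out;
}
```

If the code follows this shape, every run exercises the exception-handling path, so a platform-specific problem in unwinding would surface on every run there while staying invisible to an address sanitizer elsewhere.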

tillea commented 4 years ago

Hi Benjamin,

On Fri, Jun 26, 2020 at 06:50:03AM -0700, Benjamin Buchfink wrote:

Thanks for posting the backtrace, but unfortunately it's not clear to me why this crash happens. It does not occur on other platforms, including when compiled with an address sanitizer. I have tried getting Debian mips64el to run using qemu but failed so far. There's a number of developer machines listed at https://wiki.debian.org/MIPSPort#Installation, one of which is supposed to have "public" access (eller). So I was wondering if it would be possible to get access there.

As far as I know (but I can be wrong) "public" means "public for Debian Developers" (like me) and not only for some specific developers. So what I could try is to log in to that box and do some debugging (running diamond under gdb or so). I'm not sure whether I'll manage to do this before Monday. Any reader of the Debian MIPS porters list who beats me to it is very welcome. Kind regards, Andreas.

bbuchfink commented 4 years ago

Thanks, I appreciate anything you can do.

mr-c commented 3 years ago

Indeed, https://db.debian.org/machines.cgi?host=eller is available for mips64el porting work

@bbuchfink As per https://dsa.debian.org/doc/guest-account/ I can sponsor you for access

If you are still interested, then please send me via crusoe @ debian.org the following information:

If you think you might need future access to a real mips64el machine, or other architecture, then I would recommend applying to be an official contributor to Debian in the role of at least "Debian Maintainer"

Personally, I don't prioritize mips64el for bioinformatics tools, but supporting other architectures is laudable (and can find interesting bugs) so I'm happy to support you in this.