Closed mihkelvaher closed 4 years ago
Using the release directly won't work because it doesn't include dependencies. This just isn't possible on GitHub currently.
I recommend building from a specific tagged version instead via:
git clone --recursive --single-branch --branch v0.4.2 https://github.com/dnbaker/dashing
Would this work for you?
Regarding your question about memory consumption: in terms of RAM, with n genomes of 5 MB each, sketch size p (log2 of the number of bytes per sketch), and t threads, Dashing will use approximately
• n * 2^p bytes (for sketches)
• t * next_power_of_two(5e6) bytes (for the buffers used to read files in)
For reference, on all of RefSeq, the distance figure in our paper shows 100 MB to 1.4 GB of memory for sketch sizes p = 10 to 14, so I'd guess that the answer for your data is somewhere between 100 MB and 500 MB.
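As a rough sketch, the estimate can be turned into a few lines of Python, taking the sketch term as n * 2^p bytes and the buffer term as t * next_power_of_two(5e6) bytes. The function names and the example values (5000 genomes, p = 14, 16 threads) are illustrative assumptions, and the distance matrix and any other overhead are not counted.

```python
def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def estimate_bytes(n_genomes: int, p: int, threads: int,
                   genome_bytes: int = 5_000_000) -> int:
    """Approximate RAM use in bytes: sketches plus per-thread read buffers.

    Only the two terms from the estimate are modelled; this is not
    a measurement of what Dashing actually allocates.
    """
    sketches = n_genomes * (1 << p)                      # n * 2^p
    buffers = threads * next_power_of_two(genome_bytes)  # t * next_pow2(size)
    return sketches + buffers


if __name__ == "__main__":
    # Hypothetical workload: 5000 genomes, p = 14, 16 threads.
    total = estimate_bytes(n_genomes=5000, p=14, threads=16)
    print(f"approximately {total / 1e6:.0f} MB")
```

With many threads the buffer term can outweigh the sketch term, so the thread count matters as much as the number of genomes when requesting memory.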
Didn't think of cloning a release branch! Thanks!
After a couple of days of trying, I can't get it to compile in a Singularity container (CentOS base). There seems to be an issue with bonsai (the same issue occurs when making dashing).
git clone --recursive https://github.com/dnbaker/bonsai.git
cd bonsai/
make
gcc -Iclhash/include -I. -I.. -Ilibpopcnt -I.. -Iinclude -Icircularqueue -Izstd/zlibWrapper -Izstd/lib/common -Izstd/lib -Ihll/vec -Ihll -Ihll/include -Ipdqsort -Iinclude/bonsai -Iinclude -Ihll/vec/blaze -DNDEBUG -c klib/kthread.c -o klib/kthread.o -lz
gcc -Iclhash/include -I. -I.. -Ilibpopcnt -I.. -Iinclude -Icircularqueue -Izstd/zlibWrapper -Izstd/lib/common -Izstd/lib -Ihll/vec -Ihll -Ihll/include -Ipdqsort -Iinclude/bonsai -Iinclude -Ihll/vec/blaze -DNDEBUG -c klib/kstring.c -o klib/kstring.o -lz
ls clhash.o 2>/dev/null || mv clhash/clhash.o . 2>/dev/null || (cd clhash && git checkout master && make && cd .. && ln -s clhash/clhash.o .)
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
make[1]: Entering directory '/tmp/test/bonsai/clhash'
cc -fPIC -std=c99 -O3 -msse4.2 -mpclmul -march=native -funroll-loops -Wstrict-overflow -Wstrict-aliasing -Wall -Wextra -pedantic -Wshadow -c ./src/clhash.c -Iinclude
cc -fPIC -std=c99 -O3 -msse4.2 -mpclmul -march=native -funroll-loops -Wstrict-overflow -Wstrict-aliasing -Wall -Wextra -pedantic -Wshadow -o unit ./tests/unit.c -Iinclude clhash.o
g++ -fPIC -std=c++11 -O3 -msse4.2 -mpclmul -march=native -funroll-loops -Wstrict-overflow -Wstrict-aliasing -Wall -Wextra -pedantic -Wshadow -o cppunit ./tests/cppunit.cpp -Iinclude clhash.o
cc -fPIC -std=c99 -O3 -msse4.2 -mpclmul -march=native -funroll-loops -Wstrict-overflow -Wstrict-aliasing -Wall -Wextra -pedantic -Wshadow -o benchmark ./benchmarks/benchmark.c -Iinclude clhash.o
cc -fPIC -std=c99 -O3 -msse4.2 -mpclmul -march=native -funroll-loops -Wstrict-overflow -Wstrict-aliasing -Wall -Wextra -pedantic -Wshadow -o example example.c -Iinclude clhash.o
g++ -fPIC -std=c++11 -O3 -msse4.2 -mpclmul -march=native -funroll-loops -Wstrict-overflow -Wstrict-aliasing -Wall -Wextra -pedantic -Wshadow -o cppexample cppexample.cpp -Iinclude clhash.o
make[1]: Leaving directory '/tmp/test/bonsai/clhash'
g++ -O3 -funroll-loops -pipe -fno-strict-aliasing -march=native -mpclmul -fopenmp -fno-rtti -std=c++14 -Wall -Wextra -Wno-char-subscripts -Wpointer-arith -Wwrite-strings -Wdisabled-optimization -Wformat -Wcast-align -Wno-unused-function -Wno-unused-parameter -pedantic -DUSE_PDQSORT -Wunused-variable -Wno-attributes -Wno-cast-align -Wno-gnu-zero-variadic-macro-arguments -Wno-ignored-attributes -Wno-missing-braces -DBONSAI_VERSION=\"v0.2.4\" -DNDEBUG -Iclhash/include -I. -I.. -Ilibpopcnt -I.. -Iinclude -Icircularqueue -Izstd/zlibWrapper -Izstd/lib/common -Izstd/lib -Ihll/vec -Ihll -Ihll/include -Ipdqsort -Iinclude/bonsai -Iinclude -Ihll/vec/blaze -L. clhash.o klib/kthread.o -DNDEBUG bin/fahist.cpp -o bin/fahist -lz
bin/fahist.cpp:15:12: fatal error: zlib.h: No such file or directory
# include <zlib.h>
^~~~~~~~
compilation terminated.
make: *** [Makefile:125: bin/fahist] Error 1
But the library itself exists:
head zlib/zlib.h
/* zlib.h -- interface of the 'zlib' general purpose compression library
version 1.2.11, January 15th, 2017
Copyright (C) 1995-2017 Jean-loup Gailly and Mark Adler
This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.
Permission is granted to anyone to use this software for any purpose,
Building with the same commands works with no problems on OSX and in an Ubuntu VirtualBox VM.
I asked the admin to compile and add dashing to our HPC.
While dashing is able to show the help, running dashing dist with input files results in an error:
dashing dist testgzs/*
Dashing version: v0.4.2
Illegal instruction
The same command with the .gz files works on OSX.
The same Illegal instruction error also comes up with the dashing_s128 and dashing_s256 releases when running dist on some sample fastas.
dashing_s512 gives Illegal instruction instead of the help.
Does this Illegal instruction mean there's a compilation error?
Or could it be some Debian/Red Hat issue?
Edit: creating an Ubuntu container and running a release dashing binary from there results in the same error. However, suspecting it has something to do with the listed SSE2, AVX2, and AVX512BW, I checked /proc/cpuinfo, which showed that sse2 is present.
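For anyone wanting to repeat the /proc/cpuinfo check on their own node, here is a minimal sketch. It assumes an x86 Linux system where /proc/cpuinfo has a "flags" line; the helper names are made up for illustration, and the three flag names are just the ones listed for the release binaries.

```python
from pathlib import Path

# Instruction-set flags named for the release binaries (lowercase, as in cpuinfo).
WANTED = ("sse2", "avx2", "avx512bw")


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the flags of the first CPU listed in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text: str, wanted=WANTED) -> dict:
    """Map each wanted flag to True/False depending on whether the CPU has it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        for name, present in report(path.read_text()).items():
            print(f"{name}: {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found (not a Linux system?)")
```

Running it on both the build machine and the compute node shows whether the two differ in the instruction sets they support.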
I don't really understand that. I would expect it to work regardless, either by using the hardware features available on the node you're compiling on or by falling back to SSE2. I test on CentOS personally and Travis checks Ubuntu, but I don't know about Debian/Red Hat.
Sorry, I'm trying to catch a conference deadline and so I'm a bit slow to help this week.
Troubleshooting: are you using the release/linux/*gz binaries, not the release/osx/*gz ones? I compiled those on CentOS.
I've finally managed to compile dashing the intended way and it seems to be working!
In the beginning, I tried to compile dashing in a Singularity container which resulted in the described bonsai issue. I'm doing all of the container building on my OSX because creating a container needs admin privileges.
After the comment about "compiling on the node", I tried to simply make dashing there, but the HPC had an older version of gcc. Since I was already using containers, I installed a newer version of gcc into the container and tried to compile in it and through it, but always hit the bonsai issue.
Finally, I remembered that the cluster offers multiple versions of programs. Loading gcc-9.1.0 and compiling with it solved everything.
For my part, the issue can be closed, though it is a bit odd that compiling in a container fails.
That's strange. I wonder: did you load zlib1g (or whatever the zlib package for your container is)?
Olga Botvinnik provided this Dockerfile a while back:
FROM ubuntu:16.04
MAINTAINER olga.botvinnik@czbiohub.org
WORKDIR /tmp
USER root
# Install basics
ENV PACKAGES git make ca-certificates zlib1g-dev build-essential curl wget cmake apt-utils
### don't modify things below here for version updates etc.
WORKDIR /home
RUN apt-get update && \
apt-get install -y --no-install-recommends ${PACKAGES} && \
apt-get clean
# Add add-apt-repository function
RUN apt-get update
RUN apt-get install -y software-properties-common
# Install gcc6 specifically
RUN add-apt-repository ppa:ubuntu-toolchain-r/test
RUN apt-get update && apt-get install -y g++-6
RUN g++ --version
# Install
RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-6 60 --slave /usr/bin/g++ g++ /usr/bin/g++-6
WORKDIR /
RUN git clone https://github.com/dnbaker/dashing/
WORKDIR /dashing
RUN pwd
RUN make update dashing
RUN cp /dashing/dashing /bin
# Test that getting help on dashing command works
RUN dashing -h
WORKDIR /
I haven't personally used Singularity, but I wonder if it might contain any pointers.
This indeed gave the needed hint (though another problem occurred).
I made a rookie mistake in thinking the problem was somewhere other than a missing zlib, because 1) yum said that zlib was already installed and 2) bonsai had a zlib. While zlib existed, the devel version didn't. yum -y install zlib-devel.x86_64 did the trick.
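A quick way to catch this before compiling is to check whether zlib.h is actually present in the directories the compiler searches. The sketch below assumes /usr/include and /usr/local/include as the search path, which may not match every system, and find_header is a hypothetical helper, not part of any build tool.

```python
from pathlib import Path

# Assumed default search directories; a real compiler may look elsewhere too.
DEFAULT_INCLUDE_DIRS = ("/usr/include", "/usr/local/include")


def find_header(name: str, include_dirs=DEFAULT_INCLUDE_DIRS) -> list:
    """Return the full paths of `name` found directly inside include_dirs."""
    hits = []
    for directory in include_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            hits.append(str(candidate))
    return hits


if __name__ == "__main__":
    hits = find_header("zlib.h")
    if hits:
        print("found:", ", ".join(hits))
    else:
        print("zlib.h not found; the zlib development package may be missing")
```

Having the runtime library installed is not enough for this check to pass, which is exactly the situation described above.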
Dashing compiled and dist shows the help, but unfortunately the input files are not recognized:
Dashing version: v0.4.2
terminate called after throwing an instance of 'std::runtime_error'
what(): [bonsai/include/bonsai/encoder.h:void bns::Encoder<ScoreType>::for_each(const Functor&, const char*, kseq_t*) [with Functor = bns::dist_sketch_and_cmp(const std::vector<std::__cxx11::basic_string<char> >&, std::vector<sketch::hk::HeavyKeeper<6, 10, bns::SeededHash<sketch::hash::WangHash> > >&, bns::KSeqBufferHolder&, FILE*, FILE*, bns::Spacer, unsigned int, unsigned int, sketch::hll::EstimationMethod, sketch::hll::JointEstimationMethod, bool, bns::EmissionType, bns::EmissionFormat, bool, unsigned int, bool, std::__cxx11::string, std::__cxx11::string, bool, bool, std::__cxx11::string, std::size_t, bns::EncodingType) [with SketchType = sketch::hll::hllbase_t<>; FILE = _IO_FILE; std::__cxx11::string = std::__cxx11::basic_string<char>; std::size_t = long unsigned int]::<lambda(const char*)>::<lambda(bns::u64)>; ScoreType = bns::score::Lex]435] Could not open file at testfastas/131_Escherichia_coli_JJ1886_uid226103_NC_022648.fna. Abort!
Aborted (core dumped)
Same message with both .fna and .fna.gz.
Trying to be smarter this time, I created an Ubuntu container by translating the Dockerfile into a Singularity file, so that no dependency would be left out. Same result :/
The good news is that Dashing works on the HPC. The problem was that it had initially been compiled on another node with different processors.
Great. Does it have permission to open that file? This error is thrown when it can't open a handle to the file.
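One way to narrow this down is to try opening each input file from inside the same environment (for example, inside the container) and look at the reported error code. This is only a diagnostic sketch: check_readable is a made-up helper, and it says nothing about how Dashing itself opens files.

```python
import errno
import sys


def check_readable(path: str):
    """Try to open `path` for reading; return an error message or None."""
    try:
        with open(path, "rb") as handle:
            handle.read(1)  # force an actual read, not just an open
    except OSError as exc:
        code = errno.errorcode.get(exc.errno, "UNKNOWN")
        return f"{path}: cannot open ({code}: {exc.strerror})"
    return None


if __name__ == "__main__":
    # Usage: python check_files.py testfastas/*
    problems = [msg for msg in map(check_readable, sys.argv[1:]) if msg]
    print("\n".join(problems) if problems else "all files opened successfully")
```

An ENOENT result would suggest the path is not visible from that environment at all, while EACCES would point to permissions.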
Running chmod 777 on all of the fastas, the fasta dir, and even the container image still results in the same error. The idea might have some merit, because going into the container and creating some dummy fastas there gives a result, but overall I think it's not worth exploring further.
As Dashing needs to be compiled on the same machine it will run on, containers have lost their point for me, because I can only build containers on my laptop and then run them on the server (which gives the Illegal instruction message).
Containers would be of help if there's a problem with compiling (for example, when a newer version of gcc can't be used on the host). I just tried out this approach and it works. For anyone interested in the "CompilerContainer", here's the Singularity recipe:
Bootstrap: docker
From: centos
%post
yum -y groupinstall "Development Tools"
yum -y install git gcc-c++ zlib-devel.x86_64
# Uncomment this if you want to install dashing into the container
# mkdir -pv /usr/local/bin/build && cd /usr/local/bin/build && git clone --recursive --single-branch --branch v0.4.2 https://github.com/dnbaker/dashing && cd dashing && make dashing && mv -v dashing /usr/local/bin/
%environment
#nothing here currently
%runscript
echo "run specific command, nothing here"
Thanks for the help! The initial results look promising and there are a couple of questions but I'll create a separate issue for that.
Hi,
I'm planning to install dashing into a Singularity container (CentOS) but tried to install it on a server first (also CentOS).
The server has old gcc (4.8.5) but this is probably not the issue because making from cloned master breaks far later.
Unrelated: if no temporary files are created while building the distance matrix, is everything held in memory? How much memory should I expect it to use when running on thousands of assembled bacterial genomes (~5 MB each)? Asking for HPC resource allocation purposes.
Regards, Mihkel