Closed ctb closed 6 years ago
can you confirm that this bug also occurs on linux? I can't reproduce it.
continuing on the 'briefly' script:
wget https://github.com/GATB/bcalm/releases/download/v2.2.0/bcalm-binaries-v2.2.0-Linux.tar.gz
wget https://raw.githubusercontent.com/GATB/bcalm/master/scripts/convertToGFA.py
tar xf bcalm-binaries-v2.2.0-Linux.tar.gz
bcalm-binaries-v2.2.0-Linux/bin/bcalm -in twofoo.fq.gz -abundance-min 1
python convertToGFA.py twofoo.fq.unitigs.fa twofoo.fq.unitigs.gfa 31
python detect_non_reci_edges.py twofoo.fq.unitigs.gfa |wc -l
with detect_non_reci_edges.py
containing:
import sys
edges = set()
for line in open(sys.argv[1]):
if not line.startswith('L'): continue
line = line.split()
a,b = map(int,tuple((line[1],line[3])))
edges.add((a,b))
for a,b in edges:
if (b,a) not in edges:
print(a,b)
but on the unitigs file there https://osf.io/zgx3r/:
python convertToGFA.py bcalm.twofoo.k31.unitigs.fa bcalm.twofoo.k31.unitigs.gfa 31
python detect_non_reci_edges.py bcalm.twofoo.k31.unitigs.gfa |wc -l
returns 20889, indicating that I can confirm that there is an issue somewhere.
got it
btw reload this page, I edited the comment before
for a fresh build on our HPC of v2.2.0,
% python convertToGFA.py bcalm.twofoo.k31.unitigs.fa twofoo.fq.unitigs.gfa 31
GFA file open
done
% python detect_non_reci_edges.py twofoo.fq.unitigs.gfa | wc -l
13874
(I can't run the downloaded version of bcalm on my HPC for whatever reason - library incompat)
hi @rchikhi I do not see non-reciprocal edges on the downloaded version using binder so it seems like there is maybe a compile time issue that is biting me on both Linux and Mac OS X!?
Any suggestions on how to go about debugging? I'm not doing anything weird on my mac, and I'm using gcc 4.4.7 on the linux box...?
Linux jupyter-ctb-2d2017-2dbinder-2dtest-2da5xidg5l 4.4.111+ #1 SMP Thu Feb 1 22:06:37 PST 2018 x86_64 x86_64 x86_64 GNU/Linux
Darwin Titus-MacBook-Air-3.local 16.7.0 Darwin Kernel Version 16.7.0: Tue Jan 30 11:27:06 PST 2018; root:xnu-3789.73.11~1/RELEASE_X86_64 x86_64 i386 MacBookPro13,2 Darwin
Linux dev-intel14-phi 2.6.32-696.20.1.el6.x86_64 #1 SMP Fri Jan 26 17:51:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Perhaps the next step is to compile it on the binder box. That requires some reconfig tho.
ok, I ended up compiling v2.2.0 on an AWS instance running Ubuntu. You'll be relieved to know that it works fine (i.e. all edges are reciprocal):
python3 detect.py twofoo.fq.unitigs.gfa | wc -l
0
so both my mac and my linux HPC have issues!? Any ideas or thoughts?
Linux ip-172-30-0-208 4.4.0-1052-aws #61-Ubuntu SMP Mon Feb 12 23:05:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
.hey, thanks for investigating. What are the compiler versions on those two machines that have issues?
Linux with problems:
gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC)
on Mac OS X with problems:
% /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -v
Apple LLVM version 9.0.0 (clang-900.0.39.2)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
(this is what cmake is using to build bcalm)
on the AWS instance, which works properly:
% gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.9' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9)
on the Linux with gcc 4.4.7, I'm not sure how you got it to run. I've just tried on a old CentOs box with the same gcc version, the pre-compiled binary complains about my libc version being too old (version GLIBC_2.14 not found
) and the source won't compile with that gcc version.
I tested on a similar linux as yours (Linux dx405-s06 2.6.32-696.23.1.el6.x86_64 #1 SMP Tue Mar 13 22:44:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
) that had gcc 4.4.7 by default, but I could only compile bcalm with a newer gcc (took 7.2 from conda). Still couldn't reproduce the bug
don't know what to tell you, I did build it natively on that version :)
any ideas for how to narrow down what might be going on? I am happy to perform analyses of unitig and k-mer content etc., which will be my next step by default.
I'm not sure where to find a OSX system, so let's focus on how you got that EL6 Linux with gcc4.4.7 to compile Bcalm. Can you please paste the output of the cmake step?
hmm, looks like I made a mistake reporting on the gcc version -- I just ran cmake again, and decided to rerun gcc -v:
% gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/software/GCC/6.2/libexec/gcc/x86_64-pc-linux-gnu/6.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-6.2.0/configure --disable-multilib --prefix=/opt/software/GCC/6.2
Thread model: posix
gcc version 6.2.0 (GCC)
apologies. But, in any case, here's cmake .
output.
cmake.txt
Some updates:
using sourmash, I calculated the Jaccard similarity of the two output files, one generated on my Mac (as above) and one generated on AWS (as above). They are identical in k-mer content. Hooray!
for those same files, I then redid the GFA and reciprocal analysis step via the scripts you provided. The Mac output version had 16795 non-reciprocal edges, while the AWS version had 0.
So whatever the problem is, it's in the FASTA headers rather than in the sequence content. Which seems quite good :)
I then dug into the output a bit, and saw the following difference:
aws (good):
Glue partition 128 has 4046 sequences
Glue partition 479 has 3830 sequences
Glue partition 37 has 3721 sequences
Glue partition 308 has 3667 sequences
Glue partition 434 has 3531 sequences
Glue partition 67 has 3455 sequences
Glue partition 60 has 3419 sequences
Glue partition 358 has 3396 sequences
Glue partition 250 has 3360 sequences
Glue partition 383 has 3350 sequences
Mac (bad):
Glue partition 10 has 11625 sequences
Glue partition 85 has 11545 sequences
Glue partition 110 has 11510 sequences
Glue partition 39 has 11363 sequences
Glue partition 74 has 11283 sequences
Glue partition 24 has 11253 sequences
Glue partition 55 has 11223 sequences
Glue partition 41 has 11221 sequences
Glue partition 96 has 11179 sequences
Glue partition 9 has 11162 sequences
Despite this, both unitig files have virtually same size (!?) - I'm not quite sure how that's possible given the apparent differences in FASTA headers...
I attach both sets of output.
I appreciate that you keep investigating. I finally got around an issue with running the compiled bcalm on CentOS6 with gcc 6.1 installed using conda. (It was /usr/lib64/libstdc++.so.6: version GLIBCXX_3.4.20 not found
and using
LD_PRELOAD=$HOME/miniconda2/envs/gcc-61/lib/libstdc++.so.6.0.24 ./bcalm
did the trick)... and.. guess what... still no edge problem. So, on the same Linux distro and almost the same GCC version (6.1 versus 6.2), still can't reproduce it.
The reason why unitigs have the same size and different glue step, could be due to the fact that Bcalm autodetects the the number of files that can be opened simultaneously on your system. To have deterministic execution, you can use -nb-cores 1 -nb-glue-partitions 100
.
A possible way to investigate further would be to compile bcalm in debug mode on OSX and run valgrind.
cd build
rm -Rf *
cmake -D CMAKE_BUILD_TYPE=Debug ..
make -j 8
valgrind ./bcalm -in twofoo.fq.gz -abundance-min 1 >/dev/null
To deepen the mystery, on my HPCC (same system as Titus) account with same gcc module:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/software/GCC/6.2/libexec/gcc/x86_64-pc-linux-gnu/6.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-6.2.0/configure --disable-multilib --prefix=/opt/software/GCC/6.2
Thread model: posix
gcc version 6.2.0 (GCC)
And the detection script from @rchikhi, I am unable to reproduce.
(space) welcherc@dev-intel16:~/src/spacegraphcats$ python detect_bad_edges.py twofoo/bcalm.twofoo.k31.unitigs.gfa | wc -l
0
And ldd -v
on the compiled binary for @ctb and me shows exact same ABI versions and paths for everything...
Aaahh but @ctb is on bcalm c8ac60252fa0b2abf511f7363cff7c4342dac2ee (Dec 20) and I'm on c41f70fe0ba78d0491fd65c7618994b3a1f86d7c (most recent, April 12), might be that...
Trying it out on the older commit also fails to reproduce, but double checking it appears I forgot to revert the gatb submodule, so for now I'm gonna say that's where to look. Off to see Infinity War, good luck ;)
Hmm.. what about with the released 2.2.0 version? https://github.com/GATB/bcalm/releases/download/v2.2.0/bcalm-binaries-v2.2.0-Linux.tar.gz
No joy; the glibc version is too old:
bcalm: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by bcalm)
(space) welcherc@dev-intel14:~/src/spacegraphcats$ /lib64/libc.so.6
GNU C Library stable release version 2.12, by Roland McGrath et al.
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.4.7 20120313 (Red Hat 4.4.7-18).
Compiled on a Linux 2.6.32 system on 2017-06-20.
Available extensions:
The C stubs add-on version 2.1.2.
crypt add-on version 2.1 by Michael Glad and others
GNU Libidn by Simon Josefsson
Native POSIX Threads Library by Ulrich Drepper et al
BIND-8.2.3-T5B
RT using linux kernel aio
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
Also just tried recompiling from @ctb's version of bcalm with the matching gatb version; still couldn't reproduce. I'm at a loss, seems like a gremlin to me.
on the same machine as him?!
BTW if you have any idea how to make the binary generated by Travis more compatible to old glibc's, I'd gladly take it :)
@rchikhi We're using https://github.com/phusion/holy-build-box for making statically linked SPAdes binaries. Though these days we're having custom docker image with gcc 5.3 inside (but still hbb-based).
thanks for the reminder about hbb Anton. We do use an old Centos to build Minia and GATB stuff, but I wanted to experiment with Travis for bcalm. Combinin hbb and Travis seems uncharted territory. Some people hacked their way through but looks like lot of hassle.
@camillescott and others, I've automated the whole thing here and provided the data set in the github repo; this is all by way of seeing if someone else can replicate the thing :)
https://github.com/ctb/2018-bcalm-debug
On my laptop with a build of v2.2.0 of bcalm:
% snakemake
...
RESULT: 812 non-reciprocal edges detected; should be 0
...
On AWS with a build of latest:
RESULT: 0 non-reciprocal edges detected; should be 0
etc.
for @camillescott there are instructions on how to execute it using my environment on HPC in the README.
OK, using the new automated workflow, and starting from a clean checkout ov v2.2.0, I no longer see the problem on any of my platforms --
rm -fr bcalm
git clone --recursive https://github.com/GATB/bcalm -b v2.2.0
cd bcalm
cmake .
make -j 8
sudo cp bcalm /usr/local/bin/bcalm
cd ../2018-bcalm-debug/
snakemake clean
snakemake all
works on Mac OS X, HPC, and AWS.
My current guess is that somehow I got stuck with a version mismatch with the gatb-core, and that starting from a fresh checkout of bcalm as above is the right solution there.
Thanks for bearing with, all!
Do you still have a version that had the bug? If so you could get maybe peace of mind by doing git branch
inside the gatb-core
folder and see if it was maybe detached. Anyhow, appreciate that it got resolved!
If I find it happening again, I will do so :)
in spacegraphcats, we are consuming the output of your excellent tool and doing (or attempting to do) Clever Graph Things with it. (thank you for bcalm!) however, we are running into a problem that in some cases there are nodes u, v where node u has an edge to v but node v does not have an edge to u.
more specifically, we see a number of examples like the following two headers in the unitigs output:
which show that 69342 is in the edge list of 206849, but 206849 is not in the edge list of 69342.
We have code to detect this here,
https://github.com/spacegraphcats/spacegraphcats/pull/190/commits/dc9a9cc8c2b07b27b2547341748f67b809e1025f
and I am happy to provide full execution instructions should you wish (should be: checkout that branch;
pip install -r requirements.txt
;make twofoo.fq.gz
;conf/run twofoo build
)your thoughts appreciated!
output from latest bcalm master is here: https://osf.io/zgx3r/ (generated with bcalm built from commit c41f70fe0ba on my mac os x laptop)
input files are generated from the rules here,
briefly
and then the command line execution is: