GATB / bcalm

compacted de Bruijn graph construction in low memory
MIT License
99 stars 20 forks source link

bioconda install of bcalm produces incorrect cDBG on linux? #37

Closed ctb closed 5 years ago

ctb commented 5 years ago

In #34, I reported that in not-very-reproducible circumstances on my HPC, when I compiled bcalm from source, it produced bad cDBG files. I ended up automating the heck out of the check and being unable to reproduce it so I gave up. But I cleverly never removed the validation code from the spacegraphcats application :)

Fast forward to today, and I found the same problem as in #34 when I installed bcalm using bioconda! Hopefully this time it will be more reproducible. I reproduce the bug on both my HPC and Amazon's Linux box, which kind of makes sense because bioconda provides a binary install so I would hope that it would fail the same way on both :).

To reproduce using spacegraphcats

this is running on Amazon Linux 2 AMI (HVM), SSD Volume Type - ami-01bbe152bf19d0289, region us-west-2. I created a t2-large with 20 GB of disk space, logged in as ec2-user, and then ran the following commands (some require answering interactive prompts, sorry):

# install C compiler
sudo yum groupinstall "Development Tools"

# install conda
curl -L -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh 
source ~/.bashrc
. /home/ec2-user/miniconda3/etc/profile.d/conda.sh

# add bioconda
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

# create a conda environment called 'sgc' with python 3.6 and baclm installed.
conda create -n sgc python==3.6 bcalm

Now, activate and try running the spacegraphcats example:

conda activate sgc

# install prereqs for spacegraphcats
pip install Cython
pip install https://github.com/dib-lab/pybbhash/archive/spacegraphcats.zip
pip install https://github.com/dib-lab/khmer/archive/master.zip
pip install git+https://github.com/dib-lab/sourmash@master#egg=sourmash

# install spacegraphcats
pip install spacegraphcats

# grab example repo
git clone https://github.com/spacegraphcats/spacegraphcats-twofoo-example.git
cd spacegraphcats-twofoo-example/

# run example workflow
snakemake

this fails for me after a few minutes of running with:

/home/ec2-user/miniconda3/envs/sgc/bin/python -m spacegraphcats.cdbg.bcalm_to_gxt  -k 31 twofoo/bcalm.twofoo.k31.unitigs.fa twofoo_k31_r1/cdbg.gxt twofoo_k31_r1/contigs.fa.gz
reading unitigs from twofoo/bcalm.twofoo.k31.unitigs.fa
...read 206855 unitigs
577 -> 181159, but not 181159 -> 577
576 -> 81582, but not 81582 -> 576
4675 -> 101520, but not 101520 -> 4675
4675 -> 104265, but not 104265 -> 4675
...
204785 -> 12508, but not 12508 -> 204785
Traceback (most recent call last):
  File "/home/ec2-user/miniconda3/envs/sgc/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ec2-user/miniconda3/envs/sgc/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ec2-user/miniconda3/envs/sgc/lib/python3.6/site-packages/spacegraphcats/cdbg/bcalm_to_gxt.py", line 375, in <module>
    main(sys.argv[1:])
  File "/home/ec2-user/miniconda3/envs/sgc/lib/python3.6/site-packages/spacegraphcats/cdbg/bcalm_to_gxt.py", line 257, in main
    neighbors, sequences, mean_abunds, sizes = read_bcalm(unitigs, debug, k)
  File "/home/ec2-user/miniconda3/envs/sgc/lib/python3.6/site-packages/spacegraphcats/cdbg/bcalm_to_gxt.py", line 214, in read_bcalm
    assert not fail
AssertionError
[Sun Dec 23 20:23:27 2018]
Error in rule bcalm_catlas_input:
...

Here's what I have installed for bcalm:

% conda list | grep bcalm
bcalm                     2.2.0                hd28b015_3    bioconda

cc @natir viz #32.

ctb commented 5 years ago

Oddly enough my shorter automated workflow https://github.com/ctb/2018-bcalm-debug doesn't detect any problems. Must not trigger the right problem.

Note that when I compile bcalm from scratch on my HPC the spacegraphcats-twofoo-example above runs just fine and passes the reciprocal edges test.

rchikhi commented 5 years ago

Thanks for the detailed report! I confirm that I can reproduce the behavior of https://github.com/GATB/bcalm/issues/37#issue-393778335 on my own Linux, when using bioconda's bcalm. The problem doesn't occur when using the current master branch of bcalm. So something must be going on with bioconda's version, I'm on it.

ctb commented 5 years ago

Whew! Finally :). I hate Heisenbugs!

-- Titus Brown, ctbrown@ucdavis.edumailto:ctbrown@ucdavis.edu

On Dec 23, 2018, at 3:09 PM, Rayan Chikhi notifications@github.com<mailto:notifications@github.com> wrote:

Thanks for the detailed report! I confirm that I can reproduce the behavior of #37 (comment)https://github.com/GATB/bcalm/issues/37#issue-393778335 on my own Linux, when using bioconda's bcalm. The problem doesn't occur when using the current master branch of bcalm. So something must be going on with bioconda's version, I'm on it.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/GATB/bcalm/issues/37#issuecomment-449668832, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AADHSEvd6o4r_Mv6i9F83vQlxh3SPe4hks5u8A0_gaJpZM4Zf-k_.

rchikhi commented 5 years ago

I think I see what's going on. Two bugs: 1) I thought BCALM outputted consecutive unitig IDs, but apparently with some compilers it had some rare inversions (e.g. 9,10,12,11,13,14,..). Likely a multi-threading bug. 2) The procedure used to link unitigs silently and wrongly assumed that unitigs needed to have consecutive IDs. I've commited a fix for both bugs and will release BCALM 2.2.1. @natir could you please update Conda when you have time, or show me how?

ctb commented 5 years ago

you rock! thanks, @rchikhi!

ctb commented 5 years ago

ping! I'd suggest withdrawing bcalm from bioconda until someone has time to fix it.

rchikhi commented 5 years ago

pinging @natir: bcalm 2.2.1 is released and needs to be updated in bioconda please :)

natir commented 5 years ago

@epruesse update bioconda bcalm to 2.2.1 in this pull request bioconda/bioconda-recipes#12811

rchikhi commented 5 years ago

merci!

ctb commented 5 years ago

wonderful!