HingeAssembler / HINGE

Software accompanying "HINGE: Long-Read Assembly Achieves Optimal Repeat Resolution"
http://genome.cshlp.org/content/27/5/747.full.pdf+html?sid=39918b0d-7a7d-4a12-b720-9238834902fd

Test assembly of E. coli fails at or before "hinge draft-path ..." step #147

Closed SchwarzEM closed 6 years ago

SchwarzEM commented 6 years ago

I have tried to compile HINGE and run it on one of its own test instances. The compilation superficially appears to work, but the test instance fails at (or before) the following step:

hinge draft-path $PWD ecoli ecolidemo.G2.graphml ;

To avoid bloating this initial post, I will provide full details of the failure in the comments below.

SchwarzEM commented 6 years ago

Details of how I set up HINGE and tried running its demo assembly:

cd /home/schwarz/src ;
rm -rf HINGE_20feb2018 ;

git clone https://github.com/fxia22/HINGE.git HINGE_21feb2018 ;

cd HINGE_21feb2018 ;

git submodule init ;
git submodule update ;

# fix some very bad misspecifications in here so that the thing has a chance to work:
cp -ip ./utils/build.sh ./utils/orig_build.sh_file ;

cat ./utils/build.sh ;

[contents of original ./utils/build.sh:]
#!/bin/bash
pwd=$PWD

cd $pwd/thirdparty/DAZZ_DB
make -j 8

cd $pwd/thirdparty//DALIGNER
make -j 8

cd $pwd/thirdparty/DASCRUBBER
make -j 8

cd $pwd/thirdparty/DEXTRACTOR
make -j 8

cd $pwd
mkdir build
cd $pwd/build
cmake .. -DCMAKE_INSTALL_PREFIX=../inst -DCMAKE_C_COMPILER=gcc-4.8 -DCMAKE_CXX_COMPILER=g++-4.8
make -j 8
make install

exit $?

pico ./utils/build.sh ;

diff ./utils/orig_build.sh_file ./utils/build.sh ;

[output of "diff ./utils/orig_build.sh_file ./utils/build.sh":]
13,14c13,14
< cd $pwd/thirdparty/DEXTRACTOR
< make -j 8
---
> # cd $pwd/thirdparty/DEXTRACTOR
> # make -j 8
19c19
< cmake .. -DCMAKE_INSTALL_PREFIX=../inst -DCMAKE_C_COMPILER=gcc-4.8 -DCMAKE_CXX_COMPILER=g++-4.8
---
> cmake .. -DCMAKE_INSTALL_PREFIX=../inst -DCMAKE_C_COMPILER=gcc-4.9 -DCMAKE_CXX_COMPILER=g++-4.9

CPATH="/usr/include/hdf5/serial" ;   # $CPATH is empty on my system
export CPATH ;

[check to see that I have everything installed that I need:]
python  -m pip install --user pbcore ;
[can't run this: python3 -m pip install --user pbcore ;]

[for all of the others, enforce local pip to both python2 and python3]
python  -m pip install --user ujson ;
python3 -m pip install --user ujson ;

python  -m pip install --user colormap ; 
python3 -m pip install --user colormap ;

python  -m pip install --user easydev ;
python3 -m pip install --user easydev ;

python  -m pip install --user configparser ;
python3 -m pip install --user configparser ;

[finally, this should work, so run it:]
./utils/build.sh ;

[before I try working with this, make sure this script has a chance of working:]
cd /home/schwarz/src/HINGE_21feb2018/utils ;
cp -ip setup.sh orig_setup.sh_file ;

pico setup.sh ;
[remove DEXTRACTOR from the $PATH, since I didn't compile it; remove racon, since it's not really there any more]
[also, get rid of the $PWD nonsense, which just makes the damn thing fail when invoked from anywhere else]

cat setup.sh ;

[contents of revised setup.sh:]
PPWD=/home/schwarz/src/HINGE_21feb2018
export PATH="$PATH:$PPWD/thirdparty:$PPWD/thirdparty/DALIGNER:$PPWD/thirdparty/DAZZ_DB:$PPWD/thirdparty/DASCRUBBER:$PPWD/inst/bin"
export MANPATH="$MANPATH:$PPWD/inst/share/man"
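
As an alternative to hard-coding the checkout path, the root directory can be derived from the location of setup.sh itself. The following is a minimal sketch under two assumptions: setup.sh is sourced from bash, and it lives in the utils/ subdirectory of the checkout as it does here. The helper name `hinge_root_from` is hypothetical and not part of HINGE.

```shell
#!/bin/bash
# Hypothetical helper (not part of HINGE): print the absolute path of the
# directory one level above the directory that contains the file "$1".
hinge_root_from() {
    (cd "$(dirname -- "$1")/.." && pwd)
}

# When this file is sourced, BASH_SOURCE[0] is the path of the file itself,
# so the result does not depend on the caller's working directory.
PPWD="$(hinge_root_from "${BASH_SOURCE[0]:-$0}")"
export PATH="$PATH:$PPWD/thirdparty:$PPWD/thirdparty/DALIGNER:$PPWD/thirdparty/DAZZ_DB:$PPWD/thirdparty/DASCRUBBER:$PPWD/inst/bin"
export MANPATH="$MANPATH:$PPWD/inst/share/man"
```

The PATH entries are the same ones used in the revised setup.sh; only the way `PPWD` is computed differs.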

cd /home/schwarz/src/HINGE_21feb2018/demo/ecoli_demo ;
cp -ip run.sh orig_run.sh_file ;

[Make wget run silently, and put "source /home/schwarz/src/HINGE_21feb2018/utils/setup.sh ;" in the run.sh script:]

pico run.sh ;
chmod +x run.sh ;

cat run.sh ;

[contents of run.sh, edited to have slightly tighter command-line syntax and to emit tracking text files, so that I can stop having to guess when the thing failed:]
#!/bin/bash

# put this *inside* the script so that there is no chance for it not to be invoked:
source /home/schwarz/src/HINGE_21feb2018/utils/setup.sh ;

# use '-q' mode to avoid cluttering up the nohup files:
wget -q http://gembox.cbcb.umd.edu/mhap/raw/ecoli_p4_filtered.fastq.gz ;
gunzip ecoli_p4_filtered.fastq.gz ;
seqtk seq -a ecoli_p4_filtered.fastq > reads.fasta ;

hinge correct-head reads.fasta reads.pb.fasta map.txt ;
echo "hinge correct-head reads.fasta reads.pb.fasta map.txt ;" > step01.txt ;

fasta2DB ecoli reads.pb.fasta ;
echo "fasta2DB ecoli reads.pb.fasta ;" > step02.txt ;

DBsplit ecoli ;
echo "DBsplit ecoli ;" > step03.txt ;

HPC.daligner ecoli | bash -v ;
echo "HPC.daligner ecoli | bash -v ;" > step04.txt ;

rm ecoli.*.ecoli.*.las ;
echo "rm ecoli.*.ecoli.*.las ;" > step05.txt ;

LAmerge ecoli.las ecoli.[0-9].las ;
echo "LAmerge ecoli.las ecoli.[0-9].las ;" > step06.txt ;

DASqv -c100 ecoli ecoli.las ;
echo "DASqv -c100 ecoli ecoli.las ;" > step07.txt ;

mkdir log ;
echo "mkdir log ;" > step08.txt ;

hinge filter --db ecoli --las ecoli --mlas -x ecoli --config ../../utils/nominal.ini ;
echo "hinge filter --db ecoli --las ecoli --mlas -x ecoli --config ../../utils/nominal.ini ;" > step09.txt ;

hinge maximal --db ecoli --las ecoli --mlas -x ecoli --config ../../utils/nominal.ini ;
echo "hinge maximal --db ecoli --las ecoli --mlas -x ecoli --config ../../utils/nominal.ini ;" > step10.txt ;

hinge layout --db ecoli --las ecoli --mlas -x ecoli --config ../../utils/nominal.ini -o ecoli ;
echo "hinge layout --db ecoli --las ecoli --mlas -x ecoli --config ../../utils/nominal.ini -o ecoli ;" > step11.txt ;

hinge clip ecoli.edges.hinges ecoli.hinge.list demo ;
echo "hinge clip ecoli.edges.hinges ecoli.hinge.list demo ;" > step12.txt ;

hinge draft-path $PWD ecoli ecolidemo.G2.graphml ;
echo "hinge draft-path $PWD ecoli ecolidemo.G2.graphml ;" > step13.txt ;

hinge draft --db ecoli --las ecoli --mlas --prefix ecoli --config ../../utils/nominal.ini --out ecoli.draft ;
echo "hinge draft --db ecoli --las ecoli --mlas --prefix ecoli --config ../../utils/nominal.ini --out ecoli.draft ;" > step14.txt ;

hinge correct-head ecoli.draft.fasta ecoli.draft.pb.fasta draft_map.txt ;
echo "hinge correct-head ecoli.draft.fasta ecoli.draft.pb.fasta draft_map.txt ;" > step15.txt ;

fasta2DB draft ecoli.draft.pb.fasta ;
echo "fasta2DB draft ecoli.draft.pb.fasta ;" > step16.txt ;

HPC.daligner ecoli draft | bash -v ;
echo "HPC.daligner ecoli draft | bash -v ;" > step17.txt ;

hinge consensus draft ecoli draft.ecoli.las ecoli.consensus.fasta ../../utils/nominal.ini ;
echo "hinge consensus draft ecoli draft.ecoli.las ecoli.consensus.fasta ../../utils/nominal.ini ;" > step18.txt ;

hinge gfa $PWD ecoli ecoli.consensus.fasta ;
echo "hinge gfa $PWD ecoli ecoli.consensus.fasta ;" > step19.txt ;

[try the test data run:]
cd /home/schwarz/src/HINGE_21feb2018/demo/ecoli_demo ;

nohup ./run.sh 1>nohup1.out 2>nohup1.err &

SchwarzEM commented 6 years ago

However, when I finally ran run.sh as shown above, it did not run to completion. Instead, it failed on or before step 13 ("hinge draft-path $PWD ecoli ecolidemo.G2.graphml ;").

Here are the error messages that I got from the run (recorded in nohup1.err):

[...]
# Remove level 1 .las files (optional)
rm L1.1.1.las L1.1.2.las L1.1.3.las
rm L1.2.1.las L1.2.2.las L1.2.3.las
rm L1.3.1.las L1.3.2.las L1.3.3.las
rm: cannot remove 'ecoli.*.ecoli.*.las': No such file or directory

[things really visibly go bad at this point:]
Traceback (most recent call last):
File "/home/schwarz/src/HINGE_21feb2018/inst/bin/../lib/hinge/pruning_and_clipping.py", line 1428, in <module>
    mark_skipped_edges(G,flname.split('.')[0] + '.edges.skipped')
File "/home/schwarz/src/HINGE_21feb2018/inst/bin/../lib/hinge/pruning_and_clipping.py", line 1033, in mark_skipped_edges
    G.edge[lines1[0] + "_" + lines1[3]][lines1[1] + "_" + lines1[4]]['skipped'] = 1
AttributeError: 'DiGraph' object has no attribute 'edge'
/woldlab/rattus/lvol0/mus/home/schwarz/.local/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "/home/schwarz/src/HINGE_21feb2018/inst/bin/../lib/hinge/get_draft_path.py", line 59, in <module>
    in_graph = nx.read_graphml(graphml_path)
File "<decorator-gen-442>", line 2, in read_graphml
File "/woldlab/rattus/lvol0/mus/home/schwarz/.local/lib/python2.7/site-packages/networkx/utils/decorators.py", line 205, in _open_file
    fobj = _dispatch_dict[ext](path, mode=mode)
IOError: [Errno 2] No such file or directory: 'ecolidemo.G2.graphml'

As might be guessed from the error messages, the file "ecolidemo.G2.graphml" was never actually produced.

Also, this step ("step 14") never completed:

hinge draft --db ecoli --las ecoli --mlas --prefix ecoli --config ../../utils/nominal.ini --out ecoli.draft ;

Instead, I got a tremendous quantity of blank lines printed to STDOUT (and, therefore, ending up in nohup1.out). Other users of HINGE have described this phenomenon in previous GitHub issue posts.
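
The stepNN.txt markers in run.sh are written whether or not the preceding command succeeded, which is why the failure can only be placed "on or before" step 13. A minimal sketch of a wrapper that records each command's exit status and reports the first failure is given here. The function name `run_step` and the marker contents are hypothetical. It also assumes that a failing `hinge` subcommand returns a non-zero exit status, which has not been verified here.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of HINGE): run one pipeline command,
# record the command line and its exit status in stepNN.txt, and
# return that status so the caller can stop at the first failure.
step=0
run_step() {
    step=$((step + 1))
    local marker
    marker=$(printf 'step%02d.txt' "$step")
    "$@"
    local status=$?
    printf '%s\nexit status: %d\n' "$*" "$status" > "$marker"
    if [ "$status" -ne 0 ]; then
        echo "step $step failed with status $status: $*" >&2
        return "$status"
    fi
}
```

In run.sh this would be used as, for example, `run_step hinge clip ecoli.edges.hinges ecoli.hinge.list demo || exit 1`. Pipelines such as `HPC.daligner ecoli | bash -v` would need to be wrapped, for instance as `run_step bash -c 'HPC.daligner ecoli | bash -v'`.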

SchwarzEM commented 6 years ago

All of this left me with the following files in the directory /home/schwarz/src/HINGE_21feb2018/demo/ecoli_demo:

-rw-r--r-- 1 schwarz schwarz 501778089 Feb 21 15:48 nohup1.out
-rw-r--r-- 1 schwarz schwarz         0 Feb 21 15:40 ecoli.draft.fasta
drwxr-xr-x 2 schwarz schwarz       150 Feb 21 15:40 log
-rw-r--r-- 1 schwarz schwarz         0 Feb 21 15:40 ecoli.draft.deadends.txt
-rw-r--r-- 1 schwarz schwarz        96 Feb 21 15:40 step13.txt
-rw-r--r-- 1 schwarz schwarz      9442 Feb 21 15:40 nohup1.err
-rw-r--r-- 1 schwarz schwarz        54 Feb 21 15:40 step12.txt
-rw-r--r-- 1 schwarz schwarz        96 Feb 21 15:40 step11.txt
-rw-r--r-- 1 schwarz schwarz       699 Feb 21 15:40 ecoli.debug
-rw-r--r-- 1 schwarz schwarz     11386 Feb 21 15:40 ecoli.hgraph
-rw-r--r-- 1 schwarz schwarz       579 Feb 21 15:40 ecoli.edges.skipped
-rw-r--r-- 1 schwarz schwarz    413963 Feb 21 15:40 ecoli.edges.greedy
-rw-r--r-- 1 schwarz schwarz    324550 Feb 21 15:40 ecoli.edges.hinges2
-rw-r--r-- 1 schwarz schwarz    413952 Feb 21 15:40 ecoli.edges.hinges
-rw-r--r-- 1 schwarz schwarz    292190 Feb 21 15:40 ecoli.edges.2
-rw-r--r-- 1 schwarz schwarz    287776 Feb 21 15:40 ecoli.edges.1
-rw-r--r-- 1 schwarz schwarz    307998 Feb 21 15:40 edges.g_out.txt
-rw-r--r-- 1 schwarz schwarz     74638 Feb 21 15:40 ecoli.deadends.txt
-rw-r--r-- 1 schwarz schwarz         0 Feb 21 15:40 ecoli.garbage.txt
-rw-r--r-- 1 schwarz schwarz    580859 Feb 21 15:40 ecoli.killed.hinges
-rw-r--r-- 1 schwarz schwarz         0 Feb 21 15:40 hinge_debug.txt
-rw-r--r-- 1 schwarz schwarz        80 Feb 21 15:40 ecoli.hinge.list
-rw-r--r-- 1 schwarz schwarz         0 Feb 21 15:40 overlap_debug.txt
-rw-r--r-- 1 schwarz schwarz    723876 Feb 21 15:40 edges.bkw.backup.txt
-rw-r--r-- 1 schwarz schwarz    738155 Feb 21 15:40 edges.fwd.backup.txt
-rw-r--r-- 1 schwarz schwarz        88 Feb 21 15:39 step10.txt
-rw-r--r-- 1 schwarz schwarz         0 Feb 21 15:39 ecoli.contained.txt
-rw-r--r-- 1 schwarz schwarz     19644 Feb 21 15:39 ecoli.max
-rw-r--r-- 1 schwarz schwarz  82206240 Feb 21 15:39 ecoli.coverage.txt
-rw-r--r-- 1 schwarz schwarz        87 Feb 21 15:37 step09.txt
-rw-r--r-- 1 schwarz schwarz   1197258 Feb 21 15:37 ecoli.mas
-rw-r--r-- 1 schwarz schwarz    955400 Feb 21 15:37 ecoli.cmas
-rw-r--r-- 1 schwarz schwarz    583149 Feb 21 15:37 ecoli.hinges.txt
-rw-r--r-- 1 schwarz schwarz    278019 Feb 21 15:36 ecoli.repeat.txt
-rw-r--r-- 1 schwarz schwarz         0 Feb 21 15:36 debug.txt
-rw-r--r-- 1 schwarz schwarz         0 Feb 21 15:35 ecoli.self.flag
-rw-r--r-- 1 schwarz schwarz         0 Feb 21 15:35 ecoli.cov.flag
-rw-r--r-- 1 schwarz schwarz         0 Feb 21 15:35 ecoli.filtered.fasta
-rw-r--r-- 1 schwarz schwarz         0 Feb 21 15:35 ecoli.homologous.txt
-rw-r--r-- 1 schwarz schwarz        12 Feb 21 15:35 step08.txt
-rw-r--r-- 1 schwarz schwarz        30 Feb 21 15:35 step07.txt
-rw-r--r-- 1 schwarz schwarz        36 Feb 21 15:35 step06.txt
-rw-r--r-- 1 schwarz schwarz 997205244 Feb 21 15:35 ecoli.las
-rw-r--r-- 1 schwarz schwarz        25 Feb 21 15:35 step05.txt
-rw-r--r-- 1 schwarz schwarz        31 Feb 21 15:35 step04.txt
-rw-r--r-- 1 schwarz schwarz  84334534 Feb 21 15:35 ecoli.3.las
-rw-r--r-- 1 schwarz schwarz 449776616 Feb 21 15:35 ecoli.2.las
-rw-r--r-- 1 schwarz schwarz 463094118 Feb 21 15:35 ecoli.1.las
-rw-r--r-- 1 schwarz schwarz        16 Feb 21 15:10 step03.txt
-rw-r--r-- 1 schwarz schwarz       195 Feb 21 15:10 ecoli.db
-rw-r--r-- 1 schwarz schwarz        32 Feb 21 15:10 step02.txt
-rw-r--r-- 1 schwarz schwarz        56 Feb 21 15:10 step01.txt
-rw-r--r-- 1 schwarz schwarz 446206093 Feb 21 15:10 reads.pb.fasta
-rw-r--r-- 1 schwarz schwarz   8313251 Feb 21 15:10 map.txt
-rw-r--r-- 1 schwarz schwarz 443660687 Feb 21 15:10 reads.fasta
-rwxr-xr-x 1 schwarz schwarz      2832 Feb 21 14:58 run.sh
-rw-r--r-- 1 schwarz schwarz      1212 Feb 21 14:25 run_norevcomp.sh
-rw-r--r-- 1 schwarz schwarz      1215 Feb 21 14:25 orig_run.sh_file
-rw-r--r-- 1 schwarz schwarz 880899439 Dec 10  2014 ecoli_p4_filtered.fastq

It is perhaps significant that several files generated by about step 8 (ecoli.homologous.txt, ecoli.filtered.fasta, ecoli.cov.flag, ecoli.self.flag, and debug.txt) are all zero-byte; that seems like a possible sign of an earlier failure in the demo script.
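
Zero-byte outputs like these can be listed directly instead of being picked out of the directory listing by eye. A minimal sketch, assuming a `find` that supports `-maxdepth` (GNU and BSD versions do); the function name `list_empty_files` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: list zero-byte regular files directly inside a
# directory (default: the current directory), sorted by name.
list_empty_files() {
    find "${1:-.}" -maxdepth 1 -type f -size 0 | sort
}
```

Running `list_empty_files /home/schwarz/src/HINGE_21feb2018/demo/ecoli_demo` after a failed run would show which outputs were created but never filled. Whether a given empty file indicates a failure depends on the step that writes it.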

SchwarzEM commented 6 years ago

So, that's the story. I believe that I'm probably quite close to getting HINGE to run properly, which makes this near-success all the more frustrating. I have tried to follow the documentation closely, and have no idea what I'm doing wrong. Any help in getting this debugged would be very welcome!

ilanshom commented 6 years ago

Hi Erich, thanks for pointing this out. It turned out that the requirements_frozen.txt file included networkx 2.0. This version of networkx (a Python package we use in the hinge clip step) was recently released, and it is not backwards compatible, so HINGE (which was designed for networkx 1.9) doesn't currently work with it. We plan on upgrading HINGE to be compatible with networkx 2.0, but that is not straightforward, so it may take some time. For now, the best thing is to downgrade to networkx 1.9, and things should work for you.
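
The `AttributeError: 'DiGraph' object has no attribute 'edge'` in the traceback above is consistent with this diagnosis. A minimal sketch of a pre-flight check that could run before `hinge clip` follows. The function names are hypothetical, and the check assumes that any 1.x release is acceptable, whereas only 1.9 is named in this thread.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the version string "$1" is a 1.x release.
is_networkx_1x() {
    case "$1" in
        1.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Ask the interpreter named in "$1" for its networkx version.
# Prints nothing if the interpreter or the package is missing.
installed_networkx_version() {
    "$1" -c 'import networkx; print(networkx.__version__)' 2>/dev/null
}
```

For example: `is_networkx_1x "$(installed_networkx_version python)" || echo "networkx 1.x not found for python" >&2`.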

SchwarzEM commented 6 years ago

Hi Ilan,

Your debugging worked!

Specifically, after uninstalling networkx and then explicitly reinstalling networkx 1.9:

python  -m pip uninstall networkx ;    
python3 -m pip uninstall networkx ;

python  -m pip install --user networkx==1.9 ;
python3 -m pip install --user networkx==1.9 ;

I was then able to reinstall HINGE, compile it, and run it on its E. coli test data set (exactly as shown before).

This time, the test run went to successful completion. It gave me the product file "ecoli.consensus.fasta", which consisted of two contigs (Consensus0, 4,650,727 nt, 1.14% softmasked residues; and Consensus1, 4,649,782 nt, 0.85% softmasked residues).
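
Contig counts and lengths like these can be read straight from the FASTA file. A minimal sketch using awk; the function name `fasta_lengths` is hypothetical, and the softmasked percentages are not computed here:

```shell
#!/bin/bash
# Hypothetical helper: print "<sequence name><TAB><length>" for each record
# in the FASTA file "$1". The name is the first word after ">".
fasta_lengths() {
    awk '/^>/ { if (name != "") print name "\t" len
                name = substr($1, 2); len = 0; next }
         { len += length($0) }
         END { if (name != "") print name "\t" len }' "$1"
}
```

Calling `fasta_lengths ecoli.consensus.fasta` prints one line per contig.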

Thanks for fixing this!