SchwarzEM closed this issue 6 years ago.
Details of how I set up HINGE and tried running its demo assembly:
cd /home/schwarz/src ;
rm -rf HINGE_20feb2018 ;
git clone https://github.com/fxia22/HINGE.git HINGE_21feb2018 ;
cd HINGE_21feb2018 ;
git submodule init ;
git submodule update ;
# fix some very bad misspecifications in here so that the thing has a chance to work:
cp -ip ./utils/build.sh ./utils/orig_build.sh_file ;
cat ./utils/build.sh ;
[contents of original ./utils/build.sh:]
#!/bin/bash
pwd=$PWD
cd $pwd/thirdparty/DAZZ_DB
make -j 8
cd $pwd/thirdparty//DALIGNER
make -j 8
cd $pwd/thirdparty/DASCRUBBER
make -j 8
cd $pwd/thirdparty/DEXTRACTOR
make -j 8
cd $pwd
mkdir build
cd $pwd/build
cmake .. -DCMAKE_INSTALL_PREFIX=../inst -DCMAKE_C_COMPILER=gcc-4.8 -DCMAKE_CXX_COMPILER=g++-4.8
make -j 8
make install
exit $?
pico ./utils/build.sh ;
diff ./utils/orig_build.sh_file ./utils/build.sh ;
[output of "diff ./utils/orig_build.sh_file ./utils/build.sh":]
13,14c13,14
< cd $pwd/thirdparty/DEXTRACTOR
< make -j 8
---
> # cd $pwd/thirdparty/DEXTRACTOR
> # make -j 8
19c19
< cmake .. -DCMAKE_INSTALL_PREFIX=../inst -DCMAKE_C_COMPILER=gcc-4.8 -DCMAKE_CXX_COMPILER=g++-4.8
---
> cmake .. -DCMAKE_INSTALL_PREFIX=../inst -DCMAKE_C_COMPILER=gcc-4.9 -DCMAKE_CXX_COMPILER=g++-4.9
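For anyone repeating this setup, the same two edits can be applied without an interactive editor. This is a minimal sketch. It assumes GNU sed (for in-place `-i`) and that build.sh still has the line layout listed above, and the function name `patch_build_sh` is made up for illustration. It prefixes the DEXTRACTOR `cd`/`make` pair with `# ` and changes the compiler suffixes on the `cmake` line from 4.8 to 4.9. Running it a second time changes nothing, because neither pattern matches once the edits are in place.

```shell
# Hypothetical helper: apply the two build.sh edits from the diff above.
# Assumes GNU sed (-i for in-place editing) and the original build.sh layout.
patch_build_sh() {
    local f="$1"
    # 1) Comment out the 'cd .../DEXTRACTOR' line and the 'make -j 8' after it.
    # 2) On the cmake line only, switch gcc-4.8/g++-4.8 to gcc-4.9/g++-4.9.
    sed -i \
        -e '/^cd \$pwd\/thirdparty\/DEXTRACTOR$/,/^make -j 8$/ s/^/# /' \
        -e '/^cmake /s/-4\.8/-4.9/g' \
        "$f"
}
```

Usage: `patch_build_sh ./utils/build.sh ;` followed by the same `diff` as above to confirm the result.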
CPATH="/usr/include/hdf5/serial" ; # $CPATH is empty on my system
export CPATH ;
[check to see that I have everything installed that I need:]
python -m pip install --user pbcore ;
[can't run this: python3 -m pip install --user pbcore ;]
[for all of the others, do a local (--user) pip install for both python2 and python3:]
python -m pip install --user ujson ;
python3 -m pip install --user ujson ;
python -m pip install --user colormap ;
python3 -m pip install --user colormap ;
python -m pip install --user easydev ;
python3 -m pip install --user easydev ;
python -m pip install --user configparser ;
python3 -m pip install --user configparser ;
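The paired installs above can also be written as one loop. This is a sketch with a hypothetical helper name, `install_for_both`; pbcore stays a separate python2-only install, since it could not be installed for python3 here. The loop stops at the first install that fails, so a missing package is noticed before the build starts.

```shell
# Hypothetical helper: install each named package with --user for both
# python2 ("python") and python3; return non-zero on the first failure.
install_for_both() {
    local pkg py
    for pkg in "$@"; do
        for py in python python3; do
            "$py" -m pip install --user "$pkg" || return 1
        done
    done
}
```

Usage: `python -m pip install --user pbcore ; install_for_both ujson colormap easydev configparser ;`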
[finally, this should work, so run it:]
./utils/build.sh ;
[before I try working with this, make sure this script has a chance of working:]
cd /home/schwarz/src/HINGE_21feb2018/utils ;
cp -ip setup.sh orig_setup.sh_file ;
pico setup.sh ;
[remove DEXTRACTOR from the $PATH, since I didn't compile it; remove racon, since it's not really there any more]
[also, get rid of the $PWD nonsense, which just makes the damn thing fail when invoked from anywhere else]
cat setup.sh ;
[contents of revised setup.sh:]
PPWD=/home/schwarz/src/HINGE_21feb2018
export PATH="$PATH:$PPWD/thirdparty:$PPWD/thirdparty/DALIGNER:$PPWD/thirdparty/DAZZ_DB:$PPWD/thirdparty/DASCRUBBER:$PPWD/inst/bin"
export MANPATH="$MANPATH:$PPWD/inst/share/man"
cd /home/schwarz/src/HINGE_21feb2018/demo/ecoli_demo ;
cp -ip run.sh orig_run.sh_file ;
[Make wget run silently, and put "source /home/schwarz/src/HINGE_21feb2018/utils/setup.sh ;" in the run.sh script:]
pico run.sh ;
chmod +x run.sh ;
cat run.sh ;
[contents of run.sh, edited to have slightly tighter command-line syntax and to emit tracking text files, so that I can stop having to guess when the thing failed:]
#!/bin/bash
# put this *inside* the script so that there is no chance for it not to be invoked:
source /home/schwarz/src/HINGE_21feb2018/utils/setup.sh ;
# use '-q' mode to avoid cluttering up the nohup files:
wget -q http://gembox.cbcb.umd.edu/mhap/raw/ecoli_p4_filtered.fastq.gz ;
gunzip ecoli_p4_filtered.fastq.gz ;
seqtk seq -a ecoli_p4_filtered.fastq > reads.fasta ;
hinge correct-head reads.fasta reads.pb.fasta map.txt ;
echo "hinge correct-head reads.fasta reads.pb.fasta map.txt ;" > step01.txt ;
fasta2DB ecoli reads.pb.fasta ;
echo "fasta2DB ecoli reads.pb.fasta ;" > step02.txt ;
DBsplit ecoli ;
echo "DBsplit ecoli ;" > step03.txt ;
HPC.daligner ecoli | bash -v ;
echo "HPC.daligner ecoli | bash -v ;" > step04.txt ;
rm ecoli.*.ecoli.*.las ;
echo "rm ecoli.*.ecoli.*.las ;" > step05.txt ;
LAmerge ecoli.las ecoli.[0-9].las ;
echo "LAmerge ecoli.las ecoli.[0-9].las ;" > step06.txt ;
DASqv -c100 ecoli ecoli.las ;
echo "DASqv -c100 ecoli ecoli.las ;" > step07.txt ;
mkdir log ;
echo "mkdir log ;" > step08.txt ;
hinge filter --db ecoli --las ecoli --mlas -x ecoli --config ../../utils/nominal.ini ;
echo "hinge filter --db ecoli --las ecoli --mlas -x ecoli --config ../../utils/nominal.ini ;" > step09.txt ;
hinge maximal --db ecoli --las ecoli --mlas -x ecoli --config ../../utils/nominal.ini ;
echo "hinge maximal --db ecoli --las ecoli --mlas -x ecoli --config ../../utils/nominal.ini ;" > step10.txt ;
hinge layout --db ecoli --las ecoli --mlas -x ecoli --config ../../utils/nominal.ini -o ecoli ;
echo "hinge layout --db ecoli --las ecoli --mlas -x ecoli --config ../../utils/nominal.ini -o ecoli ;" > step11.txt ;
hinge clip ecoli.edges.hinges ecoli.hinge.list demo ;
echo "hinge clip ecoli.edges.hinges ecoli.hinge.list demo ;" > step12.txt ;
hinge draft-path $PWD ecoli ecolidemo.G2.graphml ;
echo "hinge draft-path $PWD ecoli ecolidemo.G2.graphml ;" > step13.txt ;
hinge draft --db ecoli --las ecoli --mlas --prefix ecoli --config ../../utils/nominal.ini --out ecoli.draft ;
echo "hinge draft --db ecoli --las ecoli --mlas --prefix ecoli --config ../../utils/nominal.ini --out ecoli.draft ;" > step14.txt ;
hinge correct-head ecoli.draft.fasta ecoli.draft.pb.fasta draft_map.txt ;
echo "hinge correct-head ecoli.draft.fasta ecoli.draft.pb.fasta draft_map.txt ;" > step15.txt ;
fasta2DB draft ecoli.draft.pb.fasta ;
echo "fasta2DB draft ecoli.draft.pb.fasta ;" > step16.txt ;
HPC.daligner ecoli draft | bash -v ;
echo "HPC.daligner ecoli draft | bash -v ;" > step17.txt ;
hinge consensus draft ecoli draft.ecoli.las ecoli.consensus.fasta ../../utils/nominal.ini ;
echo "hinge consensus draft ecoli draft.ecoli.las ecoli.consensus.fasta ../../utils/nominal.ini ;" > step18.txt ;
hinge gfa $PWD ecoli ecoli.consensus.fasta ;
echo "hinge gfa $PWD ecoli ecoli.consensus.fasta ;" > step19.txt ;
[try the test data run:]
cd /home/schwarz/src/HINGE_21feb2018/demo/ecoli_demo ;
nohup ./run.sh 1>nohup1.out 2>nohup1.err &
However, when I finally ran run.sh as shown above, it did not run to completion. Instead, it failed at or before step 13 ("hinge draft-path $PWD ecoli ecolidemo.G2.graphml ;").
Here are the error messages that I got from the run (recorded in nohup1.err):
[...]
# Remove level 1 .las files (optional)
rm L1.1.1.las L1.1.2.las L1.1.3.las
rm L1.2.1.las L1.2.2.las L1.2.3.las
rm L1.3.1.las L1.3.2.las L1.3.3.las
rm: cannot remove 'ecoli.*.ecoli.*.las': No such file or directory
[things really visibly go bad at this point:]
Traceback (most recent call last):
  File "/home/schwarz/src/HINGE_21feb2018/inst/bin/../lib/hinge/pruning_and_clipping.py", line 1428, in <module>
    mark_skipped_edges(G,flname.split('.')[0] + '.edges.skipped')
  File "/home/schwarz/src/HINGE_21feb2018/inst/bin/../lib/hinge/pruning_and_clipping.py", line 1033, in mark_skipped_edges
    G.edge[lines1[0] + "_" + lines1[3]][lines1[1] + "_" + lines1[4]]['skipped'] = 1
AttributeError: 'DiGraph' object has no attribute 'edge'
/woldlab/rattus/lvol0/mus/home/schwarz/.local/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "/home/schwarz/src/HINGE_21feb2018/inst/bin/../lib/hinge/get_draft_path.py", line 59, in <module>
    in_graph = nx.read_graphml(graphml_path)
  File "<decorator-gen-442>", line 2, in read_graphml
  File "/woldlab/rattus/lvol0/mus/home/schwarz/.local/lib/python2.7/site-packages/networkx/utils/decorators.py", line 205, in _open_file
    fobj = _dispatch_dict[ext](path, mode=mode)
IOError: [Errno 2] No such file or directory: 'ecolidemo.G2.graphml'
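As might be guessed from the error messages, the file "ecolidemo.G2.graphml" was never actually produced. The first traceback suggests that step 12 ("hinge clip", which runs pruning_and_clipping.py) crashed before it could write that file, leaving step 13 nothing to read.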
Also, this step ("step 14") never completed:
hinge draft --db ecoli --las ecoli --mlas --prefix ecoli --config ../../utils/nominal.ini --out ecoli.draft ;
Instead, a tremendous quantity of blank lines was printed to STDOUT (and therefore ended up in nohup1.out, which grew to about 500 MB). Other users of HINGE have described this phenomenon in previous GitHub issues.
All of this left me with the following files in the directory /home/schwarz/src/HINGE_21feb2018/demo/ecoli_demo:
-rw-r--r-- 1 schwarz schwarz 501778089 Feb 21 15:48 nohup1.out
-rw-r--r-- 1 schwarz schwarz 0 Feb 21 15:40 ecoli.draft.fasta
drwxr-xr-x 2 schwarz schwarz 150 Feb 21 15:40 log
-rw-r--r-- 1 schwarz schwarz 0 Feb 21 15:40 ecoli.draft.deadends.txt
-rw-r--r-- 1 schwarz schwarz 96 Feb 21 15:40 step13.txt
-rw-r--r-- 1 schwarz schwarz 9442 Feb 21 15:40 nohup1.err
-rw-r--r-- 1 schwarz schwarz 54 Feb 21 15:40 step12.txt
-rw-r--r-- 1 schwarz schwarz 96 Feb 21 15:40 step11.txt
-rw-r--r-- 1 schwarz schwarz 699 Feb 21 15:40 ecoli.debug
-rw-r--r-- 1 schwarz schwarz 11386 Feb 21 15:40 ecoli.hgraph
-rw-r--r-- 1 schwarz schwarz 579 Feb 21 15:40 ecoli.edges.skipped
-rw-r--r-- 1 schwarz schwarz 413963 Feb 21 15:40 ecoli.edges.greedy
-rw-r--r-- 1 schwarz schwarz 324550 Feb 21 15:40 ecoli.edges.hinges2
-rw-r--r-- 1 schwarz schwarz 413952 Feb 21 15:40 ecoli.edges.hinges
-rw-r--r-- 1 schwarz schwarz 292190 Feb 21 15:40 ecoli.edges.2
-rw-r--r-- 1 schwarz schwarz 287776 Feb 21 15:40 ecoli.edges.1
-rw-r--r-- 1 schwarz schwarz 307998 Feb 21 15:40 edges.g_out.txt
-rw-r--r-- 1 schwarz schwarz 74638 Feb 21 15:40 ecoli.deadends.txt
-rw-r--r-- 1 schwarz schwarz 0 Feb 21 15:40 ecoli.garbage.txt
-rw-r--r-- 1 schwarz schwarz 580859 Feb 21 15:40 ecoli.killed.hinges
-rw-r--r-- 1 schwarz schwarz 0 Feb 21 15:40 hinge_debug.txt
-rw-r--r-- 1 schwarz schwarz 80 Feb 21 15:40 ecoli.hinge.list
-rw-r--r-- 1 schwarz schwarz 0 Feb 21 15:40 overlap_debug.txt
-rw-r--r-- 1 schwarz schwarz 723876 Feb 21 15:40 edges.bkw.backup.txt
-rw-r--r-- 1 schwarz schwarz 738155 Feb 21 15:40 edges.fwd.backup.txt
-rw-r--r-- 1 schwarz schwarz 88 Feb 21 15:39 step10.txt
-rw-r--r-- 1 schwarz schwarz 0 Feb 21 15:39 ecoli.contained.txt
-rw-r--r-- 1 schwarz schwarz 19644 Feb 21 15:39 ecoli.max
-rw-r--r-- 1 schwarz schwarz 82206240 Feb 21 15:39 ecoli.coverage.txt
-rw-r--r-- 1 schwarz schwarz 87 Feb 21 15:37 step09.txt
-rw-r--r-- 1 schwarz schwarz 1197258 Feb 21 15:37 ecoli.mas
-rw-r--r-- 1 schwarz schwarz 955400 Feb 21 15:37 ecoli.cmas
-rw-r--r-- 1 schwarz schwarz 583149 Feb 21 15:37 ecoli.hinges.txt
-rw-r--r-- 1 schwarz schwarz 278019 Feb 21 15:36 ecoli.repeat.txt
-rw-r--r-- 1 schwarz schwarz 0 Feb 21 15:36 debug.txt
-rw-r--r-- 1 schwarz schwarz 0 Feb 21 15:35 ecoli.self.flag
-rw-r--r-- 1 schwarz schwarz 0 Feb 21 15:35 ecoli.cov.flag
-rw-r--r-- 1 schwarz schwarz 0 Feb 21 15:35 ecoli.filtered.fasta
-rw-r--r-- 1 schwarz schwarz 0 Feb 21 15:35 ecoli.homologous.txt
-rw-r--r-- 1 schwarz schwarz 12 Feb 21 15:35 step08.txt
-rw-r--r-- 1 schwarz schwarz 30 Feb 21 15:35 step07.txt
-rw-r--r-- 1 schwarz schwarz 36 Feb 21 15:35 step06.txt
-rw-r--r-- 1 schwarz schwarz 997205244 Feb 21 15:35 ecoli.las
-rw-r--r-- 1 schwarz schwarz 25 Feb 21 15:35 step05.txt
-rw-r--r-- 1 schwarz schwarz 31 Feb 21 15:35 step04.txt
-rw-r--r-- 1 schwarz schwarz 84334534 Feb 21 15:35 ecoli.3.las
-rw-r--r-- 1 schwarz schwarz 449776616 Feb 21 15:35 ecoli.2.las
-rw-r--r-- 1 schwarz schwarz 463094118 Feb 21 15:35 ecoli.1.las
-rw-r--r-- 1 schwarz schwarz 16 Feb 21 15:10 step03.txt
-rw-r--r-- 1 schwarz schwarz 195 Feb 21 15:10 ecoli.db
-rw-r--r-- 1 schwarz schwarz 32 Feb 21 15:10 step02.txt
-rw-r--r-- 1 schwarz schwarz 56 Feb 21 15:10 step01.txt
-rw-r--r-- 1 schwarz schwarz 446206093 Feb 21 15:10 reads.pb.fasta
-rw-r--r-- 1 schwarz schwarz 8313251 Feb 21 15:10 map.txt
-rw-r--r-- 1 schwarz schwarz 443660687 Feb 21 15:10 reads.fasta
-rwxr-xr-x 1 schwarz schwarz 2832 Feb 21 14:58 run.sh
-rw-r--r-- 1 schwarz schwarz 1212 Feb 21 14:25 run_norevcomp.sh
-rw-r--r-- 1 schwarz schwarz 1215 Feb 21 14:25 orig_run.sh_file
-rw-r--r-- 1 schwarz schwarz 880899439 Dec 10 2014 ecoli_p4_filtered.fastq
It is perhaps significant that several files created right after step 8, at the start of step 9 ("hinge filter"), are all zero bytes in size: ecoli.homologous.txt, ecoli.filtered.fasta, ecoli.cov.flag, ecoli.self.flag, and debug.txt. That seems like a possible sign of an earlier failure in the demo script.
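One quick way to see which outputs came out empty, and in what order, is sketched below. It assumes GNU or BSD find (for `-maxdepth`), and `list_empty_outputs` is a hypothetical name. Because the list is sorted oldest first, the top entry points at the earliest step that produced an empty file, although an empty file is not necessarily an error: some of these may be flags or debug logs that are normally empty.

```shell
# Hypothetical helper: list zero-byte regular files in a directory
# (default: the current one), oldest first by modification time.
list_empty_outputs() {
    local dir="${1:-.}"
    # '-size 0' matches only empty files; 'ls -1tr' sorts by mtime, oldest first.
    find "$dir" -maxdepth 1 -type f -size 0 -exec ls -1tr {} +
}
```

Usage: `cd /home/schwarz/src/HINGE_21feb2018/demo/ecoli_demo ; list_empty_outputs ;`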
So, that's the story. I believe that I'm probably quite close to getting HINGE to run properly, which makes this near-success all the more frustrating. I have tried to follow the documentation closely, and have no idea what I'm doing wrong. Any help in getting this debugged would be very welcome!
Hi Erich,
Thanks for pointing this out. It turned out that our requirements_frozen.txt file was pulling in networkx 2.0. This version of networkx (a Python package we use in the hinge clip step) was recently released, and it is not backwards compatible, so HINGE (which was designed for networkx 1.9) doesn't currently work with networkx 2.0. We plan to upgrade HINGE to be compatible with networkx 2.0, but it is not straightforward, so it may take some time. For now, the best thing is to downgrade to networkx 1.9, and things should work for you.
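Since the incompatibility only shows up well into the run (at the clip step), a pre-flight check at the top of run.sh can save a long wait. This is a sketch with a hypothetical function name. It assumes HINGE's Python scripts run under the interpreter named `python` (the tracebacks above point to python2.7 site-packages) and treats any networkx 1.x as acceptable, although only 1.9 is confirmed to work.

```shell
# Hypothetical pre-flight check: succeed only if the given Python interpreter
# (default: "python") imports a networkx whose version starts with "1.".
require_networkx_1x() {
    local py="${1:-python}" ver
    ver=$("$py" -c 'import networkx; print(networkx.__version__)' 2>/dev/null) || {
        echo "networkx is not importable with $py" >&2
        return 1
    }
    case "$ver" in
        1.*) return 0 ;;
        *)   echo "found networkx $ver with $py; HINGE expects 1.x (e.g. 1.9)" >&2
             return 1 ;;
    esac
}
```

Usage, near the top of run.sh: `require_networkx_1x python || exit 1 ;`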
Hi Ilan,
Your debugging worked!
Specifically, after uninstalling networkx and then explicitly reinstalling networkx 1.9:
python -m pip uninstall networkx ;
python3 -m pip uninstall networkx ;
python -m pip install --user networkx==1.9 ;
python3 -m pip install --user networkx==1.9 ;
I was then able to reinstall HINGE, compile it, and run it on its E. coli test data set (exactly as shown before).
This time, the test run went to successful completion. It gave me the product file "ecoli.consensus.fasta", which consisted of two contigs (Consensus0, 4,650,727 nt, 1.14% softmasked residues; and Consensus1, 4,649,782 nt, 0.85% softmasked residues).
Thanks for fixing this!
I have tried to compile HINGE and run it on one of its own test instances. The compilation superficially appears to work, but the test instance fails at (or before) the following step:
hinge draft-path $PWD ecoli ecolidemo.G2.graphml ;
To avoid greatly bloating the text of this initial post, I will provide full details of the failure as comments rather than in the post itself.