Closed: Thomas-Schatz closed this issue 7 years ago.
So the problem was that my version of Kaldi was too old and the script used for aligning (steps/align_fmllr_lats.sh) did not exist. Specifying in my abkhazia.conf file the path to the Kaldi install provided by Mathieu solved the problem.
The question remains of how to allow external users to get the right version of Kaldi.
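One option would be for abkhazia to fail early with a clear message when the configured Kaldi tree is too old. A minimal sketch of such a check (this is not abkhazia's actual code; the script list and the install path are illustrative assumptions):

import os

def check_kaldi_install(kaldi_dir):
    # fail early if the Kaldi tree lacks the wsj recipe scripts we rely on
    recipe = os.path.join(kaldi_dir, 'egs', 'wsj', 's5')
    required = ['steps/align_fmllr_lats.sh', 'utils/mkgraph.sh']
    missing = [s for s in required
               if not os.path.isfile(os.path.join(recipe, s))]
    if missing:
        raise RuntimeError(
            'Kaldi install {} is too old, missing {}'
            .format(kaldi_dir, ', '.join(missing)))

check_kaldi_install('/home/mbernard/dev/abkhazia/kaldi')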
I also get another failure later in the tests: test/test_decode.py::test_decode_mono FAILED
E RuntimeError: command "utils/queue.pl -q all.q@puck*.cm.cluster /home/thomas/tmpdir/test_decode_mono0/decode-mono/recipe/graph/mkgraph.log utils/mkgraph.sh --mono --transition-scale 1.0 --self-loop-scale 0.1 /home/thomas/tmpdir/lm_word0 /home/thomas/tmpdir/am_mono0 /home/thomas/tmpdir/test_decode_mono0/decode-mono/recipe/graph" returned with 1
abkhazia/utils/jobs.py:73: RuntimeError
-------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------
computing full decoding graph
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
From the log in /home/thomas/tmpdir, I get:
# Running on puck2
# Started at Mon Jul 3 20:06:59 CEST 2017
# utils/mkgraph.sh --mono --transition-scale 1.0 --self-loop-scale 0.1 /home/thomas/tmpdir/lm_word0 /home/thomas/tmpdir/am_mono0 /home/thomas/tmpdir/test_decode_mono0/decode-mono/recipe/graph
fstarcsort: error while loading shared libraries: libfstscript.so.1: cannot open shared object file: No such file or directory
fsttablecompose /home/thomas/tmpdir/lm_word0/L_disambig.fst /home/thomas/tmpdir/lm_word0/G.fst
fstminimizeencoded
fstpushspecial
fstdeterminizestar --use-log=true
# Accounting: time=1 threads=1
# Finished at Mon Jul 3 20:07:00 CEST 2017 with status 1
So it appears that the initial error is: fstarcsort: error while loading shared libraries: libfstscript.so.1: cannot open shared object file: No such file or directory.
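For reference, this kind of error usually means the dynamic linker cannot locate the OpenFst libraries of the Kaldi tree in use (e.g. LD_LIBRARY_PATH not set on the compute nodes). A quick hedged check, where the path to the fstarcsort binary is an assumption for illustration:

import subprocess

# ask the dynamic linker which shared libraries the binary resolves;
# any 'not found' entry points at a missing LD_LIBRARY_PATH directory
binary = '/home/mbernard/dev/abkhazia/kaldi/tools/openfst/bin/fstarcsort'
for line in subprocess.check_output(['ldd', binary]).decode().splitlines():
    if 'not found' in line:
        print('unresolved: ' + line.strip())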
I'm currently re-running the tests to see if I can reproduce it. Any idea what happened?
Ok, thank you Thomas for reporting this. The issue is that the fstarcsort binary doesn't find the library libfstscript.so. Maybe there's a mess in my personal Kaldi installation?
I suggest you compile your own Kaldi from scratch by following https://abkhazia.readthedocs.io/en/latest/install.html#kaldi
I made a fork of Kaldi for compatibility with abkhazia here: https://github.com/bootphon/kaldi
M
I was not able to reproduce the issue when running the test again... Instead it failed when testing neural network training:
(abkhazia2017)[thomas@oberon abkhazia]$ pytest ./test --basetemp=/home/thomas/tmpdir -x -v
======================================================================== test session starts ========================================================================
platform linux2 -- Python 2.7.13, pytest-3.0.7, py-1.4.33, pluggy-0.4.0 -- /home/thomas/.conda/envs/abkhazia2017/bin/python
cachedir: .cache
rootdir: /fhgfs/bootphon/scratch/thomas/abkhazia2017/abkhazia, inifile:
collected 38 items
test/test_acoustic.py::test_acoustic_njobs[4] PASSED
test/test_acoustic.py::test_acoustic_njobs[11] PASSED
test/test_acoustic.py::test_monophone_cmvn_good PASSED
test/test_acoustic.py::test_monophone_cmvn_bad PASSED
test/test_align.py::test_align[both-False] PASSED
test/test_ark.py::test_read_write[text] PASSED
test/test_ark.py::test_read_write[binary] PASSED
test/test_ark.py::test_h5f_name_of_utterance[a] PASSED
test/test_ark.py::test_h5f_name_of_utterance[a-b] PASSED
test/test_ark.py::test_h5f_name_of_utterance[a_b] PASSED
test/test_ark.py::test_h5f_twice PASSED
test/test_corpus.py::test_save_corpus PASSED
test/test_corpus.py::test_empty PASSED
test/test_corpus.py::test_subcorpus PASSED
test/test_corpus.py::test_split PASSED
test/test_corpus.py::test_split_tiny_train PASSED
test/test_corpus.py::test_split_by_speakers PASSED
test/test_corpus.py::test_spk2utt PASSED
test/test_corpus.py::test_phonemize_text PASSED
test/test_decode.py::test_decode_mono PASSED
test/test_decode.py::test_decode_tri PASSED
test/test_decode.py::test_decode_trisa PASSED
test/test_decode.py::test_decode_nnet ERROR
============================================================================== ERRORS ===============================================================================
________________________________________________________________ ERROR at setup of test_decode_nnet _________________________________________________________________
corpus = <abkhazia.corpus.corpus.Corpus object at 0x2aab10531390>, features = '/home/thomas/tmpdir/features0', lm_word = '/home/thomas/tmpdir/lm_word0'
am_trisa = '/home/thomas/tmpdir/am_trisa0', tmpdir_factory = <_pytest.tmpdir.TempdirFactory instance at 0x2aaaed4417a0>
@pytest.fixture(scope='session')
def am_nnet(corpus, features, lm_word, am_trisa, tmpdir_factory):
    output_dir = str(tmpdir_factory.mktemp('am_nnet'))
    flog = os.path.join(output_dir, 'am_nnet.log')
    log = utils.logger.get_log(flog)
    am = acoustic.NeuralNetwork(
        corpus, lm_word, features, am_trisa, output_dir, log=log)
    am.options['num-epochs'].value = 2
    am.options['num-epochs-extra'].value = 1
    am.options['num-hidden-layers'].value = 1
    am.options['num-iters-final'].value = 1
    am.options['pnorm-input-dim'].value = 100
    am.options['pnorm-output-dim'].value = 10
    am.options['num-utts-subset'].value = 20
>   am.compute()
test/conftest.py:168:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
abkhazia/abstract_recipe.py:185: in compute
self.run()
abkhazia/acoustic/neural_network.py:155: in run
self._train_pnorm_fast()
abkhazia/acoustic/neural_network.py:205: in _train_pnorm_fast
self._run_am_command(command, target, message)
abkhazia/acoustic/abstract_acoustic_model.py:140: in _run_am_command
self._run_command(command, verbose=False)
abkhazia/abstract_recipe.py:102: in _run_command
cwd=self.recipe_dir)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
command = 'steps/nnet2/train_pnorm_fast.sh --cmd "queue.pl -q all.q@puck*.cm.cluster --config /fhgfs/bootphon/scratch/thomas/abk.../data/acoustic /home/thomas/tmpdir/lm_word0 /home/thomas/tmpdir/am_trisa0 /home/thomas/tmpdir/am_nnet0/recipe/exp/nnet'
stdin = None, stdout = <bound method RootLogger.debug of <logging.RootLogger object at 0x2aaaaf4eead0>>, cwd = '/home/thomas/tmpdir/am_nnet0/recipe'
env = {'SSH_ASKPASS': '/usr/libexec/openssh/gnome-ssh-askpass', 'MODULE_VERSION': '3.2.6', 'kaldi_steps': '/home/thomas/tmpd...C_module()': '() { eval `/cm/local/apps/environment-modules/3.2.6//Modules/$MODULE_VERSION/bin/modulecmd bash $*`\n}'}
returncode = 0
def run(command, stdin=None, stdout=sys.stdout.write,
        cwd=None, env=os.environ, returncode=0):
    """Run 'command' as a subprocess

    command : string to be executed as a subprocess
    stdout : standard output/error redirection function. By default
        redirect the output to stdout, but you can redirect to a
        logger with stdout=log.debug for exemple. Use
        stdout=open(os.devnull, 'w').write to ignore the command
        output.
    stdin : standard input redirection, can be a file or any readable
        stream.
    cwd : current working directory for executing the command
    env : current environment for executing the command
    returncode : expected return code of the command

    Returns silently if the command returned with `returncode`, else
    raise a RuntimeError

    """
    job = subprocess.Popen(
        shlex.split(command),
        stdin=stdin,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        cwd=cwd, env=env)

    # join the command output to log (from
    # https://stackoverflow.com/questions/35488927)
    def consume_lines(pipe, consume):
        with pipe:
            # NOTE: workaround read-ahead bug
            for line in iter(pipe.readline, b''):
                consume(line)
            consume('\n')

    threading.Thread(
        target=consume_lines,
        args=[job.stdout, lambda line: stdout(line)]).start()

    job.wait()

    if job.returncode != returncode:
        raise RuntimeError('command "{}" returned with {}'
>                          .format(command, job.returncode))
E RuntimeError: command "steps/nnet2/train_pnorm_fast.sh --cmd "queue.pl -q all.q@puck*.cm.cluster --config /fhgfs/bootphon/scratch/thomas/abkhazia2017/abkhazia/abkhazia/share/queue.conf" --num-hidden-layers 1 --presoftmax-prior-scale-power -0.25 --num-iters-final 1 --bias-stddev 0.5 --initial-learning-rate 0.04 --randprune 4.0 --target-multiplier 0 --minibatch-size 128 --num-epochs-extra 1 --shuffle-buffer-size 500 --final-learning-rate 0.004 --splice-width 4 --alpha 4.0 --pnorm-output-dim 10 --samples-per-iter 200000 --add-layers-period 2 --num-epochs 2 --p 2 --pnorm-input-dim 100 --mix-up 0 --io-opts "" --egs-opts "--num-utts-subset 20" --num-threads 20 --parallel-opts "--num-threads 20" --combine-num-threads 8 --combine-parallel-opts "--num-threads 8" /home/thomas/tmpdir/am_nnet0/recipe/data/acoustic /home/thomas/tmpdir/lm_word0 /home/thomas/tmpdir/am_trisa0 /home/thomas/tmpdir/am_nnet0/recipe/exp/nnet" returned with 1
abkhazia/utils/jobs.py:73: RuntimeError
----------------------------------------------------------------------- Captured stdout setup -----------------------------------------------------------------------
training neural network
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================================================== 22 passed, 1 error in 2201.34 seconds ===============================================================
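(Aside: the run() helper quoted in the traceback raises RuntimeError whenever the subprocess exits with a code different from the expected one. A minimal usage sketch, assuming the module layout shown in the traceback:)

from abkhazia.utils import jobs

jobs.run('true')       # exits 0, the default expected returncode: silent

try:
    jobs.run('false')  # exits 1: raises RuntimeError
except RuntimeError as err:
    print(err)         # command "false" returned with 1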
The log shows that there were a lot of errors during training:
[thomas@oberon am_nnet0]$ more am_nnet.log
2017-07-03 20:49:46,006 - INFO - training neural network
2017-07-03 20:49:46,025 - DEBUG - steps/nnet2/train_pnorm_fast.sh --cmd queue.pl -q all.q@puck*.cm.cluster --config /fhgfs/bootphon/scratch/thomas/abkhazia2017/abkhazia/abkhazia/share/queue.conf --num-hidden-layers 1 --presoftmax-prior-scale-power -0.25 --num-iters-final 1 --bias-stddev 0.5 --initial-learning-rate 0.04 --randprune 4.0 --target-multiplier 0 --minibatch-size 128 --num-epochs-extra 1 --shuffle-buffer-size 500 --final-learning-rate 0.004 --splice-width 4 --alpha 4.0 --pnorm-output-dim 10 --samples-per-iter 200000 --add-layers-period 2 --num-epochs 2 --p 2 --pnorm-input-dim 100 --mix-up 0 --io-opts --egs-opts --num-utts-subset 20 --num-threads 20 --parallel-opts --num-threads 20 --combine-num-threads 8 --combine-parallel-opts --num-threads 8 /home/thomas/tmpdir/am_nnet0/recipe/data/acoustic /home/thomas/tmpdir/lm_word0 /home/thomas/tmpdir/am_trisa0 /home/thomas/tmpdir/am_nnet0/recipe/exp/nnet
2017-07-03 20:49:47,105 - DEBUG - steps/nnet2/train_pnorm_fast.sh: calling get_lda.sh
2017-07-03 20:49:47,108 - DEBUG - steps/nnet2/get_lda.sh --transform-dir /home/thomas/tmpdir/am_trisa0 --splice-width 4 --cmd queue.pl -q all.q@puck*.cm.cluster --config /fhgfs/bootphon/scratch/thomas/abkhazia2017/abkhazia/abkhazia/share/queue.conf /home/thomas/tmpdir/am_nnet0/recipe/data/acoustic /home/thomas/tmpdir/lm_word0 /home/thomas/tmpdir/am_trisa0 /home/thomas/tmpdir/am_nnet0/recipe/exp/nnet
2017-07-03 20:49:47,158 - DEBUG - steps/nnet2/get_lda.sh: feature type is raw
2017-07-03 20:49:47,168 - DEBUG - feat-to-dim 'ark,s,cs:utils/subset_scp.pl --quiet 500 /home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/feats.scp | apply-cmvn --utt2spk=ark:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/utt2spk scp:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/cmvn.scp scp:- ark:- |' -
2017-07-03 20:49:47,176 - DEBUG - apply-cmvn --utt2spk=ark:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/utt2spk scp:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/cmvn.scp scp:- ark:-
2017-07-03 20:49:47,177 - DEBUG - ERROR (apply-cmvn:Write():kaldi-matrix.cc:1143) Failed to write matrix to stream
2017-07-03 20:49:47,178 - DEBUG - WARNING (apply-cmvn:Write():util/kaldi-holder-inl.h:51) Exception caught writing Table object: ERROR (apply-cmvn:Write():kaldi-matrix.cc:1143) Failed to write matrix to stream
2017-07-03 20:49:47,178 - DEBUG - [stack trace: ]
2017-07-03 20:49:47,178 - DEBUG - kaldi::KaldiGetStackTrace()
2017-07-03 20:49:47,179 - DEBUG - kaldi::KaldiErrorMessage::~KaldiErrorMessage()
2017-07-03 20:49:47,179 - DEBUG - kaldi::MatrixBase<float>::Write(std::ostream&, bool) const
2017-07-03 20:49:47,179 - DEBUG - kaldi::KaldiObjectHolder<kaldi::Matrix<float> >::Write(std::ostream&, bool, kaldi::Matrix<float> const&)
2017-07-03 20:49:47,179 - DEBUG - kaldi::TableWriterArchiveImpl<kaldi::KaldiObjectHolder<kaldi::Matrix<float> > >::Write(std::string const&, kaldi::Matrix<float> const&)
2017-07-03 20:49:47,180 - DEBUG - kaldi::TableWriter<kaldi::KaldiObjectHolder<kaldi::Matrix<float> > >::Write(std::string const&, kaldi::Matrix<float> const&) const
2017-07-03 20:49:47,180 - DEBUG - apply-cmvn(main+0x6fa) [0x465616]
2017-07-03 20:49:47,180 - DEBUG - /lib64/libc.so.6(__libc_start_main+0xfd) [0x30c2a1ed5d]
2017-07-03 20:49:47,180 - DEBUG - apply-cmvn() [0x464e39]
2017-07-03 20:49:47,180 - DEBUG - WARNING (apply-cmvn:Write():util/kaldi-table-inl.h:693) TableWriter: write failure to standard output
2017-07-03 20:49:47,181 - DEBUG - ERROR (apply-cmvn:Write():util/kaldi-table-inl.h:1142) Error in TableWriter::Write
2017-07-03 20:49:47,181 - DEBUG - WARNING (apply-cmvn:Close():util/kaldi-table-inl.h:724) TableWriter: error closing stream: standard output
2017-07-03 20:49:47,181 - DEBUG - ERROR (apply-cmvn:~TableWriter():util/kaldi-table-inl.h:1165) Error closing TableWriter [in destructor].
2017-07-03 20:49:47,181 - DEBUG - sh: line 1: 7113 Done utils/subset_scp.pl --quiet 500 /home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/feats.scp
2017-07-03 20:49:47,181 - DEBUG - 7114 Aborted | apply-cmvn --utt2spk=ark:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/utt2spk scp:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/cmvn.scp scp:- ark:-
2017-07-03 20:49:47,182 - DEBUG - WARNING (feat-to-dim:Close():kaldi-io.cc:446) Pipe utils/subset_scp.pl --quiet 500 /home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/feats.scp | apply-cmvn --utt2spk=ark:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/utt2spk scp:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/cmvn.scp scp:- ark:- | had nonzero return status 34304
2017-07-03 20:49:47,189 - DEBUG - feat-to-dim 'ark,s,cs:utils/subset_scp.pl --quiet 500 /home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/feats.scp | apply-cmvn --utt2spk=ark:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/utt2spk scp:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/cmvn.scp scp:- ark:- | splice-feats --left-context=4 --right-context=4 ark:- ark:- |' -
2017-07-03 20:49:47,196 - DEBUG - apply-cmvn --utt2spk=ark:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/utt2spk scp:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/cmvn.scp scp:- ark:-
2017-07-03 20:49:47,246 - DEBUG - splice-feats --left-context=4 --right-context=4 ark:- ark:-
2017-07-03 20:49:47,247 - DEBUG - ERROR (splice-feats:Write():kaldi-matrix.cc:1143) Failed to write matrix to stream
2017-07-03 20:49:47,249 - DEBUG - WARNING (splice-feats:Write():util/kaldi-holder-inl.h:51) Exception caught writing Table object: ERROR (splice-feats:Write():kaldi-matrix.cc:1143) Failed to write matrix to stream
2017-07-03 20:49:47,249 - DEBUG - [stack trace: ]
2017-07-03 20:49:47,249 - DEBUG - kaldi::KaldiGetStackTrace()
2017-07-03 20:49:47,249 - DEBUG - kaldi::KaldiErrorMessage::~KaldiErrorMessage()
2017-07-03 20:49:47,249 - DEBUG - kaldi::MatrixBase<float>::Write(std::ostream&, bool) const
2017-07-03 20:49:47,250 - DEBUG - kaldi::KaldiObjectHolder<kaldi::Matrix<float> >::Write(std::ostream&, bool, kaldi::Matrix<float> const&)
2017-07-03 20:49:47,250 - DEBUG - kaldi::TableWriterArchiveImpl<kaldi::KaldiObjectHolder<kaldi::Matrix<float> > >::Write(std::string const&, kaldi::Matrix<float> const&)
2017-07-03 20:49:47,250 - DEBUG - kaldi::TableWriter<kaldi::KaldiObjectHolder<kaldi::Matrix<float> > >::Write(std::string const&, kaldi::Matrix<float> const&) const
2017-07-03 20:49:47,250 - DEBUG - splice-feats(main+0x292) [0x454f6e]
2017-07-03 20:49:47,251 - DEBUG - /lib64/libc.so.6(__libc_start_main+0xfd) [0x30c2a1ed5d]
2017-07-03 20:49:47,251 - DEBUG - splice-feats() [0x454bf9]
2017-07-03 20:49:47,251 - DEBUG - WARNING (splice-feats:Write():util/kaldi-table-inl.h:693) TableWriter: write failure to standard output
2017-07-03 20:49:47,251 - DEBUG - ERROR (splice-feats:Write():util/kaldi-table-inl.h:1142) Error in TableWriter::Write
2017-07-03 20:49:47,252 - DEBUG - WARNING (splice-feats:Close():util/kaldi-table-inl.h:724) TableWriter: error closing stream: standard output
2017-07-03 20:49:47,252 - DEBUG - ERROR (splice-feats:~TableWriter():util/kaldi-table-inl.h:1165) Error closing TableWriter [in destructor].
2017-07-03 20:49:47,252 - DEBUG - ERROR (apply-cmvn:Write():kaldi-matrix.cc:1143) Failed to write matrix to stream
2017-07-03 20:49:47,252 - DEBUG - WARNING (apply-cmvn:Write():util/kaldi-holder-inl.h:51) Exception caught writing Table object: ERROR (apply-cmvn:Write():kaldi-matrix.cc:1143) Failed to write matrix to stream
2017-07-03 20:49:47,252 - DEBUG - [stack trace: ]
2017-07-03 20:49:47,252 - DEBUG - kaldi::KaldiGetStackTrace()
2017-07-03 20:49:47,253 - DEBUG - kaldi::KaldiErrorMessage::~KaldiErrorMessage()
2017-07-03 20:49:47,253 - DEBUG - kaldi::MatrixBase<float>::Write(std::ostream&, bool) const
2017-07-03 20:49:47,253 - DEBUG - kaldi::KaldiObjectHolder<kaldi::Matrix<float> >::Write(std::ostream&, bool, kaldi::Matrix<float> const&)
2017-07-03 20:49:47,253 - DEBUG - kaldi::TableWriterArchiveImpl<kaldi::KaldiObjectHolder<kaldi::Matrix<float> > >::Write(std::string const&, kaldi::Matrix<float> const&)
2017-07-03 20:49:47,253 - DEBUG - kaldi::TableWriter<kaldi::KaldiObjectHolder<kaldi::Matrix<float> > >::Write(std::string const&, kaldi::Matrix<float> const&) const
2017-07-03 20:49:47,254 - DEBUG - apply-cmvn(main+0x6fa) [0x465616]
2017-07-03 20:49:47,254 - DEBUG - /lib64/libc.so.6(__libc_start_main+0xfd) [0x30c2a1ed5d]
2017-07-03 20:49:47,254 - DEBUG - apply-cmvn() [0x464e39]
2017-07-03 20:49:47,254 - DEBUG - WARNING (apply-cmvn:Write():util/kaldi-table-inl.h:693) TableWriter: write failure to standard output
2017-07-03 20:49:47,254 - DEBUG - ERROR (apply-cmvn:Write():util/kaldi-table-inl.h:1142) Error in TableWriter::Write
2017-07-03 20:49:47,255 - DEBUG - WARNING (apply-cmvn:Close():util/kaldi-table-inl.h:724) TableWriter: error closing stream: standard output
2017-07-03 20:49:47,255 - DEBUG - ERROR (apply-cmvn:~TableWriter():util/kaldi-table-inl.h:1165) Error closing TableWriter [in destructor].
2017-07-03 20:49:47,255 - DEBUG - sh: line 1: 7120 Done utils/subset_scp.pl --quiet 500 /home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/feats.scp
2017-07-03 20:49:47,255 - DEBUG - 7121 Aborted | apply-cmvn --utt2spk=ark:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/utt2spk scp:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/cmvn.scp scp:- ark:-
2017-07-03 20:49:47,255 - DEBUG - 7122 Aborted | splice-feats --left-context=4 --right-context=4 ark:- ark:-
2017-07-03 20:49:47,256 - DEBUG - WARNING (feat-to-dim:Close():kaldi-io.cc:446) Pipe utils/subset_scp.pl --quiet 500 /home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/feats.scp | apply-cmvn --utt2spk=ark:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/utt2spk scp:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/split20/1/cmvn.scp scp:- ark:- | splice-feats --left-context=4 --right-context=4 ark:- ark:- | had nonzero return status 34304
2017-07-03 20:49:47,256 - DEBUG - steps/nnet2/get_lda.sh: Accumulating LDA statistics.
2017-07-03 20:50:00,806 - DEBUG - steps/nnet2/get_lda.sh: Finished estimating LDA
2017-07-03 20:50:00,811 - DEBUG - steps/nnet2/train_pnorm_fast.sh: calling get_egs.sh
2017-07-03 20:50:00,814 - DEBUG - steps/nnet2/get_egs.sh --num-utts-subset 20 --transform-dir /home/thomas/tmpdir/am_trisa0 --splice-width 4 --samples-per-iter 200000 --num-jobs-nnet 16 --stage 0 --cmd queue.pl -q all.q@puck*.cm.cluster --config /fhgfs/bootphon/scratch/thomas/abkhazia2017/abkhazia/abkhazia/share/queue.conf --num-utts-subset 20 --io-opts /home/thomas/tmpdir/am_nnet0/recipe/data/acoustic /home/thomas/tmpdir/lm_word0 /home/thomas/tmpdir/am_trisa0 /home/thomas/tmpdir/am_nnet0/recipe/exp/nnet
2017-07-03 20:50:00,888 - DEBUG - steps/nnet2/get_egs.sh: feature type is raw
2017-07-03 20:50:00,889 - DEBUG - steps/nnet2/get_egs.sh: working out number of frames of training data
2017-07-03 20:50:00,905 - DEBUG - utils/data/get_utt2dur.sh: working out /home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/utt2dur from /home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/segments
2017-07-03 20:50:00,915 - DEBUG - utils/data/get_utt2dur.sh: computed /home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/utt2dur
2017-07-03 20:50:00,994 - DEBUG - feat-to-len scp:/home/thomas/tmpdir/am_nnet0/recipe/data/acoustic/feats.scp ark,t:-
2017-07-03 20:50:01,204 - DEBUG - WARNING (feat-to-len:Write():util/kaldi-holder-inl.h:122) Exception caught writing Table object: Write failure in WriteBasicType.
2017-07-03 20:50:01,204 - DEBUG - Write failure in WriteBasicType.WARNING (feat-to-len:Write():util/kaldi-table-inl.h:693) TableWriter: write failure to standard output
2017-07-03 20:50:01,205 - DEBUG - ERROR (feat-to-len:Write():util/kaldi-table-inl.h:1142) Error in TableWriter::Write
2017-07-03 20:50:01,205 - DEBUG - WARNING (feat-to-len:Close():util/kaldi-table-inl.h:724) TableWriter: error closing stream: standard output
2017-07-03 20:50:01,205 - DEBUG - ERROR (feat-to-len:~TableWriter():util/kaldi-table-inl.h:1165) Error closing TableWriter [in destructor].
2017-07-03 20:50:01,215 - DEBUG - steps/nnet2/get_egs.sh: Every epoch, splitting the data up into 1 iterations,
2017-07-03 20:50:01,215 - DEBUG - steps/nnet2/get_egs.sh: giving samples-per-iteration of 59523 (you requested 200000).
2017-07-03 20:50:09,806 - DEBUG - Getting validation and training subset examples.
2017-07-03 20:50:09,808 - DEBUG - steps/nnet2/get_egs.sh: extracting validation and training-subset alignments.
2017-07-03 20:50:09,835 - DEBUG - copy-int-vector ark:- ark,t:-
2017-07-03 20:50:10,205 - DEBUG - LOG (copy-int-vector:main():copy-int-vector.cc:83) Copied 999 vectors of int32.
2017-07-03 20:50:16,316 - DEBUG - Getting subsets of validation examples for diagnostics and combination.
2017-07-03 20:50:21,069 - DEBUG - Creating training examples
2017-07-03 20:50:21,071 - DEBUG - Generating training examples on disk
2017-07-03 20:50:32,249 - DEBUG - steps/nnet2/get_egs.sh: rearranging examples into parts for different parallel jobs
2017-07-03 20:50:32,249 - DEBUG - steps/nnet2/get_egs.sh: Since iters-per-epoch == 1, just concatenating the data.
2017-07-03 20:50:33,374 - DEBUG - Shuffling the order of training examples
2017-07-03 20:50:33,375 - DEBUG - (in order to avoid stressing the disk, these won't all run at once).
2017-07-03 20:50:41,666 - DEBUG - steps/nnet2/get_egs.sh: Finished preparing training examples
2017-07-03 20:50:41,671 - DEBUG - steps/nnet2/train_pnorm_fast.sh: initializing neural net
2017-07-03 20:50:41,709 - DEBUG - Usage: queue.pl [options] [JOB=1:n] log-file command-line arguments...
2017-07-03 20:50:41,709 - DEBUG - e.g.: queue.pl foo.log echo baz
2017-07-03 20:50:41,710 - DEBUG - (which will echo "baz", with stdout and stderr directed to foo.log)
2017-07-03 20:50:41,710 - DEBUG - or: queue.pl -q all.q@xyz foo.log echo bar | sed s/bar/baz/
2017-07-03 20:50:41,710 - DEBUG - (which is an example of using a pipe; you can provide other escaped bash constructs)
2017-07-03 20:50:41,710 - DEBUG - or: queue.pl -q all.q@qyz JOB=1:10 foo.JOB.log echo JOB
2017-07-03 20:50:41,710 - DEBUG - (which illustrates the mechanism to submit parallel jobs; note, you can use
2017-07-03 20:50:41,711 - DEBUG - another string other than JOB)
2017-07-03 20:50:41,711 - DEBUG - Note: if you pass the "-sync y" option to qsub, this script will take note
2017-07-03 20:50:41,711 - DEBUG - and change its behavior. Otherwise it uses qstat to work out when the job finished
2017-07-03 20:50:41,711 - DEBUG - Options:
2017-07-03 20:50:41,711 - DEBUG - --config <config-file> (default: conf/queue.conf)
2017-07-03 20:50:41,712 - DEBUG - --mem <mem-requirement> (e.g. --mem 2G, --mem 500M,
2017-07-03 20:50:41,712 - DEBUG - also support K and numbers mean bytes)
2017-07-03 20:50:41,712 - DEBUG - --num-threads <num-threads> (default: 1)
2017-07-03 20:50:41,712 - DEBUG - --max-jobs-run <num-jobs>
2017-07-03 20:50:41,712 - DEBUG - --gpu <0|1> (default: 0)
2017-07-03 20:50:41,832 - DEBUG - nnet-am-init /home/thomas/tmpdir/am_trisa0/tree /home/thomas/tmpdir/lm_word0/topo 'nnet-init /home/thomas/tmpdir/am_nnet0/recipe/exp/nnet/nnet.config -|' /home/thomas/tmpdir/am_nnet0/recipe/exp/nnet/0.mdl
2017-07-03 20:50:41,956 - DEBUG - nnet-init /home/thomas/tmpdir/am_nnet0/recipe/exp/nnet/nnet.config -
2017-07-03 20:50:41,960 - DEBUG - LOG (nnet-init:main():nnet-init.cc:71) Initialized raw neural net and wrote it to -
2017-07-03 20:50:41,962 - DEBUG - LOG (nnet-am-init:main():nnet-am-init.cc:103) Initialized neural net and wrote it to /home/thomas/tmpdir/am_nnet0/recipe/exp/nnet/0.mdl
2017-07-03 20:50:41,963 - DEBUG - Training transition probabilities and setting priors
2017-07-03 20:50:45,119 - DEBUG - prepare vector assignment for FixedScaleComponent before softmax
2017-07-03 20:50:45,120 - DEBUG - (use priors^-0.25 and rescale to average 1)
2017-07-03 20:50:53,293 - DEBUG - queue.pl: 20 / 20 failed, log is in /home/thomas/tmpdir/am_nnet0/recipe/exp/nnet/log/acc_pdf.*.log
Most of the errors seem related to writing features, starting with the CMVN computations, but they did not stop the program from running.
The logs in /home/thomas/tmpdir/am_nnet0/recipe/exp/nnet/log/acc_pdf.*.log all look like this:
::::::::::::::
/home/thomas/tmpdir/am_nnet0/recipe/exp/nnet/log/acc_pdf.14.log
::::::::::::::
# Running on puck1
# Started at Mon Jul 3 20:50:49 CEST 2017
# ali-to-post "ark:gunzip -c /home/thomas/tmpdir/am_trisa0/ali.14.gz|" ark:- | post-to-tacc --per-pdf=true --binary=false /home/thomas/tmpdir/am_trisa0/final.mdl ark:- /home/thomas/tmpdir/am_nnet0/recipe/exp/nnet/14.pacc
ali-to-post 'ark:gunzip -c /home/thomas/tmpdir/am_trisa0/ali.14.gz|' ark:-
From posteriors, compute transition-accumulators
The output is a vector of counts/soft-counts, indexed by transition-id)
Note: the model is only read in order to get the size of the vector
Usage: post-to-tacc [options] <model> <post-rspecifier> <accs>
e.g.: post-to-tacc --binary=false 1.mdl "ark:ali-to-post 1.ali|" 1.tacc
Options:
--binary : Write output in binary mode. (bool, default = true)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
Command line was: post-to-tacc --per-pdf=true --binary=false /home/thomas/tmpdir/am_trisa0/final.mdl ark:- /home/thomas/tmpdir/am_nnet0/recipe/exp/nnet/14.pacc
ERROR (post-to-tacc:Read():parse-options.cc:375) Invalid option --per-pdf=true
ERROR (post-to-tacc:Read():parse-options.cc:375) Invalid option --per-pdf=true
[stack trace: ]
kaldi::KaldiGetStackTrace()
kaldi::KaldiErrorMessage::~KaldiErrorMessage()
kaldi::ParseOptions::Read(int, char const* const*)
post-to-tacc(main+0x112) [0x4cd52e]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x362c01ed1d]
post-to-tacc() [0x4cd339]
# Accounting: time=0 threads=1
# Finished at Mon Jul 3 20:50:49 CEST 2017 with status 255
Apparently the fatal error was caused by passing the unrecognized --per-pdf=true option to the post-to-tacc Kaldi utility.
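This again suggests a version mismatch: the steps/nnet2 scripts expect a post-to-tacc that knows --per-pdf, while the binary found on the PATH is older. A crude compatibility check could look like the following sketch (the binary and option names are taken from the log above):

import subprocess

def supports_option(binary, option):
    # crude check: does the usage message of `binary` mention `option`?
    proc = subprocess.Popen(
        [binary, '--help'],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    usage = proc.communicate()[0].decode('utf-8', 'replace')
    return option in usage

print(supports_option('post-to-tacc', '--per-pdf'))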
I'll try to run the test once again to see if this is reproducible.
Hi Thomas, actually I improved the test suite (using only 50 fixed utterances from 4 speakers, instead of 1000 random ones). I also found and fixed a few minor bugs...
On my side, all the tests now pass, both on my desktop computer and on the cluster (using run.pl or queue.pl).
So if you rerun the tests, they should pass!
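(The point of picking fixed rather than random utterances is reproducibility across runs. Schematically, with a hypothetical utt2spk mapping rather than abkhazia's actual fixture code:)

def deterministic_subset(utt2spk, speakers, n_utts=50):
    # a reproducible subset: the first n sorted utterances of chosen speakers
    utts = sorted(utt for utt, spk in utt2spk.items() if spk in speakers)
    return utts[:n_utts]

utt2spk = {'s1-utt1': 's1', 's1-utt2': 's1', 's2-utt1': 's2', 's3-utt1': 's3'}
print(deterministic_subset(utt2spk, speakers={'s1', 's2'}))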
I just re-ran the tests on a fresh install on Oberon and got the same error as before for the nnet training test.
Specifically, I ran:
module load python-anaconda
conda create --name abkhazia2017 python=2 anaconda
source activate abkhazia2017
mkdir abkhazia2017
cd abkhazia2017
git clone https://github.com/bootphon/abkhazia.git
cd abkhazia
module load kaldi
KALDI_PATH=/home/mbernard/dev/abkhazia/kaldi ./configure
python setup.py build
pip install h5features --upgrade
python setup.py develop
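(As a trivial sanity check of the develop install, one can verify the package resolves to the working copy; nothing abkhazia-specific is assumed here beyond the package name:)

import abkhazia
print(abkhazia.__file__)  # should point into the cloned working copy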
I edited the abkhazia.conf file as follows:
# This is the abkhazia configuration file. This file is automatically
# generated during installation. Change the values in here to overload
# the default configuration.
[abkhazia]
# The absolute path to the output data directory of abkhazia.
data-directory:
# The directory where abkhazia write temporary data (usually /tmp or
# /dev/shm).
tmp-directory: /tmp
[kaldi]
# The absolute path to the kaldi distribution directory
kaldi-directory: /home/mbernard/dev/abkhazia/kaldi
# "queue.pl" uses qsub. The options to it are options to qsub. If you
# have GridEngine installed, change this to a queue you have access
# to. Otherwise, use "run.pl", which will run jobs locally
# On Oberon use:
train-cmd: queue.pl -q all.q@puck*.cm.cluster
decode-cmd: queue.pl -q all.q@puck*.cm.cluster
highmem-cmd: queue.pl -q all.q@puck*.cm.cluster
# On Eddie use:
# train-cmd: queue.pl -P inf_hcrc_cstr_general
# decode-cmd: queue.pl -P inf_hcrc_cstr_general
# highmem-cmd: queue.pl -P inf_hcrc_cstr_general -pe memory-2G 2
# To run locally use:
# train-cmd: run.pl
# decode-cmd: run.pl
# highmem-cmd: run.pl
[corpus]
# In this section you can specify the default input directory where to
# read raw data for each supported corpus. By doing so, the
# <input-dir> argument of 'abkhazia prepare <corpus>' becomes optional
# for the corpus you have specified directories here.
aic-directory:
buckeye-directory: /scratch1/data/raw_data/BUCKEYE_revised_bootphon
childes-directory:
cid-directory:
csj-directory:
globalphone-directory:
librispeech-directory:
wsj-directory:
xitsonga-directory:
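(As a sanity check the file can be parsed with the standard library; a sketch under Python 2, which the tests above run on; the path to abkhazia.conf is an assumption:)

import ConfigParser  # configparser on Python 3

conf = ConfigParser.ConfigParser()
conf.read('abkhazia/share/abkhazia.conf')  # hypothetical location of the file
print(conf.get('kaldi', 'kaldi-directory'))
print(conf.get('kaldi', 'train-cmd'))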
Then I ran the tests:
screen
module load python-anaconda
source activate abkhazia2017
pytest ./test --basetemp=/home/thomas/tmpdir -x -v
The pytest output is below.
pytest ./test --basetemp=/home/thomas/tmpdir -x -v
============================= test session starts ==============================
platform linux2 -- Python 2.7.13, pytest-3.0.7, py-1.4.33, pluggy-0.4.0 -- /home/thomas/.conda/envs/abkhazia2017/bin/python
cachedir: .cache
rootdir: /scratch1/users/thomas/abkhazia2017/abkhazia, inifile:
collected 52 items
test/test_acoustic.py::test_acoustic_njobs[4] PASSED
test/test_acoustic.py::test_monophone_cmvn_good PASSED
test/test_acoustic.py::test_monophone_cmvn_bad PASSED
test/test_align.py::test_align[both-False] PASSED
test/test_ark.py::test_read_write[text] PASSED
test/test_ark.py::test_read_write[binary] PASSED
test/test_ark.py::test_h5f_name_of_utterance[a] PASSED
test/test_ark.py::test_h5f_name_of_utterance[a-b] PASSED
test/test_ark.py::test_h5f_name_of_utterance[a_b] PASSED
test/test_ark.py::test_h5f_twice PASSED
test/test_corpus.py::test_save_corpus[True] PASSED
test/test_corpus.py::test_save_corpus[False] PASSED
test/test_corpus.py::test_empty PASSED
test/test_corpus.py::test_subcorpus PASSED
test/test_corpus.py::test_split PASSED
test/test_corpus.py::test_split_tiny_train PASSED
test/test_corpus.py::test_split_by_speakers PASSED
test/test_corpus.py::test_split_and_save[True] PASSED
test/test_corpus.py::test_split_and_save[False] PASSED
test/test_corpus.py::test_split_less_than_1[True] PASSED
test/test_corpus.py::test_split_less_than_1[False] PASSED
test/test_corpus.py::test_spk2utt PASSED
test/test_corpus.py::test_phonemize_text PASSED
test/test_corpus.py::test_phonemize_corpus PASSED
test/test_decode.py::test_decode_mono[True] PASSED
test/test_decode.py::test_decode_mono[False] PASSED
test/test_decode.py::test_decode_tri[True] PASSED
test/test_decode.py::test_decode_tri[False] PASSED
test/test_decode.py::test_decode_trisa[True] PASSED
test/test_decode.py::test_decode_trisa[False] PASSED
test/test_decode.py::test_decode_nnet[True] ERROR
================================================================= ERRORS ==================================================================
________________________________________________ ERROR at setup of test_decode_nnet[True] _________________________________________________
corpus = <abkhazia.corpus.corpus.Corpus object at 0x2aaab52c4050>, features = '/home/thomas/tmpdir/features0'
am_trisa = '/home/thomas/tmpdir/am_trisa0', tmpdir_factory = <_pytest.tmpdir.TempdirFactory instance at 0x2aaada7fdb00>
lang_args = {'keep_tmp_dirs': True, 'level': 'word', 'position_dependent_phones': False, 'silence_probability': 0.5}
@pytest.fixture(scope='session')
def am_nnet(corpus, features, am_trisa, tmpdir_factory, lang_args):
    output_dir = str(tmpdir_factory.mktemp('am_nnet'))
    flog = os.path.join(output_dir, 'am_nnet.log')
    log = utils.logger.get_log(flog)
    am = acoustic.NeuralNetwork(
        corpus, features, am_trisa, output_dir, lang_args, log=log)
    am.options['num-epochs'].value = 2
    am.options['num-epochs-extra'].value = 1
    am.options['num-hidden-layers'].value = 1
    am.options['num-iters-final'].value = 1
    am.options['pnorm-input-dim'].value = 1
    am.options['pnorm-output-dim'].value = 1
    am.options['num-utts-subset'].value = 2
>   am.compute()
test/conftest.py:246:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
abkhazia/abstract_recipe.py:185: in compute
self.run()
abkhazia/acoustic/neural_network.py:167: in run
self._train_pnorm_fast()
abkhazia/acoustic/neural_network.py:217: in _train_pnorm_fast
self._run_am_command(command, target, message)
abkhazia/acoustic/abstract_acoustic_model.py:170: in _run_am_command
self._run_command(command, verbose=False)
abkhazia/abstract_recipe.py:102: in _run_command
cwd=self.recipe_dir)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
command = 'steps/nnet2/train_pnorm_fast.sh --cmd "queue.pl -q all.q@puck*.cm.cluster --config /scratch1/users/thomas/abkhazia201.../acoustic /home/thomas/tmpdir/am_nnet0/lang /home/thomas/tmpdir/am_trisa0 /home/thomas/tmpdir/am_nnet0/recipe/exp/nnet'
stdin = None, stdout = <bound method RootLogger.debug of <logging.RootLogger object at 0x2aaab51df7d0>>
cwd = '/home/thomas/tmpdir/am_nnet0/recipe'
env = {'SSH_ASKPASS': '/usr/libexec/openssh/gnome-ssh-askpass', 'MODULE_VERSION': '3.2.6', 'CUDA_ROOT': '/cm/local/apps/cuda...ka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:'}
returncode = 0
def run(command, stdin=None, stdout=sys.stdout.write,
        cwd=None, env=os.environ, returncode=0):
    """Run 'command' as a subprocess

    command : string to be executed as a subprocess
    stdout : standard output/error redirection function. By default
        redirect the output to stdout, but you can redirect to a
        logger with stdout=log.debug for exemple. Use
        stdout=open(os.devnull, 'w').write to ignore the command
        output.
    stdin : standard input redirection, can be a file or any readable
        stream.
    cwd : current working directory for executing the command
    env : current environment for executing the command
    returncode : expected return code of the command

    Returns silently if the command returned with `returncode`, else
    raise a RuntimeError

    """
    job = subprocess.Popen(
        shlex.split(command),
        stdin=stdin,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        cwd=cwd, env=env)

    # join the command output to log (from
    # https://stackoverflow.com/questions/35488927)
    def consume_lines(pipe, consume):
        with pipe:
            # NOTE: workaround read-ahead bug
            for line in iter(pipe.readline, b''):
                consume(line)
            consume('\n')

    threading.Thread(
        target=consume_lines,
        args=[job.stdout, lambda line: stdout(line)]).start()

    job.wait()

    if job.returncode != returncode:
        raise RuntimeError('command "{}" returned with {}'
>                          .format(command, job.returncode))
E RuntimeError: command "steps/nnet2/train_pnorm_fast.sh --cmd "queue.pl -q all.q@puck*.cm.cluster --config /scratch1/users/thomas/abkhazia2017/abkhazia/abkhazia/share/queue.conf" --num-hidden-layers 1 --presoftmax-prior-scale-power -0.25 --num-iters-final 1 --bias-stddev 0.5 --initial-learning-rate 0.04 --randprune 4.0 --target-multiplier 0 --minibatch-size 128 --num-epochs-extra 1 --shuffle-buffer-size 500 --final-learning-rate 0.004 --splice-width 4 --alpha 4.0 --pnorm-output-dim 1 --samples-per-iter 200000 --add-layers-period 2 --num-epochs 2 --p 2 --pnorm-input-dim 1 --mix-up 0 --io-opts "" --egs-opts "--num-utts-subset 2" --num-threads 3 --parallel-opts "--num-threads 3" --combine-num-threads 8 --combine-parallel-opts "--num-threads 8" /home/thomas/tmpdir/am_nnet0/recipe/data/acoustic /home/thomas/tmpdir/am_nnet0/lang /home/thomas/tmpdir/am_trisa0 /home/thomas/tmpdir/am_nnet0/recipe/exp/nnet" returned with 1
abkhazia/utils/jobs.py:73: RuntimeError
---------------------------------------------------------- Captured stdout setup ----------------------------------------------------------
asking 20 cores but reduced to 3
preparing lexicon in /home/thomas/tmpdir/am_nnet0/lang (L.fst)...
running "/home/mbernard/dev/abkhazia/kaldi/egs/wsj/s5/utils/prepare_lang.sh --position-dependent-phones false --sil-prob 0.5 /home/thomas/tmpdir/am_nnet0/lang/recipe/data/local/dict "<unk>" /home/thomas/tmpdir/am_nnet0/lang/local /home/thomas/tmpdir/am_nnet0/lang"
training neural network
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================== 30 passed, 1 error in 763.04 seconds ===================================================
Can somebody reproduce this?
Hi Thomas,
I didn't reproduce your bug; for me everything is OK. You are using --basetemp in your home directory, but that partition is almost full. Can you try it in your scratch?
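(A full partition would also explain the "Failed to write matrix to stream" errors earlier in the thread. A small sketch to check free space on the --basetemp location before launching the tests; the 10 GB threshold is an arbitrary assumption:)

import os

def free_gb(path):
    # free disk space at `path`, in gigabytes (Unix only)
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize / float(1024 ** 3)

basetemp = '/home/thomas/tmpdir'
if free_gb(basetemp) < 10:
    print('warning: less than 10 GB free on ' + basetemp)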
Setting --basetemp to my scratch worked!
On Oberon (a CentOS Linux cluster), with an up-to-date install of abkhazia and the abkhazia.conf file shown above, the command
pytest ./test --basetemp=/home/thomas/tmpdir -vv -x
fails. It looks like the call at
abkhazia/align/align.py:167: in _align_fmllr self._target_dir()))
failed with OSError: [Errno 2] No such file or directory.