peterdekker closed this issue 4 years ago.
This seems very similar to #52: both fail at FoLiA-correct because of missing output.
I ran the same file on a new installation, and the same error occurs: https://pastebin.ubuntu.com/p/26Xfk4jRcD/
I wasn't sure of the status of this issue, since there have been various fixes in the meantime, so I checked again. Unfortunately, the issue is still relevant: the indexer yields no results:
executor > local (5)
[bf/8f54b6] process > txt2folia [100%] 1 of 1 ✔
[23/7a5574] process > corpusfrequency [100%] 1 of 1 ✔
[34/7e7fc3] process > ticclunk [100%] 1 of 1 ✔
[fd/9bcaa7] process > anahash [100%] 1 of 1 ✔
[19/103baa] process > indexer [100%] 1 of 1, failed: 1 ✘
[- ] process > resolver -
[- ] process > rank -
[- ] process > chainer -
[- ] process > foliacorrect -
Error executing process > 'indexer (1)'
Caused by:
Process `indexer (1)` terminated with an error exit status (6)
Command executed:
#!/bin/bash
set +u
if [ ! -z "/var/www/lamachine2/weblamachine" ]; then
source /var/www/lamachine2/weblamachine/bin/activate
fi
set -u
TICCL-indexerNT --hash "corpus.wordfreqlist.tsv.clean.anahash" --charconf "confusion.lst" --foci "corpus.wordfreqlist.tsv.clean.corpusfoci" -o "corpus.wordfreqlist.tsv.clean" -t 56 --low 5 --high 35 || exit 1
if [ ! -s "corpus.wordfreqlist.tsv.clean.indexNT" ]; then
echo "ERROR: Expected output corpus.wordfreqlist.tsv.clean.indexNT does not exist or is empty">&2
exit 6
fi
Command exit status:
6
Command output:
Now using node v13.13.0 (npm v6.14.4)
reading corpus word anagram hash values
read 206669 corpus word anagram values
skipped 2424 out-of-band corpus word values
read 1 foci values
read 275652 character confusion anagram values
created 1 separate experiments
running on 1 threads.
wrote indexes into: corpus.wordfreqlist.tsv.clean.indexNT
Command error:
ERROR: Expected output corpus.wordfreqlist.tsv.clean.indexNT does not exist or is empty
I'm unassigning myself, though: if this is caused by a deeper issue in ticcltools, it's not something I can maintain or solve. If it's deemed a pipeline problem and a viable solution is proposed, I can help again.
This was a non-issue.
TICCL sets a minimum word length. All words in this 'input file' are at most three characters long. They are also all very common, correctly spelled words that are present in even the most basic TICCL lexicon, e.g. the Aspell lexicon. So there is nothing here for TICCL to work on.
If you want to see TICCL work properly, please feed it a proper text. To see it work well, either feed it a large corpus to process, or give it a large lexicon and name list, or do both.
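A quick pre-flight check along these lines could flag such inputs before running the pipeline. The function below is a minimal sketch and not part of TICCL or PICCL; the name words_in_band is hypothetical. It assumes each line of the word frequency list (e.g. corpus.wordfreqlist.tsv.clean) has the form word<TAB>frequency, and it assumes, without confirmation from the TICCL documentation, that the --low 5 --high 35 arguments passed to TICCL-indexerNT in the log above act as word-length bounds.

```python
from typing import Iterable, List


def words_in_band(lines: Iterable[str], low: int = 5, high: int = 35) -> List[str]:
    """Return the words whose length lies within [low, high].

    Hypothetical pre-flight check: assumes each line is 'word<TAB>frequency'
    and that TICCL's --low/--high act as word-length bounds (unconfirmed).
    """
    kept = []
    for line in lines:
        # The word is the first tab-separated field; blank lines yield "".
        word = line.rstrip("\n").split("\t")[0]
        if word and low <= len(word) <= high:
            kept.append(word)
    return kept
```

Passing an open file handle works, since iterating a text file yields lines. Under the assumed bounds, an input containing only words of three characters or fewer returns an empty list, i.e. nothing for the indexer to work on.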
When I process a simple text file (just two Dutch sentences) through the PICCL web interface, TICCL fails. This is the error.log: https://pastebin.ubuntu.com/p/mmV9sqk8n2/ Input file:
For our deployment this is not a priority; I'm just filing it here to let you know!