martinreynaert closed this 6 years ago
amek = make
I can add this as an input source, but I'm not entirely sure whether it will trip over the file names. Can you try it outside the webservice first? I will plug it in as an input source once it behaves as expected.
(default changed and updated on ponyland, please test outside webservice first)
Something did not work as planned, I am left clueless.
(lamachine16)[mreynaert@scootaloo:/vol/tensusers/mreynaert/DPO35]$ nextflow run LanguageMachines/PICCL/ocr.nf --inputdir /vol/tensusers/mreynaert/DPO35/TIF/ --language nld DPO35tiff.OCR.20180205.stdout 2>DPO35tiff.OCR.20180205.stderr
N E X T F L O W ~ version 0.26.4
Launching `LanguageMachines/PICCL` [peaceful_boyd] - revision: f1d6be93b1 [master]
WARN: The config file defines settings for an unknown process: indexer
WARN: The config file defines settings for an unknown process: resolver
WARN: The config file defines settings for an unknown process: rank
WARN: The config file defines settings for an unknown process: foliacorrect -- Did you mean: foliacat?
WARN: The config file defines settings for an unknown process: frog_original
WARN: The config file defines settings for an unknown process: modernize
WARN: The config file defines settings for an unknown process: frog_modernized
--------------------------
OCR Pipeline
--------------------------
[warm up] executor > local
WARN: The `into` operator should be used to connect two or more target channels -- consider to replace it with `.set { pageimages_bitmap }`
WARN: The `into` operator should be used to connect two or more target channels -- consider to replace it with `.set { groupfoliapages }`
(lamachine16)[mreynaert@scootaloo:/vol/tensusers/mreynaert/DPO35]$ ls -l
total 1273628
-rw-rw-r-- 1 mreynaert mreynaert 0 Feb 5 17:28 DPO35tiff.OCR.20180205.stderr
-rw-rw-r-- 1 mreynaert mreynaert 1304172529 Feb 5 16:07 DPO35tiff.tar.gz
drwxrwxr-x 2 mreynaert mreynaert 16384 Feb 5 15:57 TIF
drwxrwxr-x 2 mreynaert mreynaert 10 Feb 5 17:29 work
(lamachine16)[mreynaert@scootaloo:/vol/tensusers/mreynaert/DPO35]$ ls -l work/
total 0
(lamachine16)[mreynaert@scootaloo:/vol/tensusers/mreynaert/DPO35]$
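As written, the command above passes DPO35tiff.OCR.20180205.stdout as a plain argument rather than as a redirect target, which would explain why only the .stderr file appears in the listing. A minimal sketch of a wrapper that captures both streams; the function name `run_logged` is hypothetical and not part of PICCL or Nextflow:

```shell
#!/usr/bin/env bash
set -u

# Hypothetical helper: run a command, writing its stdout to <prefix>.stdout
# and its stderr to <prefix>.stderr. Returns the command's own exit status.
run_logged() {
    local prefix=$1
    shift
    "$@" >"$prefix.stdout" 2>"$prefix.stderr"
}
```

It would be called with the log prefix first and the full command after it, for example `run_logged DPO35tiff.OCR.20180205 nextflow run LanguageMachines/PICCL/ocr.nf --inputdir /vol/tensusers/mreynaert/DPO35/TIF/ --language nld`.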
I should obviously have specified --inputtype tif
I get an error:
[a6/a8a48d] Submitted process > foliacat (23)
[2b/e0e7cc] Submitted process > foliacat (25)
ERROR ~ Error executing process > 'foliacat (1)'
Caused by:
Process `foliacat (1)` terminated with an error exit status (1)
Command executed:
set +u
if [ ! -z "/vol/customopt/lamachine16" ]; then
source /vol/customopt/lamachine16/bin/activate
fi
set -u
if [ -f .tif.folia.xml ]; then
#only one file, nothing to cat
cp $foliainput dpo_35_0120_master.folia.xml
else
foliainput=$(ls -1v *.tif.folia.xml)
foliacat -i dpo_35_0120_master -o dpo_35_0120_master.folia.xml $foliainput
fi
Command exit status:
1
Command output:
==============================================================================
, LaMachine - NLP Software distribution
~) (https://proycon.github.io/LaMachine)
(----í Language Machines research group
/| |\ & Centre for Language and Speech Technology
/ / /| Radboud University Nijmegen
==============================================================================
Available software: CLAM (clamservice), Colibri Core (colibri-patternmodeller),
FoLiA Tools (foliavalidator, folia2txt, folia2html, foliaquery etc),
foliadocserve, foliautils (folialint etc),
frog, gecco, mbt, mbtserver, ticcltools, timbl, toad (froggen),
ucto, wopr
Python libraries: pynlpl ucto frog timbl clam colibricore
Run lamachine-test.sh to test your installation, run lamachine-update.sh to
update everything (with sudo only if you use Vagrant or Docker).
(Set LAMACHINE_QUIET=1 prior to activation to suppress this message)
Command error:
.command.sh: line 10: foliainput: unbound variable
Work dir:
/vol/tensusers/mreynaert/DPO35/work/03/842ed0d7835ae9ebe34f728b1909c1
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
[88/66c19c] Submitted process > foliacat (27)
[49/d8f766] Submitted process > foliacat (59)
WARN: Killing pending tasks (19)
(lamachine16)[mreynaert@scootaloo:/vol/tensusers/mreynaert/DPO35]$
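The "foliainput: unbound variable" message in the log above comes from `set -u`: the single-file branch reads `$foliainput` before anything has assigned it. A minimal sketch of one way to avoid that, assuming the page files are collected into an array before the branch is chosen. The function name `plan_foliacat` is hypothetical, it only prints the command it would run, and this is not necessarily how PICCL itself fixed the problem:

```shell
#!/usr/bin/env bash
set -u

# Hypothetical helper: print the command that would merge the FoLiA page
# files found in a directory, without ever reading an unset variable.
# Note: the glob sorts lexicographically, not in version order like `ls -1v`.
plan_foliacat() {
    local dir=$1 docid=$2
    local -a pages
    shopt -s nullglob                      # an unmatched glob expands to nothing
    pages=("$dir"/*.tif.folia.xml)
    shopt -u nullglob
    if [ "${#pages[@]}" -eq 0 ]; then
        echo "no input files" >&2
        return 1
    elif [ "${#pages[@]}" -eq 1 ]; then
        # only one file, nothing to cat
        echo "cp ${pages[0]} $docid.folia.xml"
    else
        echo "foliacat -i $docid -o $docid.folia.xml ${pages[*]}"
    fi
}
```

Because the file list is assigned before either branch uses it, `set -u` has nothing to complain about, whichever branch is taken.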
Here's where .nextflow.log is at:
(lamachine16)[mreynaert@scootaloo:/vol/tensusers/mreynaert/DPO35]$ cat .nextflow.log
Do not understand what it says.
As mentioned before, in the ponyland LaMachine, run the scripts directly instead of prefixing them with nextflow run. You're running an older version cached by nextflow; you should be able to just have ocr.nf etc. in your path.
That does not seem to work:
(lamachine16)[mreynaert@scootaloo:/vol/tensusers/mreynaert/DPO35]$ ocr.nf --inputtype tif --inputdir /vol/tensusers/mreynaert/DPO35/TIF/ --language nld DPO35tiff.OCR.20180205.BIS.stdout 2>DPO35tiff.OCR.20180205.BIS.stderr
(lamachine16)[mreynaert@scootaloo:/vol/tensusers/mreynaert/DPO35]$ cat DPO35tiff.OCR.20180205.BIS.stderr
ocr.nf: command not found
(lamachine16)[mreynaert@scootaloo:/vol/tensusers/mreynaert/DPO35]$ LanguageMachines/PICCL/ocr.nf --inputtype tif --inputdir /vol/tensusers/mreynaert/DPO35/TIF/ --language nld DPO35tiff.OCR.20180205.stdout 2>DPO35tiff.OCR.20180205.stderr
(lamachine16)[mreynaert@scootaloo:/vol/tensusers/mreynaert/DPO35]$ cat DPO35tiff.OCR.20180205.BIS.stderr
ocr.nf: command not found
It probably got lost in the server upgrade; I have fixed it again now.
Did you upgrade the system? I see no difference, so far:
(lamachine16)[mreynaert@scootaloo:/vol/tensusers/mreynaert/DPO35]$ LanguageMachines/PICCL/ocr.nf --inputtype tif --inputdir /vol/tensusers/mreynaert/DPO35/TIF/ --language nld DPO35tiff.OCR.20180205.stdout 2>DPO35tiff.OCR.20180205.stderr
(lamachine16)[mreynaert@scootaloo:/vol/tensusers/mreynaert/DPO35]$ cat DPO35tiff.OCR.20180205.stderr
-bash: LanguageMachines/PICCL/ocr.nf: No such file or directory
I should have been more explicit, I guess: it's just ocr.nf :)
It worked, proycon! Thanks!
So that means you want the book included in the webservice, right?
I got stuck in testing the web version due to the *master.tif extension of the test book. Can I access the available corpora on ponyland to rename these files or can you please do this for me?
Ok, so the conclusion is that we strip the suffixes and adhere to the simple naming convention?
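Stripping the suffix could look roughly like this. It is a sketch only, assuming the scans are named like dpo_35_0120_master.tif (inferred from the foliacat command in the error log) and that the part to drop is exactly `_master`; the function name `strip_master_suffix` is made up for illustration:

```shell
#!/usr/bin/env bash
set -u

# Hypothetical helper: rename every *_master.tif in a directory to *.tif,
# leaving other files alone and never overwriting an existing target.
strip_master_suffix() {
    local dir=$1 f target
    for f in "$dir"/*_master.tif; do
        [ -e "$f" ] || continue            # glob matched nothing
        target="${f%_master.tif}.tif"
        if [ -e "$target" ]; then
            echo "skipping $f: $target already exists" >&2
            continue
        fi
        mv -- "$f" "$target"
    done
}
```

Run on a copy of the data first, for example `strip_master_suffix /vol/tensusers/mreynaert/DPO35/TIF`, since the rename is not reversible once other files depend on the new names.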
I updated the corpus available for the webservice. All other data is in your download file (see download.nf), so it is within your control.
Closing this; the issues should be resolved. Reopen if the test fails.
Please amek available the following test book version in the PICCL work flow:
[mreynaert@scootaloo:~]$ ls -l /vol/tensusers/mreynaert/DPO35tiff.tar.gz
-rw-rw-r-- 1 mreynaert mreynaert 1304172529 Feb 5 16:07 /vol/tensusers/mreynaert/DPO35tiff.tar.gz