LanguageMachines / PICCL

A set of workflows for corpus building through OCR, post-correction and normalisation

Output does not show up for download when only OCR is enabled #40

Closed: JessedeDoes closed this issue 5 years ago

JessedeDoes commented 6 years ago


The OCR itself appears to have worked fine, and I do get output when I enable further processing steps.

Log file:

[CLAM Dispatcher] Adding to PYTHONPATH: /var/www/lamachine2/weblamachine/lib/python3.5/site-packages/PICCL-0.6.2-py3.5.egg/picclservice
[CLAM Dispatcher] Started CLAM Dispatcher v2.3.3 with picclservice.picclservice (2018-07-22 21:35:56)
[CLAM Dispatcher] Running /var/www/lamachine2/weblamachine/bin/python "/var/www/lamachine2/weblamachine/lib/python3.5/site-packages/PICCL-0.6.2-py3.5.egg/picclservice/picclservice_wrapper.py" "/var/www/webservices-lst/live/writable/piccl/projects/jessededoes/mySixth/clam.xml" "/var/www/webservices-lst/live/writable/piccl/projects/jessededoes/mySixth/.status" "/var/www/webservices-lst/live/writable/piccl/projects/jessededoes/mySixth/input/" "/var/www/webservices-lst/live/writable/piccl/projects/jessededoes/mySixth/output/" "/var/www/lamachine2/weblamachine/opt/PICCL" "/var/www/lamachine2/weblamachine/opt/PICCL"
[CLAM Dispatcher] Running with pid 123846 (2018-07-22 21:35:56)
Running PICCL from /var/www/lamachine2/weblamachine/opt/PICCL/
System default encoding:  utf-8
Forcing en_US.UTF-8 locale...
Command: /var/www/lamachine2/weblamachine/opt/PICCL/ocr.nf --inputdir "/var/www/webservices-lst/live/writable/piccl/projects/jessededoes/mySixth/input/" --outputdir ocr_output --inputtype "tif" --language "nld" -with-trace >ocr.nextflow.out.log 2>ocr.nextflow.err.log
[ocr] Nextflow standard error output
-------------------------------------------------

[ocr] Nextflow standard output
-------------------------------------------------
N E X T F L O W  ~  version 0.30.1
Launching `/var/www/lamachine2/weblamachine/opt/PICCL/ocr.nf` [compassionate_woese] - revision: 76d7839f83
WARN: The config file defines settings for an unknown process: indexer
--------------------------
OCR Pipeline
--------------------------
[warm up] executor > local
[f9/156e4c] Submitted process > tesseract (1)
[0b/41b5b8] Submitted process > ocrpages_to_foliapages (1)
[c1/817202] Submitted process > foliacat (1)
OCR output document written to ocr_output/1936.tiff.folia.xml

[ocr] Nextflow trace summary
-------------------------------------------------
task_id  hash       native_id  name                        status     exit  submit                   duration  realtime  %cpu    rss      vmem      rchar     wchar
1        f9/156e4c  124008     tesseract (1)               COMPLETED  0     2018-07-22 21:35:58.605  10.8s     10s       102.1%  75.9 MB  130.7 MB  17 MB     64 B
2        0b/41b5b8  124286     ocrpages_to_foliapages (1)  COMPLETED  0     2018-07-22 21:36:09.464  5.6s      1.3s      10.8%   12.6 MB  52 MB     534.4 KB  26.8 KB
3        c1/817202  125005     foliacat (1)                COMPLETED  0     2018-07-22 21:36:15.085  5.6s      1.3s      9.6%    12.5 MB  52 MB     534.6 KB  26.8 KB

TICCL skipped as requested...
[CLAM Dispatcher] Process ended (2018-07-22 21:36:21, 25.029313s) 
[CLAM Dispatcher] Removing temporary files
[CLAM Dispatcher] Finished (2018-07-22 21:36:21), exit code 0, dispatcher wait time 25.0s, duration 25.029828s
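The log ends with exit code 0 and all three Nextflow tasks report COMPLETED, so the OCR stage itself did not fail. A minimal sketch for checking this programmatically, assuming the trace is the tab-separated report Nextflow writes when run with `-with-trace` (as in the command above). The helper name `failed_tasks` is hypothetical, and the sample is trimmed to the first six columns of the trace shown in the log.

```python
import csv
import io

# Sample trace, trimmed to the first six columns of the summary in the log.
# Assumption: Nextflow's trace report is tab-separated with a header row.
TRACE = (
    "task_id\thash\tnative_id\tname\tstatus\texit\n"
    "1\tf9/156e4c\t124008\ttesseract (1)\tCOMPLETED\t0\n"
    "2\t0b/41b5b8\t124286\tocrpages_to_foliapages (1)\tCOMPLETED\t0\n"
    "3\tc1/817202\t125005\tfoliacat (1)\tCOMPLETED\t0\n"
)


def failed_tasks(trace_text):
    """Return the names of tasks that did not complete with exit status 0."""
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    return [
        row["name"]
        for row in reader
        if row["status"] != "COMPLETED" or row["exit"] != "0"
    ]


print(failed_tasks(TRACE))  # → []
```

An empty list here means the missing download is not caused by a failed task, which points instead at where the output ends up.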
proycon commented 6 years ago

The fix referenced above should solve this issue (pending testing and a release).
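The actual fix is not shown in this thread. As a purely hypothetical illustration: the log shows `ocr.nf` being called with the relative `--outputdir ocr_output` rather than the project's `output/` directory, and CLAM offers files for download from a project's `output/` directory. Assuming that mismatch is the cause, a wrapper could copy the OCR FoLiA documents into the CLAM output directory when TICCL is skipped. `publish_ocr_output` is an invented name, not part of PICCL or CLAM.

```python
import glob
import os
import shutil


def publish_ocr_output(ocr_dir, clam_output_dir):
    """Copy FoLiA documents from the OCR stage into the CLAM output directory.

    Hypothetical helper: ocr_dir stands for the relative ``ocr_output``
    directory passed to ocr.nf, clam_output_dir for the project's output/
    directory. Returns the list of copied file paths.
    """
    os.makedirs(clam_output_dir, exist_ok=True)
    copied = []
    # Only FoLiA documents (e.g. 1936.tiff.folia.xml) are copied.
    for path in sorted(glob.glob(os.path.join(ocr_dir, "*.folia.xml"))):
        target = os.path.join(clam_output_dir, os.path.basename(path))
        shutil.copy2(path, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

In a wrapper this would only be called on the "TICCL skipped" path, e.g. `publish_ocr_output("ocr_output", outputdir)`, since otherwise TICCL's own output is what should be published.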

proycon commented 5 years ago

OK, this is now fixed and tested (sorry for the delay). I may change a few more things and then do a release.