KBNLresearch / ochre

Toolbox for OCR post-correction
Apache License 2.0

Error during preprocessing #10

Open ycsun19 opened 6 years ago

ycsun19 commented 6 years ago

Hi, I am trying to run this project and ran into several problems in the preprocessing part. I am new to CWL, so my questions may be quite basic. Thanks in advance for your help:

  1. Since vudnc-preprocess.cwl can be run stand-alone, how should I run it? Please give more detailed instructions.

  2. When I run the first cell of vudnc-preprocess-workflow.ipynb, I get the following error:

ValueError                                Traceback (most recent call last)

in ()
     13
     14 changes_files, metadata_files = wf.align(file1=ocr, file2=gs, scatter=['file1', 'file2'], scatter_method='dotproduct')
---> 15 metadata = wf.merge_json(in_files=metadata_files, name=align_metadata)
     16 changes = wf.merge_json(in_files=changes_files, name=align_changes)
     17
(Something omitted here)
ValueError: "merge-json" not found in steps library. Please check your spelling or load additional steps

So, how can I solve this error?

  3. When running the first cell of , I made no modifications except setting working_dir='/home/ycsun/ochre/123/', and it gives the following warnings:

WARNING:scriptcwl.library:Not loading "/home/ycsun/ochre/123/align-dir.cwl", because it is a packed workflow.
WARNING:scriptcwl.library:Not loading "/home/ycsun/ochre/123/sac-preprocess.cwl", because it is a packed workflow.
WARNING:scriptcwl.library:Not loading "/home/ycsun/ochre/123/ocrevaluation-performance-wf-pack.cwl", because it is a packed workflow.
WARNING:scriptcwl.library:Not loading "/home/ycsun/ochre/123/icdar2017st-extract-data-all.cwl", because it is a packed workflow.
WARNING:scriptcwl.library:Not loading "/home/ycsun/ochre/123/word-mapping-dir.cwl", because it is a packed workflow.
WARNING:scriptcwl.library:Not loading "/home/ycsun/ochre/123/word-mapping-test-files-wf.cwl", because it is a packed workflow.
WARNING:scriptcwl.library:Not loading "/home/ycsun/ochre/123/ocrevaluation-performance-test-files-wf-pack.cwl", because it is a packed workflow.
WARNING:scriptcwl.library:Not loading "/home/ycsun/ochre/123/align-test-files.cwl", because it is a packed workflow.
WARNING:scriptcwl.library:Not loading "/home/ycsun/ochre/123/kb-tss-preprocess-all.cwl", because it is a packed workflow.
WARNING:scriptcwl.library:../123/icdar2017st-extract-data.cwl:21:1: checking field steps
../123/icdar2017st-extract-data.cwl:28:3: checking object ../123/icdar2017st-extract-data_.cwl#icdar2017st-extract-text-1
../123/icdar2017st-extract-data.cwl:29:5: Field run contains undefined reference to file:///home/ycsun/icdar2017st-extract-text.cwl
../123/icdar2017st-extract-data.cwl:22:3: checking object ../123/icdar2017st-extract-data_.cwl#ls-2
../123/icdar2017st-extract-data.cwl:23:5: Field run contains undefined reference to file:///home/ycsun/ls.cwl
../123/icdar2017st-extract-data.cwl:39:3: checking object ../123/icdar2017st-extract-data_.cwl#save-files-to-dir
../123/icdar2017st-extract-data.cwl:40:5: Field run contains undefined reference to file:///home/ycsun/save-files-to-dir.cwl
../123/icdar2017st-extract-data.cwl:46:3: checking object ../123/icdar2017st-extract-data_.cwl#save-files-to-dir-5
../123/icdar2017st-extract-data.cwl:47:5: Field run contains undefined reference to file:///home/ycsun/save-files-to-dir.cwl
../123/icdar2017st-extract-data.cwl:53:3: checking object ../123/icdar2017st-extract-data_.cwl#save-files-to-dir-9
../123/icdar2017st-extract-data_.cwl:54:5: Field run contains undefined reference to file:///home/ycsun/save-files-to-dir.cwl

In order to process the ICDAR 2017 dataset, what modifications should I make in the corresponding files?
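The "undefined reference" warnings above show `run:` fields resolving to files directly under /home/ycsun/, outside the working_dir, which suggests the step files are not where the workflow expects them. CWL resolves relative references against the location of the document that contains them, so one way to check this is to list every `run:` target that does not exist next to the workflow file. The sketch below assumes each `run:` field is a plain relative path on a single line (not an inline process or an absolute `file://` URI); the function name `missing_run_references` is hypothetical, not part of ochre or scriptcwl.

```python
import re
from pathlib import Path

# Matches lines such as "    run: ls.cwl" or "run: 'steps/ls.cwl'".
# Assumption: run fields are single-line relative paths ending in .cwl.
RUN_LINE = re.compile(r"^\s*run:\s*['\"]?([^'\"\s#]+\.cwl)['\"]?\s*$")


def missing_run_references(workflow_file):
    """Return absolute paths of run: targets that do not exist.

    Paths are resolved relative to the workflow file's own directory,
    mirroring how CWL resolves relative references, not relative to
    the current working directory.
    """
    workflow_path = Path(workflow_file).resolve()
    base_dir = workflow_path.parent
    missing = []
    for line in workflow_path.read_text().splitlines():
        match = RUN_LINE.match(line)
        if match is None:
            continue  # not a simple run: line
        target = (base_dir / match.group(1)).resolve()
        if not target.exists():
            missing.append(str(target))
    return missing
```

For example, `missing_run_references("/home/ycsun/ochre/123/icdar2017st-extract-data.cwl")` would list the step files that need to be copied next to that workflow (or whose paths need fixing) before scriptcwl can load it.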

jvdzwaan commented 6 years ago
  1. Thank you for your remark; I added some documentation.
  2. There is no need to regenerate this workflow. I also see that the notebook is outdated; a newer version will follow.
  3. Did you install the development version of nlppln? It seems you are missing some required steps.
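Judging from the traceback earlier in this thread, calling `wf.merge_json(...)` makes scriptcwl look up a step named `merge-json`, i.e. the Python method name with underscores replaced by hyphens. Assuming that step names also correspond to `.cwl` file names in the loaded steps directory, a quick way to see which required steps are missing is to compare the two. This is a diagnostic sketch under those assumptions; the helper names and the `steps_dir` argument are hypothetical, and the actual location of the nlppln steps depends on how it was installed.

```python
from pathlib import Path


def step_name(method_name):
    """Map a Python-style step method (merge_json) to a CWL step name (merge-json)."""
    return method_name.replace("_", "-")


def available_steps(steps_dir):
    """Return the step names found as *.cwl files in steps_dir.

    Assumption: a step's name is the file name without the .cwl suffix.
    """
    return {path.stem for path in Path(steps_dir).glob("*.cwl")}


def missing_steps(required_methods, steps_dir):
    """List the required steps that have no matching .cwl file in steps_dir."""
    present = available_steps(steps_dir)
    return [step_name(m) for m in required_methods if step_name(m) not in present]
```

If `missing_steps(["merge_json", "ls"], steps_dir)` reports `merge-json`, the installed steps library is older than the one the notebook expects, which fits the suggestion to install the development version of nlppln.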

Please note that the software is experimental and that the reported performance on the ICDAR 2017 dataset is low (see #7).

Mohit-soni-bhagwan commented 3 years ago

I get an error when running ICDAR2017_shared_task_workflows.ipynb, at line 13: wf.ls(in_dir=in_dir). There is no preprocess workflow file for the ICDAR2017 dataset. What should I do to fix this error: '"None" not found in steps library. Please check your spelling or load additional steps'? I don't know much about CWL; is there some way to run this program anyway? Please point me to the relevant documentation.
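One way to run a CWL file such as vudnc-preprocess.cwl (mentioned earlier in this thread) without the notebook is to pass it, together with a job file describing its inputs, to the reference runner `cwltool`. The sketch below writes a JSON job file and builds the `cwltool` command line. The input name `in_dir` is an assumption borrowed from the `wf.ls(in_dir=in_dir)` call above; check the `inputs` section of the workflow you want to run and use its real input names. Recent cwltool versions can also print an input template for a workflow with `--make-template`.

```python
import json
import subprocess
from pathlib import Path


def write_job_file(job_path, in_dir):
    """Write a minimal CWL job (input object) file as JSON.

    The input name "in_dir" is hypothetical; replace it with the
    workflow's actual input names.
    """
    job = {"in_dir": {"class": "Directory", "path": str(Path(in_dir).resolve())}}
    Path(job_path).write_text(json.dumps(job, indent=2))
    return job


def cwltool_command(workflow, job_path, outdir):
    """Build the command line for running a CWL workflow with cwltool."""
    return ["cwltool", "--outdir", str(outdir), str(workflow), str(job_path)]


def run_workflow(workflow, job_path, outdir):
    """Run the workflow; raises CalledProcessError if cwltool fails."""
    subprocess.run(cwltool_command(workflow, job_path, outdir), check=True)
```

The job file is written as JSON because cwltool's YAML loader also accepts JSON, which avoids depending on a YAML library; a hand-written `job.yml` works just as well.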