clarin-eric / LRSwitchboard

DEPRECATED - Please see https://github.com/clarin-eric/switchboard for latest version - Code Repository for the Language Resources Switchboard of CLARIN

[Tool] Colibri produces empty result #4

Closed: twagoo closed this issue 6 years ago

twagoo commented 7 years ago

When I select Colibri under 'N-Gramming' in the LRS (accessed from the VLO) with, for example, a publicly accessible Dutch plain-text file, I can access the tool (after logging in with my CLARIN-PLUS credentials) and run it, but the resulting files are empty (the CSV file has 0 bytes).

The error.log file ends with:

Counting 1-grams
None found
Sorting all indices...
Writing model to /scratch2/www/webservices-lst/live/writable/colibricore/projects/clarinplus/P00c032c6717acfb603622f4c229b98c142/output//1-00b8220da5e12b884adb952fe4cd1f6278.colibri.patternmodel
Generating desired views...
NPMI= -1.0 <class 'float'>
[CLAM Dispatcher] Process ended (2017-06-28 16:58:03, 0.551679s) 
[CLAM Dispatcher] Removing temporary files
[CLAM Dispatcher] Finished (2017-06-28 16:58:03), exit code 0, dispatcher wait time 0.5499999999999999s, duration 0.593168s

proycon commented 7 years ago

I'm the upstream Colibri maintainer; I'll have to look into this.

claus-zinn commented 7 years ago

The error also persists for text files in other languages, such as English and German. The file sent to Ucto seems to arrive there (it is visible and can be clicked on in the Ucto UI), but after processing its content is empty (in the returned zip file).

claus-zinn commented 6 years ago

The error still persists. In the error log, I get:

ERROR: No corpus data file was specified (--datafile|-f), but this is required for the options you specified...

Note that FROG, another service from LST WebServices, works properly, as does UCTO. So the problem seems specific to Colibri.

proycon commented 6 years ago

Sorry it took so long. It looks like the switchboard does not set the language parameter for the uploaded input file; its value reads False. Pass it with textinput_untok_language (see https://webservices-lst.science.ru.nl/colibricore/info/ under "Project Entry shortcut"). Because the language is missing, ucto fails and produces empty output (the error is not properly caught on our end), so colibri-core ends up working on an empty file.
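A minimal Python sketch of what the fix amounts to on the switchboard side: build the Colibri entry URL with the language explicitly set, and refuse to build it when the language is missing. Only the parameter name `textinput_untok_language` and the service base URL come from this thread. The input-template id `textinput_untok` (inferred from that parameter name), the `project=new` parameter, the helper name `build_colibri_entry_url`, and the `nld` language code are assumptions for illustration; check them against the service's info page before relying on this.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL of the Colibri service, taken from the info page linked in the thread.
COLIBRI_BASE = "https://webservices-lst.science.ru.nl/colibricore/"


def build_colibri_entry_url(file_url: str, language: Optional[str]) -> str:
    """Build a hypothetical project-entry URL for Colibri.

    Assumptions: the input file goes under the input template id
    `textinput_untok`, and a new project is requested with `project=new`.
    Only `textinput_untok_language` is confirmed in the thread.
    """
    # The thread reports the language reading False; reject that and any
    # empty value, since ucto would otherwise produce empty output.
    if not language or str(language).lower() == "false":
        raise ValueError("textinput_untok_language must be set")
    query = urlencode({
        "project": "new",                     # assumption
        "textinput_untok": file_url,          # assumption: template id
        "textinput_untok_language": language, # from the thread
    })
    return COLIBRI_BASE + "?" + query


if __name__ == "__main__":
    # Hypothetical input file and language code.
    print(build_colibri_entry_url("https://example.org/text.txt", "nld"))
```

Rejecting a missing language up front turns the silent empty-output failure described above into an explicit error on the caller's side.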

claus-zinn commented 6 years ago

My fault. I read the description too quickly. The parameter is now fixed, and the new version has been propagated to production.