gchrupala / morfette

Supervised learning of morphology
BSD 2-Clause "Simplified" License

Workflow integration / Multi-document processing #7

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

First, thank you for sharing all this nice work!

I'm working on a document workflow with thousands of documents.

We want to apply Morfette on each document processed by the workflow.

Our mean processing speed has dropped because of morfette's startup time, 
which we currently pay once per document.

We are trying to reduce the number of morfette forks to one:
- start the workflow, fork one morfette process
- for each document
    - do usual processing
    - send words to the forked morfette
    - read the results from the forked morfette
    - save data
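
The loop above can be sketched as follows. This is only a minimal illustration, not morfette's actual interface: the `LineWorker` class, the `run_workflow` helper, and the `STAND_IN` child command are hypothetical names, and the stand-in child simply upper-cases each line. The sketch also assumes the child answers every input line with exactly one output line, which may not match morfette's real input and output format.

```python
import subprocess
import sys


class LineWorker:
    """Keep one child process alive and exchange one line at a time with it."""

    def __init__(self, command):
        # Line-buffered text pipes so each write reaches the child promptly.
        self.proc = subprocess.Popen(
            command,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,
        )

    def process(self, words):
        """Send each word, then read back one result line for it.

        Assumption: the child replies with exactly one line per input line.
        If it does not, readline() blocks forever.
        """
        results = []
        for word in words:
            self.proc.stdin.write(word + "\n")
            self.proc.stdin.flush()
            results.append(self.proc.stdout.readline().rstrip("\n"))
        return results

    def close(self):
        # Closing stdin signals the real end of input to the child.
        self.proc.stdin.close()
        returncode = self.proc.wait()
        self.proc.stdout.close()
        return returncode


# Hypothetical stand-in child (NOT morfette): upper-cases each input line.
STAND_IN = [
    sys.executable, "-u", "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n",
]


def run_workflow(documents, command=STAND_IN):
    """Fork the child once, then reuse it for every document."""
    worker = LineWorker(command)
    try:
        return [worker.process(words) for words in documents]
    finally:
        worker.close()
```

With this shape the startup cost is paid once per workflow run rather than once per document, but it only works if the child emits its results before its stdin is closed.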

Unfortunately, the current version of morfette seems to wait for the end of 
its stdin before processing any words.

Could you please change this behavior?

For example, if a word to process is the magic string "-{!#EOF#!}-", morfette 
would process its input buffer as usual, output the result as usual, reset its 
input buffer, and continue reading stdin until the real end of file on stdin 
arrives.

This would restore our mean document processing speed to an acceptable value.

Thanks in advance for your contribution.

-- FS

Original issue reported on code.google.com by fsi...@gmail.com on 22 Sep 2010 at 12:51

GoogleCodeExporter commented 9 years ago
Actually, the current version of morfette should NOT wait for the end of input 
to start processing. Once the model is loaded, the input on stdin should be 
processed incrementally. 
What system are you using morfette on? Could you provide a small test case?
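
A small test case could look something like the sketch below, which checks whether a command prints anything before its stdin is closed. The helper `responds_before_eof` and the two stand-in child commands are hypothetical, and the morfette command line and input format are deliberately left out. The timeout would have to be longer than the model loading time.

```python
import queue
import subprocess
import sys
import threading


def responds_before_eof(command, first_input, timeout=5.0):
    """Return True if `command` prints a line while its stdin is still open."""
    proc = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    lines = queue.Queue()
    # Read in a thread so that a silent child cannot block us forever.
    reader = threading.Thread(
        target=lambda: lines.put(proc.stdout.readline()), daemon=True
    )
    reader.start()
    proc.stdin.write(first_input)
    proc.stdin.flush()
    try:
        got_line = lines.get(timeout=timeout) != ""
    except queue.Empty:
        got_line = False
    finally:
        # Closing stdin lets a buffering child finish, unblocking the reader.
        proc.stdin.close()
        reader.join()
        proc.stdout.close()
        proc.wait()
    return got_line


# Hypothetical stand-ins (NOT morfette): one streams, one waits for EOF.
STREAMING_CHILD = [
    sys.executable, "-u", "-c",
    "import sys\nfor line in sys.stdin:\n    print(line, end='')\n",
]
WAITING_CHILD = [
    sys.executable, "-c",
    "import sys\nsys.stdout.write(sys.stdin.read())\n",
]
```

Calling `responds_before_eof(STREAMING_CHILD, "word\n")` should report incremental output, while the same call with `WAITING_CHILD` should not.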

Original comment by pitekus on 23 Oct 2010 at 6:45

GoogleCodeExporter commented 9 years ago
Closing this issue due to lack of feedback.

Original comment by pitekus on 5 Nov 2010 at 4:21