cltl / morphosyntactic_parser_nl

Morphosyntactic parser for Dutch based on the Alpino parser
Apache License 2.0

Optimize Alpino call #14

Closed by vanatteveldt 7 years ago

vanatteveldt commented 7 years ago

Instead of calling alpino -treebank for every sentence (XML file), call it once for all files and then parse the results per sentence. This reduces running time because the overhead of starting Alpino is fairly high, up to 50% when there are many short sentences.

I've tested this on a trivial example and a more complex one, and both yielded identical results (except for the run dates), but more testing may be needed; more unit tests would be good! I have also not tested with Python 2.

(I've also added some debug logging to check parsing times; I can remove it again if you want.)
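A minimal sketch of the batching idea, not the code in this PR: the function names are hypothetical, and a small child Python process that upper-cases each input line stands in for the external parser, so no Alpino flags or output formats are assumed here. It only illustrates paying the process startup cost once instead of once per sentence.

```python
import subprocess
import sys


def run_per_item(command, items):
    """Start one process per item (the pattern this change replaces)."""
    results = []
    for item in items:
        proc = subprocess.run(
            command, input=item + "\n",
            capture_output=True, text=True, check=True,
        )
        results.append(proc.stdout.strip())
    return results


def run_batched(command, items):
    """Start the process once, feed it all items, then split the output per item."""
    proc = subprocess.run(
        command, input="\n".join(items) + "\n",
        capture_output=True, text=True, check=True,
    )
    # Assumption: the stand-in emits exactly one output line per input line.
    return proc.stdout.splitlines()


# Stand-in for the real parser (an assumption for illustration only):
# a child Python process that upper-cases every line it reads from stdin.
STAND_IN = [
    sys.executable, "-c",
    "import sys\nfor line in sys.stdin: print(line.strip().upper())",
]
```

Both functions should return the same results for the same input; the batched variant starts the child process once, so the saving grows with the number of short inputs.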

PaulHuygen commented 7 years ago

Thanks. It works. On a small ten-sentence test file it reduces processing time by 10%.

Cheers, Paul