Instead of calling alpino -treebank once per sentence (xml file), call it once for all files and then parse the results per sentence.
This reduces running time because the overhead of starting Alpino is fairly high: up to 50% when there are many short sentences.
I've tested this on a trivial and a more complex example, and both yielded identical results (except for the run dates), but more testing may be needed [more unit tests would be good!]. I have also not tested this with Python 2.
(I've also added some debug logging to check parsing times; I can remove it again if you want.)
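
For reviewers, here is a rough sketch of the batching idea, not the actual code in this PR. It assumes the parser reads `key|sentence` lines on stdin and writes one `<key>.xml` file per sentence into an output directory. The names `build_input` and `parse_batch` are made up for illustration, and `command` stands for whatever Alpino invocation the caller already uses.

```python
import subprocess
from pathlib import Path


def build_input(sentences):
    """Format sentences as 'key|sentence' lines, keyed 1..n.

    Assumption: the sentences are already tokenized and contain no '|'.
    """
    return "".join(f"{i}|{sent}\n" for i, sent in enumerate(sentences, start=1))


def parse_batch(sentences, command, out_dir):
    """Start the parser once for all sentences and return {key: xml_text}.

    `command` is a hypothetical argument list supplied by the caller;
    it is expected to write <key>.xml files into `out_dir`.
    """
    out_dir = Path(out_dir)
    # One process start for the whole batch instead of one per sentence.
    subprocess.run(command, input=build_input(sentences), text=True, check=True)

    results = {}
    for key in range(1, len(sentences) + 1):
        xml_file = out_dir / f"{key}.xml"
        # A sentence that could not be parsed may leave no file behind.
        results[key] = xml_file.read_text(encoding="utf-8") if xml_file.exists() else None
    return results
```

The startup cost is paid once per batch, and the per-sentence step is reduced to reading back the file that belongs to each key.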