NLeSC / xtas

Distributed text analysis suite based on Celery
http://nlesc.github.io/xtas/

Multiple workers calling Stanford NER #31

Closed: larsmans closed this issue 10 years ago

larsmans commented 10 years ago

It is not clear what will happen if multiple workers on a single host want to use Stanford NER (SNER) at the same time. The port number is fixed, so we cannot run multiple SNER server processes on one host at the same time: only one of them can bind the port.

We could allocate the port quasi-dynamically, but that would require (1) a range of reserved ports and (2) communication between the workers about who's using which port.
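
As a rough illustration of the quasi-dynamic option, the sketch below probes a reserved range and returns the first port that can currently be bound. The function name `find_free_port` and the loopback default are assumptions made for this example and are not part of xtas. The probe alone does not remove the need for point (2): another worker could still take the port between the probe and the moment the SNER server binds it.

```python
import socket


def find_free_port(start, stop, host="127.0.0.1"):
    """Return the first port in [start, stop) that can be bound on host.

    Hypothetical helper: the probe socket is closed before returning, so
    the port is only known to have been free at the time of the check.
    """
    for port in range(start, stop):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                # Port is in use (or otherwise unavailable); try the next one.
                continue
            return port
    raise RuntimeError("no free port in range %d-%d" % (start, stop - 1))
```

A worker would call something like `find_free_port(9190, 9200)` (the range here is made up) and pass the result to the SNER server it starts.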

Alternatively, we could wrap Stanford NER in a piece of Java/Jython code and communicate with that using some other means, e.g. a Unix domain socket or Pyro.

larsmans commented 10 years ago

Here's how the Java/Jython code should communicate with Stanford.

larsmans commented 10 years ago

RWB (GitHub handle N/A) has expressed an interest in this issue.

larsmans commented 10 years ago

As another benefit of doing this with a Unix domain socket, we can avoid the race condition in starting up the server. Currently we parse the SNER server's stderr to see if it's ready to receive work, but when it reports "ready" it actually hasn't set up its socket yet; this is why there's a sleep call in the SNER code. With a custom socket protocol, we can let the server send its ready message over that very socket.
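
One possible shape for such a handshake, sketched in Python only: the worker creates and listens on the Unix domain socket before starting the server, and the server connects back and sends a ready message once it is initialised, so no stderr parsing or sleep is needed. Everything named here is an assumption for illustration: the `READY` message, `wait_for_ready`, and `fake_server`, which is a thread standing in for the Java/Jython wrapper around SNER.

```python
import os
import socket
import tempfile
import threading

READY = b"READY\n"  # hypothetical ready message, not an SNER protocol


def fake_server(path):
    """Stand-in for the wrapped SNER process: connect back when ready."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        s.sendall(READY)


def start_in_thread(path):
    """Start the stand-in server; a real worker would spawn a process here."""
    threading.Thread(target=fake_server, args=(path,)).start()


def wait_for_ready(start_server, timeout=10.0):
    """Listen on a fresh Unix socket, start the server, wait for READY.

    Returns the connected socket, which could then carry the actual work.
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "sner.sock")
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        listener.bind(path)
        listener.listen(1)
        listener.settimeout(timeout)
        # The socket exists before the server starts, so it cannot connect
        # too early and the worker cannot miss the ready message.
        start_server(path)
        conn, _ = listener.accept()
        conn.settimeout(timeout)
        data = b""
        while not data.endswith(b"\n"):
            chunk = conn.recv(64)
            if not chunk:
                break
            data += chunk
        if data != READY:
            conn.close()
            raise RuntimeError("unexpected handshake: %r" % data)
        return conn
    finally:
        # The accepted connection stays usable after the listener is gone.
        listener.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)
```

Calling `wait_for_ready(start_in_thread)` returns once the stand-in server has reported ready. Because each worker uses its own socket path, this also sidesteps the fixed-port problem.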