Closed: larsmans closed this issue 10 years ago
RWB (GitHub handle N/A) has expressed an interest in this issue.
As another benefit of doing this with a Unix domain socket, we can avoid the race condition when starting up the server. Currently we parse the SNER server's stderr to see whether it is ready to receive work, but when it reports "ready" it has not actually set up its socket yet; this is why there is a sleep call in the SNER code. With a custom socket protocol, the server can send its ready message over that very socket.
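A minimal sketch of such a handshake, under some assumptions: the worker creates and listens on the Unix domain socket first, then starts the server, which connects back and sends a ready message once it can accept work. The `READY` message, `fake_server` (a thread standing in for the Java/Jython wrapper process) and `start_and_wait` are hypothetical names for illustration, not part of Stanford NER or the existing SNER code.

```python
import socket
import threading

READY = b"READY\n"  # hypothetical ready message, not defined by Stanford NER


def fake_server(path):
    """Stand-in for the wrapped NER process: connect back, then announce readiness."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        s.sendall(READY)            # sent over the same socket that carries the work
        request = s.recv(4096)
        s.sendall(request.upper())  # placeholder for real tagging


def recv_exactly(conn, n):
    """Read exactly n bytes, or fewer if the peer closes the connection."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf


def start_and_wait(path, timeout=10.0):
    """Listen first, start the server, then block until it reports ready."""
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(1)
    listener.settimeout(timeout)
    # Assumption: real code would launch the Java/Jython wrapper as a subprocess here.
    threading.Thread(target=fake_server, args=(path,), daemon=True).start()
    conn, _ = listener.accept()
    listener.close()
    conn.settimeout(timeout)
    greeting = recv_exactly(conn, len(READY))
    if greeting != READY:
        conn.close()
        raise RuntimeError("unexpected greeting: %r" % (greeting,))
    return conn
```

Because the worker blocks in `accept()` and then on the greeting, no sleep call or stderr parsing is needed. If each worker uses its own socket path, several workers on one host would not collide on a fixed port either.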
It is not clear what will happen if multiple workers on a single host want to use Stanford NER at the same time. The port number is fixed, so we cannot run multiple SNER processes concurrently on one host.
We could allocate the port quasi-dynamically, but that would require (1) a range of reserved ports and (2) communication between the workers about who's using which port.
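As a rough illustration of quasi-dynamic allocation, a worker could probe a reserved range and take the first port it manages to bind, letting the operating system arbitrate instead of having the workers talk to each other. This is only a sketch: `claim_port` and the port range are hypothetical. Since the SNER server has to bind the port itself, the worker would need to close the probe socket before starting it, which leaves a window in which another worker could grab the same port.

```python
import socket


def claim_port(ports, host="127.0.0.1"):
    """Return (port, sock) for the first port in `ports` that can be bound."""
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            # Port is in use (or otherwise unavailable); try the next one.
            sock.close()
            continue
        return port, sock
    raise RuntimeError("no free port in the reserved range")
```

A worker might call `claim_port(range(9190, 9200))` (an arbitrary example range), close the returned socket, and pass the port number to the server on startup.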
Alternatively, we could wrap Stanford NER in a piece of Java/Jython code and communicate with that using some other means, e.g. a Unix domain socket or Pyro.