henningfemmer closed 8 years ago
The main problem is that spaCy is written in Python. Integrating Python-based components with UIMA is not a problem in itself - it could be done the same way we incorporate e.g. TreeTagger, namely by calling it as an external process. However, all packages that we currently incorporate in DKPro Core can be conveniently deployed through JARs, either as Java code or as statically linked binaries that are shipped in a JAR and extracted to disk at runtime. Python-based software packages, in contrast, usually have many dependencies that cannot easily be packaged up (e.g. in a JAR) and deployed through Maven. If we had a good solution for packaging and deploying packages such as spaCy, we would be more interested in incorporating them - at least from our side.
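To illustrate the "external process" route mentioned above, here is a minimal sketch of how a Java component might hand text to a command-line tool and read back its output. This is not DKPro Core's actual TreeTagger wrapper; the class name `ExternalToolRunner` and the method `run` are made up for illustration, and the sketch assumes the tool reads UTF-8 text from stdin and writes its result to stdout.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of DKPro Core or UIMA.
public class ExternalToolRunner {

    /**
     * Starts the given command, sends the input text to its stdin,
     * and returns everything the command wrote to stdout.
     */
    public static String run(List<String> command, String input)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Pass the tool's stderr through so its error messages stay visible.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        // Write stdin on a separate thread so that a tool which starts
        // producing output before it has consumed all input cannot block us.
        byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
        Thread writer = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(bytes);
            } catch (IOException e) {
                // The tool may have exited early; the exit code check
                // below reports that case.
            }
        });
        writer.start();

        // Read the tool's complete output, then wait for it to finish.
        byte[] output = process.getInputStream().readAllBytes();
        writer.join();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("External tool exited with code " + exitCode);
        }
        return new String(output, StandardCharsets.UTF_8);
    }
}
```

A wrapper for a Python tool could then call something like `ExternalToolRunner.run(List.of("python3", "annotate.py"), documentText)`, where `annotate.py` is a hypothetical script that would have to be written separately. Note that this only covers invocation: the Python interpreter and all of the tool's dependencies still need to be installed on the machine, which is exactly the deployment problem described above.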
I would say we would welcome contributions of wrappers even if the underlying software is not easily packageable as JARs. We might not include them in releases, though, until a proper way of packaging has been found - or we may choose to mark them in a specific way so that users know they need to invest manual effort to use them. From my perspective, that (or alternative approaches) would be open to discussion.
It would be interesting to know, though, which NLP components spaCy actually uses/implements (e.g. which parser specifically). Possibly, DKPro Core already has some of these or equivalent components.
Have you guys looked at spaCy (https://spacy.io)? Is there any chance it is going to be integrated into DKPro? I'm unsure whether it supports Apache UIMA, though...
Cheers and thanks, Henning