dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

spacy #847

Closed henningfemmer closed 8 years ago

henningfemmer commented 8 years ago

Have you guys looked at spaCy (https://spacy.io)? Is there any chance this is going to be integrated into DKPro? Although I'm unsure whether it supports Apache UIMA...

Cheers and thanks, Henning

reckart commented 8 years ago

The main problem is that it is written in Python. Not that integrating Python-based components with UIMA would be a particular problem in itself - it could be done in the same way that we incorporate TreeTagger, for example, namely by calling it as an external process. All packages that we currently incorporate into DKPro Core can be conveniently deployed through JARs, being either Java code or statically linked binaries that are shipped in a JAR and extracted to disk at runtime. However, Python-based software packages usually have many dependencies that cannot be easily packaged up (e.g. in a JAR) and deployed through Maven. If we had a good solution for packaging and deploying packages such as spaCy, we would be more interested in incorporating them - at least from our side.
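To illustrate what "calling it as an external process" could look like, here is a minimal sketch in plain Java. It is not the actual DKPro Core TreeTagger wrapper; the class name `ExternalToolRunner` and its `run` method are made up for this example, and it assumes the external tool reads text on standard input and writes its result to standard output.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of DKPro Core or UIMA.
public class ExternalToolRunner {

    // Starts the given command, sends `input` to its stdin and returns its stdout.
    public static String run(List<String> command, String input)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Let the tool's error messages show up on our own stderr.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
        // Write on a separate thread so that a tool which produces output
        // before it has consumed all input cannot block us on a full pipe.
        Thread writer = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(bytes);
            } catch (IOException e) {
                // The tool may have exited early; the exit code check below reports that.
            }
        });
        writer.start();

        // Read everything the tool prints until it closes its stdout.
        String output = new String(
                process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        writer.join();

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("External tool exited with code " + exitCode);
        }
        return output;
    }
}
```

A wrapper could then call something like `ExternalToolRunner.run(List.of("python3", "tag.py"), text)`, where `tag.py` stands for a hypothetical script that runs spaCy over its input. The sketch only covers the invocation; it does nothing about the packaging and deployment problem described above, since the Python interpreter and its dependencies would still have to be installed separately.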

I would say we would welcome contributions of wrappers even if the underlying software is not easily packageable as JARs. We might not include them in releases, though, until a proper way of packaging has been found - or we may choose to mark them in a specific way so that users know they need to invest manual effort to use them. From my perspective, that (or alternative approaches) would be open to discussion.

carschno commented 8 years ago

It would be interesting to know, though, which NLP components spaCy actually uses or implements (e.g. which parser specifically). Possibly, DKPro Core already has some of these or equivalent components.