Status: Closed (ClemDoum closed 2 weeks ago)
Closed in favor of #1597
TODO
PR description
Implement batch text processing for NER. This change is made in the context of #1452, as batch processing is necessary for spaCy.
Changes
datashare-api
⚠️Added
List<List<NlpTag>> processText(Stream<String> batch, Language language) throws InterruptedException to Pipeline
Changed
NlpTag is now a record and a JSON-serializable class
datashare-core-nlp
Added
datashare-app
Added
bool Pipeline.Type.extractFromDoc(), which indicates whether the pipeline should preferably be used on full documents or can be used on text chunks
ExtractNlpTask for pipelines which do not require prediction on documents
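
As a rough illustration of how the pieces listed above could fit together, here is a minimal, self-contained Java sketch. Only the processText signature and the extractFromDoc() name come from this PR description. The NlpTag fields, the Language constants, the DOC_LEVEL and CHUNK_LEVEL constants, the CapitalizedWordPipeline toy tagger, the newline-based chunking and the BatchNerSketch helper are all assumptions made for the example and are not datashare code. The sketch uses Java's boolean where the description writes bool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in: the real NlpTag fields are not listed in the PR description.
record NlpTag(String mention, String category, int start) {}

// Hypothetical stand-in for datashare's Language type.
enum Language { ENGLISH, FRENCH }

interface Pipeline {
    // Signature as given in the PR description: one list of tags per text in the batch.
    List<List<NlpTag>> processText(Stream<String> batch, Language language) throws InterruptedException;

    // The constants are hypothetical; only extractFromDoc() is named in the PR.
    enum Type {
        DOC_LEVEL(true), CHUNK_LEVEL(false);

        private final boolean extractFromDoc;

        Type(boolean extractFromDoc) { this.extractFromDoc = extractFromDoc; }

        // True if the pipeline should preferably run on full documents.
        boolean extractFromDoc() { return extractFromDoc; }
    }
}

// Toy pipeline (assumption): tags every space-separated token starting with an uppercase letter.
final class CapitalizedWordPipeline implements Pipeline {
    @Override
    public List<List<NlpTag>> processText(Stream<String> batch, Language language) {
        return batch.map(CapitalizedWordPipeline::tag).collect(Collectors.toList());
    }

    private static List<NlpTag> tag(String text) {
        List<NlpTag> tags = new ArrayList<>();
        int offset = 0; // character offset of the current token within this text
        for (String token : text.split(" ")) {
            if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                tags.add(new NlpTag(token, "UNKNOWN", offset));
            }
            offset += token.length() + 1; // +1 for the separating space
        }
        return tags;
    }
}

final class BatchNerSketch {
    // One possible routing: send the whole document as a batch of one,
    // or split it into line chunks and send them as a single batch.
    static List<List<NlpTag>> extract(Pipeline pipeline, Pipeline.Type type, String document) {
        List<String> batch = type.extractFromDoc()
                ? List.of(document)
                : Arrays.asList(document.split("\n"));
        try {
            return pipeline.processText(batch.stream(), Language.ENGLISH);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while tagging", e);
        }
    }

    public static void main(String[] args) {
        String doc = "Alice met bob\nin Paris";
        Pipeline pipeline = new CapitalizedWordPipeline();
        System.out.println(extract(pipeline, Pipeline.Type.CHUNK_LEVEL, doc));
        System.out.println(extract(pipeline, Pipeline.Type.DOC_LEVEL, doc));
    }
}
```

In this sketch, tag offsets are relative to whichever text was passed in the batch, so a chunk-level caller would need to map them back to document offsets. How ExtractNlpTask handles this is not stated in the description.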