ICIJ / datashare

A self-hosted search engine for documents.
https://datashare.icij.org
GNU Affero General Public License v3.0

feature: implement `CreateNlpBatchesFromIndexTask` and `BatchNlpTask` #1597

Open ClemDoum opened 1 month ago

ClemDoum commented 1 month ago

TODO

PR description

Implement batch processing for NER. This change is made in the context of #1452, since batch processing is necessary for spaCy.

Notes

In this PR we chose not to implement `PipelineTask`; instead, we rely entirely on the task bus to distribute the batches across workers.
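
To illustrate the idea, here is a minimal sketch of the "one task per batch" approach. It is not the code from this PR: the class `BatchDistributionSketch`, the methods `createBatches` and `workerLoop`, and the in-memory queue standing in for the task bus are all hypothetical names chosen for illustration. A creation step splits document ids into fixed-size batches and publishes each batch as an independent task; any worker can then pick up whichever batch comes next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: an in-memory queue stands in for the real task bus.
class BatchDistributionSketch {

    /** Splits document ids into fixed-size batches and publishes one task per batch. */
    static int createBatches(List<String> docIds, int batchSize, Queue<List<String>> bus) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int published = 0;
        for (int start = 0; start < docIds.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docIds.size());
            // Copy the slice so each published task owns its own list of ids.
            bus.add(new ArrayList<>(docIds.subList(start, end)));
            published++;
        }
        return published;
    }

    /** What a single worker would do: take batches until none are left. */
    static int workerLoop(Queue<List<String>> bus) {
        int processedDocs = 0;
        List<String> batch;
        while ((batch = bus.poll()) != null) {
            // Placeholder for running NER on the whole batch at once.
            processedDocs += batch.size();
        }
        return processedDocs;
    }

    public static void main(String[] args) {
        Queue<List<String>> bus = new ConcurrentLinkedQueue<>();
        List<String> docIds = List.of("doc-1", "doc-2", "doc-3", "doc-4", "doc-5");
        int batches = createBatches(docIds, 2, bus);
        System.out.println(batches + " batches published");
        System.out.println(workerLoop(bus) + " documents processed");
    }
}
```

Because each batch is a self-contained task, no coordinating pipeline object is needed in this sketch: adding workers only means running more copies of `workerLoop` against the same queue.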

Changes

datashare-api

Added

datashare-app

Added