ICIJ / datashare

A self-hosted search engine for documents.
https://datashare.icij.org
GNU Affero General Public License v3.0

feat: extract and create embedded docs cache at INDEX stage #1533

Closed by bamthomas 1 week ago

bamthomas commented 2 months ago

Is your feature request related to a problem? Please describe.

Avoid errors when accessing embedded files.

Describe the solution you'd like

When files are indexed with the artifactDir option provided, the INDEX stage should extract all embedded files to disk.

see #1397
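
To make the request concrete, here is a rough sketch of the idea, not Datashare's actual code. It assumes the indexer already yields each embedded document as an id plus a byte stream. The function names and the on-disk layout (`<artifactDir>/<2 chars>/<2 chars>/<id>/raw`) are hypothetical and only there for illustration.

```python
import shutil
from pathlib import Path
from typing import BinaryIO, Iterable, List, Optional, Tuple


def artifact_path(artifact_dir: Path, doc_id: str) -> Path:
    # Hypothetical layout: shard on the first characters of the id
    # so that a single directory does not grow too large.
    return artifact_dir / doc_id[:2] / doc_id[2:4] / doc_id / "raw"


def cache_embedded(artifact_dir: Path, doc_id: str, content: BinaryIO) -> Path:
    """Write one embedded document to the cache unless it is already there."""
    target = artifact_path(artifact_dir, doc_id)
    if target.exists():
        return target  # already extracted by a previous run
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".tmp")
    with tmp.open("wb") as out:
        # Copy in chunks so the whole document is never held in memory.
        shutil.copyfileobj(content, out)
    tmp.replace(target)  # rename so readers never see a partial file
    return target


def index_embedded(
    artifact_dir: Optional[Path],
    embedded: Iterable[Tuple[str, BinaryIO]],
) -> List[Path]:
    """Cache every embedded document, but only when artifactDir is provided."""
    if artifact_dir is None:
        return []
    return [cache_embedded(artifact_dir, doc_id, stream) for doc_id, stream in embedded]
```

With something like this in the INDEX stage, later access to an embedded file would read from the cache instead of re-extracting it from its parent document, which is where the errors come from today.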

bamthomas commented 2 months ago

For now we hold documents in memory when indexing them. According to this thread, it seems that we cannot stream the data to ES.

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open for 40 days with no activity.