NVIDIA / nv-ingest

NVIDIA Ingest is an early access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other enterprise documents into metadata and text to embed into retrieval systems.
Apache License 2.0
87 stars, 40 forks

[FEA]: Add the ability for the CLI and client library to extract image content from metadata to disk and replace with URL in metadata file #214

Closed (drobison00 closed 2 weeks ago)

drobison00 commented 2 weeks ago

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request?

Would be nice

Please provide a clear description of the problem this feature solves

This is a quality-of-life update that makes metadata files more human-readable when needed: image content embedded in the metadata is written to disk and replaced with a URL pointing to the saved file.

Describe the feature, and optionally a solution or implementation and any alternatives

Requires modifying the CLI and client library so that they support an option to extract images to disk and retain a link to each saved image in the metadata file.
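
A minimal sketch of what the requested behavior could look like, written as a standalone Python helper. This is not nv-ingest code. The function name `extract_images_to_disk`, the record layout (a `document_type` field and a base64 string under `metadata["content"]`), and the `image_<index>.bin` file naming are all assumptions made for illustration.

```python
import base64
import copy
from pathlib import Path


def extract_images_to_disk(records, output_dir):
    """Write embedded image content to files and replace it with a file URL.

    Assumed (hypothetical) record layout:
        {"document_type": "image", "metadata": {"content": "<base64 string>"}}
    Records of any other document type are returned unchanged.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Work on a copy so the caller's in-memory results stay untouched.
    updated = copy.deepcopy(records)

    for index, record in enumerate(updated):
        if record.get("document_type") != "image":
            continue
        metadata = record.get("metadata", {})
        content = metadata.get("content")
        if not content:
            continue

        # Hypothetical naming scheme; the real image format is not inspected here.
        image_path = out / f"image_{index}.bin"
        image_path.write_bytes(base64.b64decode(content))

        # Replace the raw payload with a link to the file on disk.
        metadata["content"] = image_path.resolve().as_uri()

    return updated
```

Calling `extract_images_to_disk(results, "./images")` on a list of result records would return a copy in which each image record carries a `file://` URL in place of its encoded payload, which keeps the metadata file small enough to read by hand.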

Additional context

No response