NVIDIA / nv-ingest

NVIDIA Ingest is an early-access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other enterprise documents into metadata and text to embed into retrieval systems.
Apache License 2.0
92 stars 42 forks

[FEA]: Add new SimpleMessageBroker and supporting elements to NV-ingest #226

Open drobison00 opened 1 week ago

drobison00 commented 1 week ago

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request?

Currently preventing usage

Please provide a clear description of the problem this feature solves

Overview: Currently, the ingest pipeline relies on a message broker (Redis by default) to feed data, which requires deploying both a message broker container and a front-end REST service. For testing or proof-of-concept scenarios, it would be beneficial to have a more streamlined option that eliminates these dependencies.

Solution: Implement a simple inline message broker that can be used within the pipeline and create a corresponding client interface. This will allow the ingest service to run independently, without requiring local dependencies on an external message broker or REST service.
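To make the idea concrete, here is a minimal sketch of what an inline, in-process broker could look like. It is an illustration only, not the proposed implementation: the class name `InlineBrokerSketch`, the `push`/`pop` methods, and the `max_queue_size` parameter are all hypothetical and do not come from nv-ingest.

```python
import queue
import threading


class InlineBrokerSketch:
    """Hypothetical in-process broker: a set of named, bounded FIFO queues."""

    def __init__(self, max_queue_size: int = 1000):
        self._max_queue_size = max_queue_size
        self._queues = {}
        self._lock = threading.Lock()  # guards creation of new queues

    def _get_queue(self, name: str) -> queue.Queue:
        # Create the named queue on first use so callers need no setup step.
        with self._lock:
            if name not in self._queues:
                self._queues[name] = queue.Queue(maxsize=self._max_queue_size)
            return self._queues[name]

    def push(self, queue_name: str, message) -> bool:
        """Enqueue a message; return False instead of blocking when full."""
        try:
            self._get_queue(queue_name).put_nowait(message)
            return True
        except queue.Full:
            return False

    def pop(self, queue_name: str, timeout: float = 0.0):
        """Dequeue the oldest message, or return None if none arrives in time."""
        try:
            if timeout > 0:
                return self._get_queue(queue_name).get(timeout=timeout)
            return self._get_queue(queue_name).get_nowait()
        except queue.Empty:
            return None
```

Because everything lives in one process, a pipeline stage could call `push` and `pop` directly in tests, with no Redis container or REST front end running.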

Describe the feature, and optionally a solution or implementation and any alternatives

Introduce socket_task_source and socket_task_sink components to the nv_ingest service. These will be configurable to listen on a specified source, allowing jobs to be accepted and results returned over sockets. Additionally, update nv_ingest_client with options to submit jobs and fetch job results from the nv_ingest service via these socket connections.
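The submit-and-fetch flow over sockets might be sketched as follows. This is an assumption-heavy illustration: the newline-delimited JSON wire format, the `submit`/`fetch` operation names, and the helpers `JobStore`, `JobHandler`, `start_server`, and `call` are all hypothetical. A single handler stands in for both `socket_task_source` and `socket_task_sink`, and the "pipeline" simply echoes the payload back.

```python
import json
import socket
import socketserver
import threading
import uuid


class JobStore:
    """Thread-safe stand-in for the pipeline: stores a result per job."""

    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def submit(self, payload):
        job_id = uuid.uuid4().hex
        # Placeholder processing: a real pipeline would do its work here.
        with self._lock:
            self._results[job_id] = {"status": "done", "echo": payload}
        return job_id

    def fetch(self, job_id):
        # Results are handed out once, then forgotten.
        with self._lock:
            return self._results.pop(job_id, None)


class JobHandler(socketserver.StreamRequestHandler):
    """Handles one JSON request per line (hypothetical wire format)."""

    def handle(self):
        for line in self.rfile:
            if not line.strip():
                continue
            request = json.loads(line)
            store = self.server.store
            if request.get("op") == "submit":
                reply = {"job_id": store.submit(request.get("payload"))}
            elif request.get("op") == "fetch":
                reply = {"result": store.fetch(request.get("job_id"))}
            else:
                reply = {"error": "unknown op"}
            self.wfile.write((json.dumps(reply) + "\n").encode("utf-8"))


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), JobHandler)
    server.daemon_threads = True
    server.store = JobStore()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call(address, request):
    """Client side: send one request and read one reply."""
    with socket.create_connection(address, timeout=5) as conn:
        conn.sendall((json.dumps(request) + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return json.loads(reader.readline())
```

A client would call `start_server()`, read `server.server_address`, submit with `call(address, {"op": "submit", "payload": ...})`, and later fetch with the returned `job_id`. A real design would also need to handle jobs that are still in progress, which this sketch omits.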

Additional context

No response