Open masci opened 4 months ago
I confirm this issue.
I have an indexing pipeline that accepts documents and indexes them into an Elasticsearch database.
I created a test pipeline with a single DocumentCleaner that expects a list of documents and I have the following two problems:
pydantic.errors.PydanticInvalidForJsonSchema: Cannot generate a JsonSchema for core_schema.IsInstanceSchema (<class 'pandas.core.frame.DataFrame'>)
TypeError: DocumentCleaner expects a List of Documents as input.
At the moment it's extremely hard to use Hayhooks for indexing pipelines, as they either accept:
Since Hayhooks is in control of the request payload for pipeline endpoints, one possible solution might be accepting multipart form data whenever the input of a pipeline is of type path or bytestream. Hayhooks would receive the file and take care of temporarily storing it server-side, or passing bytes on-the-fly to the pipeline.
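To make the proposal a bit more concrete, here is a rough, stdlib-only sketch of the server-side part: given the raw bytes of an uploaded file, either write them to a temporary file and hand the path to the pipeline, or pass the bytes straight through. This is not Hayhooks code. `stored_upload`, `run_with_upload`, and the `wants_path` flag are hypothetical names made up for illustration, and `run_pipeline` stands in for whatever callable ends up invoking the pipeline.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Callable, Iterator, Union


@contextmanager
def stored_upload(data: bytes, filename: str) -> Iterator[Path]:
    """Write uploaded bytes to a temporary file; delete it when the block exits."""
    # Keep the original extension so converters that dispatch on suffix still work.
    suffix = Path(filename).suffix
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / f"upload{suffix}"
        path.write_bytes(data)
        yield path


def run_with_upload(
    run_pipeline: Callable[[Union[Path, bytes]], Any],  # hypothetical pipeline entry point
    data: bytes,
    filename: str,
    wants_path: bool,  # hypothetical flag: True if the pipeline input is a path
) -> Any:
    """Feed an uploaded file to the pipeline as a temp path or as raw bytes."""
    if wants_path:
        # Pipeline input is a path: store the upload temporarily on the server.
        with stored_upload(data, filename) as path:
            return run_pipeline(path)
    # Pipeline input is a byte stream: pass the bytes on the fly.
    return run_pipeline(data)
```

In a real endpoint, `data` and `filename` would come from the multipart request, and `wants_path` would be derived from the declared input type of the pipeline. The temporary directory is removed as soon as the pipeline call returns, so nothing from the upload lingers on disk.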