dgarnitz / vectorflow

VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.
https://www.getvectorflow.com/
Apache License 2.0

Add html and custom chunker #69

Closed by dgarnitz 1 year ago

dgarnitz commented 1 year ago

What

Why

Requested by ArguFlow

Verification

Can see that the request to embed and upload an HTML file succeeds:

image

Can see in the pod logs that the HTML is converted to a string successfully:

image

Can also see the custom chunker being called:

image

All unit tests still pass.
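
For readers without access to the screenshots, here is a minimal sketch of the two steps verified above: converting HTML to a plain string, then passing that string to a caller-supplied chunker. It is not the code from this PR. The names `html_to_text`, `chunk_text`, and `sentence_chunker` are hypothetical, and the sketch uses only Python's standard-library `html.parser` rather than whatever VectorFlow uses internally.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Hypothetical helper: return the visible text of an HTML document as one string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_text(text, chunker=None, chunk_size=5):
    """Hypothetical helper: use the custom chunker if given, else fixed-size word chunks."""
    if chunker is not None:
        return chunker(text)
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def sentence_chunker(text):
    """Example custom chunker (an assumption for illustration): split on periods."""
    return [s.strip() for s in text.split(".") if s.strip()]


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><p>First sentence. Second sentence.</p></body></html>"
    text = html_to_text(page)
    print(chunk_text(text, chunker=sentence_chunker))
```

Passing the chunker as a plain callable keeps the default path (fixed-size word chunks) unchanged when no custom chunker is supplied.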