Closed Pipboyguy closed 3 weeks ago
Name | Link |
---|---|
Latest commit | 9a347e63d95f8fa451190a42ed9c0f4f33fca769 |
Latest deploy log | https://app.netlify.com/sites/dlt-hub-docs/deploys/66d3421eb1e42d00088bb147 |
Deploy Preview | https://deploy-preview-1771--dlt-hub-docs.netlify.app |
Description
The OpenAI embedding service doesn't accept empty string bodies. We used to deal with this by overriding the whole `OpenAIEmbedding` function. This caused more grief than it fixed, since the LanceDB registry doesn't keep track of it well, with very finicky Arrow metadata parsing and de-serialisation.
We simplify the fix by replacing empty strings with a placeholder that should be semantically very dissimilar to 99.9% of queries. Ideally, the embedding vectors of null strings would be pinned at the origin, but that should be handled upstream in LanceDB.
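As a rough illustration of the placeholder approach described above, here is a minimal sketch. The function name `replace_empty_strings` and the placeholder value are hypothetical and not taken from this PR, and treating `None` the same as an empty string is an assumption. The idea is only to sanitise the batch of texts before it is handed to the embedding service.

```python
from typing import Iterable, List, Optional

# Hypothetical placeholder value; the string actually used in the PR is not shown here.
EMPTY_PLACEHOLDER = "___EMPTY___"


def replace_empty_strings(
    texts: Iterable[Optional[str]],
    placeholder: str = EMPTY_PLACEHOLDER,
) -> List[str]:
    """Return the texts with None and "" entries swapped for the placeholder.

    Non-empty strings are passed through unchanged, so the output has the
    same length and order as the input.
    """
    return [placeholder if text is None or text == "" else text for text in texts]


if __name__ == "__main__":
    batch = ["first document", "", None, "second document"]
    # The sanitised batch can then be sent to the embedding service.
    print(replace_empty_strings(batch))
```

Because the substitution happens on the input data rather than inside the embedding function, nothing custom has to be registered with or deserialised by the LanceDB registry.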
The default vector column name is also changed to "vector" to match LanceDB's default vector column name, which makes onboarding and setup easier.
Related Issues
Additional Context
See https://github.com/lancedb/lancedb/issues/1577