We've significantly improved our file coverage in the input bar, but the data source file upload has been neglected for some time. A few months ago, we introduced the file API, which we can use to:
Enhance our file coverage (currently, we use a basic and unreliable version of pdfjs on the client side).
Improve chunking (the current algorithm does not maintain the hierarchy that could be extracted from documents, such as PDFs).
This can be broken down into several steps:
Step 1
Refactor the UI for document upload to align with the new design in Figma. Introduce a new use case for the file API concerning data source documents.
Step 2
Determine how we should handle structured data files (e.g., CSVs). Currently, they can be uploaded as both documents and tables.
Step 3
For the new use case in the file API, implement a chunking method that utilizes the extracted hierarchy.
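To make Step 3 more concrete, here is a minimal sketch of hierarchy-aware chunking. It assumes the file API can return a tree of sections. The `SectionNode` and `Chunk` types and the `splitText` and `chunkByHierarchy` functions are hypothetical names used for illustration only, not existing code. Each chunk keeps the heading path from the root down to its section, so the hierarchy is still available at retrieval time.

```typescript
// Hypothetical shape for the hierarchy extracted from a document.
// This is an assumption for illustration, not the file API's actual output type.
interface SectionNode {
  title: string;
  text: string;
  children: SectionNode[];
}

// A chunk remembers the headings above it (root first).
interface Chunk {
  path: string[];
  text: string;
}

// Split text into pieces of at most maxChars, preferring paragraph boundaries.
// Paragraphs longer than maxChars are hard-split.
function splitText(text: string, maxChars: number): string[] {
  const pieces: string[] = [];
  let current = "";
  for (const para of text.split(/\n{2,}/)) {
    const p = para.trim();
    if (p.length === 0) {
      continue;
    }
    const candidate = current ? `${current}\n\n${p}` : p;
    if (candidate.length <= maxChars) {
      current = candidate;
      continue;
    }
    if (current) {
      pieces.push(current);
    }
    let rest = p;
    while (rest.length > maxChars) {
      pieces.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current = rest;
  }
  if (current) {
    pieces.push(current);
  }
  return pieces;
}

// Walk the section tree depth-first and emit chunks tagged with their heading path.
function chunkByHierarchy(
  node: SectionNode,
  maxChars: number,
  parentPath: string[] = []
): Chunk[] {
  const path = parentPath.concat(node.title);
  const chunks: Chunk[] = [];
  for (const text of splitText(node.text, maxChars)) {
    chunks.push({ path, text });
  }
  for (const child of node.children) {
    for (const chunk of chunkByHierarchy(child, maxChars, path)) {
      chunks.push(chunk);
    }
  }
  return chunks;
}
```

Calling `chunkByHierarchy(root, 512)` on an extracted tree would return a flat list of chunks in document order. Splitting on characters is a simplification; a real implementation would more likely budget by tokens.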
See the Figma design here.
Related
https://github.com/dust-tt/dust/issues/6899