dust-tt / dust

Amplify your team's potential with customizable and secure AI assistants.
https://dust.tt
MIT License

Refactor file upload for content fragment #5953

Open flvndvd opened 4 days ago

flvndvd commented 4 days ago

Description

Summary

While working on vision, we identified a race condition in the process of uploading content fragment files. Currently, we create the conversation/post the message with the content fragment before uploading the associated file, which causes issues. As we plan to use URLs for vision and need to resize images (which increases upload time), the existing logic needs refactoring.

Key Changes

This PR revamps the current logic, separating files from content fragments. While files will still be used within content fragments, they can now exist independently. The goal is to achieve this separation without storing every file in a database. The main changes include:

  1. New Endpoint /files: This endpoint must be called first, allowing the client to obtain a file ID. The endpoint returns the file ID and a signed (using JWT) upload URL valid for 30 seconds.
  2. New Endpoint /files/:fileId: This endpoint supports both file upload and retrieval. For uploads, the client must provide the token to verify that the file ID was generated by our system. While it currently doesn't process textual files, it does resize images. Once the file is uploaded, the endpoint returns a download URL, which can be used in the url field of a content fragment.
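The token handshake between the two endpoints can be sketched as follows. This is only an illustration of a 30-second HS256 JWT bound to a file ID, built on Node's `crypto` module; the function names, the payload shape (`fileId`, `exp`), and the secret handling are assumptions and not the code in this PR.

```typescript
import { createHmac, randomUUID, timingSafeEqual } from "node:crypto";

// From the description: the signed upload URL is valid for 30 seconds.
const UPLOAD_TOKEN_TTL_SECONDS = 30;

// Hypothetical payload shape; the real claims may differ.
interface UploadTokenPayload {
  fileId: string;
  exp: number; // expiry, in seconds since epoch
}

function base64url(input: string): string {
  return Buffer.from(input, "utf8").toString("base64url");
}

function sign(data: string, secret: string): string {
  return createHmac("sha256", secret).update(data).digest("base64url");
}

// Step 1 (POST /files): mint a file ID and a short-lived token for it.
function createUploadToken(
  secret: string,
  nowSeconds: number
): { fileId: string; token: string } {
  const fileId = randomUUID();
  const header = base64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const payload = base64url(
    JSON.stringify({ fileId, exp: nowSeconds + UPLOAD_TOKEN_TTL_SECONDS })
  );
  const signature = sign(`${header}.${payload}`, secret);
  return { fileId, token: `${header}.${payload}.${signature}` };
}

// Step 2 (upload to /files/:fileId): accept the upload only if the token
// was signed by us, matches the file ID in the URL, and has not expired.
function verifyUploadToken(
  token: string,
  fileId: string,
  secret: string,
  nowSeconds: number
): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) {
    return false;
  }
  const [header, payload, signature] = parts;
  const expected = Buffer.from(sign(`${header}.${payload}`, secret));
  const actual = Buffer.from(signature);
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    return false;
  }
  let claims: UploadTokenPayload;
  try {
    claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  } catch {
    return false;
  }
  return claims.fileId === fileId && nowSeconds < claims.exp;
}
```

Because the token itself proves that the file ID was issued by the server, this kind of check needs no database lookup, which fits the goal of not storing every file in a database.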

The useFileUploaderService has been adjusted to accommodate these changes. Previously, no special handling was done until files were submitted. With the new approach, files are uploaded to our cloud storage as soon as they are added to the input bar, and the send button is unblocked only after the upload and resizing are complete. This avoids impacting the chat experience.
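The send-button gating could look roughly like the sketch below. The `PendingFile` type, its `status` values, and both helper functions are hypothetical and do not reflect the actual shape of useFileUploaderService; treating a failed upload as blocking is also an assumption.

```typescript
// Hypothetical client-side state for a file attached in the input bar.
type UploadStatus = "uploading" | "ready" | "failed";

interface PendingFile {
  id: string;
  filename: string;
  status: UploadStatus;
  // Set once the upload (and any resizing) has completed.
  downloadUrl?: string;
}

// The send button is enabled only when every attached file is ready.
// An empty list is fine: a message without attachments can be sent.
function canSend(files: PendingFile[]): boolean {
  return files.every((f) => f.status === "ready");
}

// Collect the download URLs to put in the url field of content fragments.
function contentFragmentUrls(files: PendingFile[]): string[] {
  return files
    .filter((f) => f.status === "ready" && f.downloadUrl !== undefined)
    .map((f) => f.downloadUrl as string);
}
```

Since uploads start as soon as a file is attached, the user usually finishes typing before `canSend` would matter, which is how this design keeps the longer upload time from affecting the chat experience.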

Impact on Production

Trade-offs and Future Considerations

This solution has some trade-offs, such as the inability to recompute file IDs and the loss of the current GCS structure of conversation/message/content. There is also the potential for "ghost" files to remain in the system if they are added to the input bar but never sent. However, these drawbacks are acceptable given the increased flexibility needed for future file handling enhancements.

Note on Public API

These changes do not yet apply to the public API but will be rolled out once vision becomes generally available.

Image Resizing

Based on experiments with vision, we plan to resize the largest side of images down to 768px.
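As an illustration of the arithmetic, the target dimensions can be computed as below. This sketch assumes the aspect ratio is preserved and that images already within the limit are left untouched; `targetDimensions` is a hypothetical helper, and the actual resizing is done elsewhere in the upload endpoint.

```typescript
// From the description: the largest side is resized down to 768px.
const MAX_IMAGE_SIDE_PX = 768;

interface Dimensions {
  width: number;
  height: number;
}

// Scale so that the largest side equals maxSide, preserving aspect ratio.
// Assumption: images that already fit are not upscaled.
function targetDimensions(
  width: number,
  height: number,
  maxSide: number = MAX_IMAGE_SIDE_PX
): Dimensions {
  const largest = Math.max(width, height);
  if (largest <= maxSide) {
    return { width, height };
  }
  const scale = maxSide / largest;
  return {
    // Clamp to 1px so extreme aspect ratios never round down to zero.
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}
```

For example, a 1536x1024 image would come out at 768x512, while a 400x300 image would be kept as-is.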

Follow up work

Demos

ImageUpload

PDFUpload

Risk

Worst case, file upload is broken.

⚠️ Rolling back is not fully safe, as it will disrupt the file downloads for files created while this code was in production.

Deploy Plan