[Visualization] renderedContentFragment for csv render only a preview of the file

lasryaric commented 4 months ago

~~Design doc here.~~

~~Since the code interpreter action can potentially output some new files, we need a way to store them in the conversation.~~ ~~We need to distinguish this type of file to present them differently to the user as "downloadable", and not render them to the main model.~~

~~The proposed solution is to add a ContentFragmentType.type=action_output.~~

Updated task:

We want to allow uploading CSVs as a first step (no post-processing other than validation). Then we want to update renderConversationForModel to render each ContentFragment according to its content type and size:

Small CSV -> header + embed content
Large CSV -> header + a few lines
Other file types -> do not change current behavior.
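
As a rough illustration of the small/large CSV rule above, here is a minimal sketch. The function name `renderCsvForModel`, the 4,000-character threshold, and the 5-line preview are assumptions made for the example; the thread does not specify any of them.

```typescript
// Hypothetical values: the thread does not say what counts as "small"
// or how many lines "a few lines" means.
const SMALL_CSV_MAX_CHARS = 4_000;
const PREVIEW_DATA_LINES = 5;

interface RenderedCsv {
  text: string;
  truncated: boolean;
}

// Small CSV: return it whole. Large CSV: return the header plus the first few rows.
function renderCsvForModel(
  csv: string,
  maxChars: number = SMALL_CSV_MAX_CHARS,
  previewLines: number = PREVIEW_DATA_LINES
): RenderedCsv {
  if (csv.length <= maxChars) {
    return { text: csv, truncated: false };
  }
  // Naive line split: newlines inside quoted fields are not handled here.
  const lines = csv.split(/\r?\n/);
  const kept = lines.slice(0, 1 + previewLines);
  return { text: kept.join("\n"), truncated: kept.length < lines.length };
}
```

A real implementation would probably measure size in tokens rather than characters and use a proper CSV parser for the line split.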

PopDaph commented 4 months ago

Deprecated since we're focusing on Visualization first. Closing!

spolu commented 4 months ago

We will still need it to some extent to upload CSVs without presenting them to the model right?

PopDaph commented 4 months ago

Yes, this is https://github.com/dust-tt/dust/issues/5780!

spolu commented 4 months ago

\o/

spolu commented 4 months ago

But it does not reaaaallly cover adding CSVs to conversation?

lasryaric commented 4 months ago

@spolu can you unpack?

spolu commented 4 months ago

Let's say I want to visualize a CSV. I need to upload a CSV (I can't today) and I need the CSV to not be rendered to the model conversation?

lasryaric commented 4 months ago

So in the design doc we never accounted for CSVs that would not be rendered to the model. We assumed it would work the same way it does today (with a "preview" of the CSV for the dust-v2-code-interpreter app).

spolu commented 4 months ago

I think we need to account for that since we can't upload even a 2 MB CSV as it will blow the token count.
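
To put a rough number on this, here is a back-of-the-envelope sketch. It assumes about 4 characters per token (a common rule of thumb, not a tokenizer; numeric CSV data may tokenize differently) and a hypothetical 128,000-token context window. Neither figure comes from the thread.

```typescript
// Rough heuristic only: ~4 characters per token is a rule of thumb, not a tokenizer.
const ASSUMED_CHARS_PER_TOKEN = 4;

// Treats one byte as one character, which holds for ASCII-only CSVs.
function estimateTokens(sizeInBytes: number): number {
  return Math.ceil(sizeInBytes / ASSUMED_CHARS_PER_TOKEN);
}

function fitsInContext(sizeInBytes: number, contextTokens: number): boolean {
  return estimateTokens(sizeInBytes) <= contextTokens;
}

const twoMegabytes = 2 * 1024 * 1024;
console.log(estimateTokens(twoMegabytes)); // 524288
console.log(fitsInContext(twoMegabytes, 128_000)); // false (128k window is an assumption)
```

Under these assumptions a 2 MB CSV comes to roughly half a million tokens, several times the assumed window.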

PopDaph commented 4 months ago

OK, reviving this one. It potentially needs to be updated, since for Visualization it's no longer about the action producing files. Let's quickly frame what's MVP for Visualization regarding files uploaded as content fragments.

Context

Currently, ContentFragments are rendered in renderConversationForModel as a user message. We limit the size of uploaded documents to avoid exceeding the model's Context Size. Example of current rendering:

{
  "role": "user",
  "name": "Daphné Popin",
  "content": [
    {
      "type": "text",
      "text": "<attachment type=\"text/csv\" title=\"SampleCSVFile_2kb.csv\">\n1,\"Eldon Base for stackable storage shelf, platinum\",Muhammed MacIntyre,3,-213.25,38.94,35,Nunavut,Storage & Organization,0.8\r\n2,\"1.7 Cubic Foot Compact \"\"Cube\"\" Office Refrigerators\",Barry French,293,457.81,208.16,68.02,Nunavut,Appliances,0.58\r\n3,\"Cardinal Slant-D® Ring Binder, Heavy Gauge Vinyl\",Barry French,293,46.71,8.69,2.99,Nunavut,Binders and Binder Accessories,0.39\r\n4,R380,Clay Rozendal,483,1198.97,195.99,3.99,Nunavut,Telephones and Communication,0.58\r\n5,Holmes HEPA Air Purifier,Carlos Soltero,515,30.94,21.78,5.94,Nunavut,Appliances,0.5\r\n6,G.E. Longer-Life Indoor Recessed Floodlight Bulbs,Carlos Soltero,515,4.43,6.64,4.95,Nunavut,Office Furnishings,0.37\r\n7,\"Angle-D Binders with Locking Rings, Label Holders\",Carl Jackson,613,-54.04,7.3,7.72,Nunavut,Binders and Binder Accessories,0.38\r\n8,\"SAFCO Mobile Desk Side File, Wire Frame\",Carl Jackson,613,127.70,42.76,6.22,Nunavut,Storage & Organization,\r\n9,\"SAFCO Commercial Wire Shelving, Black\",Monica Federle,643,-695.26,138.14,35,Nunavut,Storage & Organization,\r\n10,Xerox 198,Dorothy Badders,678,-226.36,4.98,8.33,Nunavut,Paper,0.38\r\n\n</attachment>"
    },
    {
      "type": "text",
      "text": "\n@dust what's in the data? "
    }
  ]
},
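
For reference, a message shaped like the one above could be assembled roughly as follows. This is only a sketch of the shape shown in the example, not the actual renderConversationForModel code: `renderAttachmentMessage` and `escapeAttribute` are hypothetical names, and the attribute escaping is an assumption that the example does not show.

```typescript
interface TextContent {
  type: "text";
  text: string;
}

interface UserMessage {
  role: "user";
  name: string;
  content: TextContent[];
}

// Hypothetical helper: escapes characters that would break the attribute quoting.
function escapeAttribute(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Wraps the file text in an <attachment> tag, followed by the user's own text.
function renderAttachmentMessage(
  userName: string,
  file: { contentType: string; title: string; text: string },
  userText: string
): UserMessage {
  const attachment =
    `<attachment type="${escapeAttribute(file.contentType)}" ` +
    `title="${escapeAttribute(file.title)}">\n${file.text}\n</attachment>`;
  return {
    role: "user",
    name: userName,
    content: [
      { type: "text", text: attachment },
      { type: "text", text: userText },
    ],
  };
}
```

Because the whole file text is inlined into `attachment`, the message grows linearly with the file, which is why the upload size is currently capped.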

We aim to handle larger files for Visualization, moving beyond the current file size limitations.

Proposal for MVP

Key Requirements:

Model Access: Provide a preview of large files for the model to generate visualizations.
Full Content Access: Enable the visualization iframe to securely access the full file content.

Suggested Steps:

Update renderConversationForModel:
- For small files: Render completely (current method)
- For large files: Render only a preview
Increase maximum upload size limit (implement under a feature flag initially)
Develop API route and SWR logic:
- Create endpoint to load full content of large ContentFragments
- Implement front-end logic to access this content for conversation rendering
Collaboration: Sync with @lasryaric (returning Wednesday) on iframe parameter passing for full content access

Next Actions

Implement proposed changes
Test with various file sizes
Review and refine with @lasryaric

@spolu WDYT?

spolu commented 4 months ago

> Implement front-end logic to access this content for conversation rendering

What do you mean by this? model conversation rendering or something else?

(Note: this is somewhat tied with this project https://www.notion.so/dust-tt/Attachments-DataSource-Actions-b1ee705c834f4f1e997bc6c48c08a9b2?pvs=4)

The new file API endpoints allow us to post-process files after they are uploaded + we now store content-type on the content fragments.

This likely means here that we simply want a special handling in renderConversationForModel based on content-type and size of the file.

As part of this project we likely simply want to allow uploading CSV as a first step (no post-processing other than validating it) and maybe at a later point XLS (with post-processing to CSV). In renderConversationForModel we likely want to simply change the rendering of the content fragment dynamically based on its content type (and size indeed).

CSV (very small) => header + embed
CSV (large) => header + a few lines
Anything else => nothing changes for now

?

spolu commented 4 months ago

The one thing we want to put a bit of effort into is the code interfaces for this dynamic rendering, to make it a bit future-proof, but that's not even a real concern at this stage IMHO.
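
One possible shape for such an interface is a small registry of renderers keyed by content type, with a fallback that keeps today's behavior. This is a sketch under assumptions: `FileInfo`, `FragmentRenderer`, `renderFragment`, and both constants are hypothetical and are not taken from the Dust codebase.

```typescript
interface FileInfo {
  contentType: string;
  title: string;
  text: string;
}

type FragmentRenderer = (file: FileInfo) => string;

// Hypothetical limits, chosen only for the example.
const MAX_INLINE_CHARS = 4_000;
const CSV_PREVIEW_LINES = 5;

// Default: embed the full text (current behavior for every file type).
const renderFull: FragmentRenderer = (file) => file.text;

// CSV: embed small files whole, otherwise header + a few lines (naive line split).
const renderCsv: FragmentRenderer = (file) => {
  if (file.text.length <= MAX_INLINE_CHARS) {
    return file.text;
  }
  return file.text.split(/\r?\n/).slice(0, 1 + CSV_PREVIEW_LINES).join("\n");
};

// Adding a new type later (for example XLS converted to CSV) means adding one entry.
const renderers = new Map<string, FragmentRenderer>([["text/csv", renderCsv]]);

function renderFragment(file: FileInfo): string {
  const render = renderers.get(file.contentType) ?? renderFull;
  return render(file);
}
```

Keeping the dispatch in one place means the caller never needs to know which content types receive special handling.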

PopDaph commented 4 months ago

> > Implement front-end logic to access this content for conversation rendering
>
> What do you mean by this? model conversation rendering or something else?

I mean that we need to pass the full content of the CSV file to the iframe rendering the graph. The model only requires a subset of the document to generate the React component, but when rendering the React component in the iframe, it needs access to the full data to render it properly.

spolu commented 4 months ago

Right! This task and the way we present files to the visualisation iframe can almost be two separate tasks 👍

PopDaph commented 4 months ago

Ok perfect thanks a lot 🙏🏻

lasryaric commented 4 months ago

I agree on the conclusion. Thanks.

PopDaph commented 4 months ago

https://github.com/dust-tt/dust/pull/6165
