lasryaric closed 4 months ago
Deprecated since we're focusing on Visualization first. Closing!
We will still need it to some extent to upload CSVs without presenting them to the model right?
Yes, this is https://github.com/dust-tt/dust/issues/5780!
\o/
But it does not reaaaallly cover adding CSVs to a conversation?
@spolu can you unpack?
Let's say I want to visualize a CSV. I need to upload a CSV (I can't today) and I need the CSV to not be rendered to the model conversation?
So in the design doc we never accounted for CSVs that would not be rendered to the model. We assumed it would work the same way it does today (with a "preview" of the CSV for the dust-v2-code-interpreter app).
I think we need to account for that, since we can't upload even a 2 MB CSV: it would blow the token count.
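To put a rough number on that, here is a back-of-the-envelope sketch. It assumes about 4 characters per token (a common rule of thumb, not a real tokenizer), one byte per character, and a 128k-token context window. All three are assumptions chosen for illustration, not values from the codebase.

```typescript
// Rough heuristic only: real tokenizers vary, and numeric CSV data
// often tokenizes less efficiently than plain English text.
const CHARS_PER_TOKEN = 4;

// Estimate the token count of a file from its size in bytes,
// assuming one byte per character (plain ASCII CSV).
function estimateTokens(
  byteSize: number,
  charsPerToken: number = CHARS_PER_TOKEN
): number {
  return Math.ceil(byteSize / charsPerToken);
}

// True if the whole file would fit in a context window of the given size.
function fitsInContext(byteSize: number, contextSize: number): boolean {
  return estimateTokens(byteSize) <= contextSize;
}

const twoMb = 2 * 1024 * 1024; // 2,097,152 bytes
const assumedContextSize = 128_000; // hypothetical context window

console.log(estimateTokens(twoMb)); // 524288
console.log(fitsInContext(twoMb, assumedContextSize)); // false
```

Even under these generous assumptions, a 2 MB CSV lands around half a million tokens, several times the assumed window.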
OK, reviving this one. It potentially needs to be updated, since for Visualization it's no longer about the action producing files. Let's quickly frame what's MVP for Visualization regarding files uploaded as content fragments.
Currently, ContentFragments are rendered in renderConversationForModel as a user message. We limit the size of uploaded documents to avoid exceeding the model's context size. Example of current rendering:
{
"role": "user",
"name": "Daphné Popin",
"content": [
{
"type": "text",
"text": "<attachment type=\"text/csv\" title=\"SampleCSVFile_2kb.csv\">\n1,\"Eldon Base for stackable storage shelf, platinum\",Muhammed MacIntyre,3,-213.25,38.94,35,Nunavut,Storage & Organization,0.8\r\n2,\"1.7 Cubic Foot Compact \"\"Cube\"\" Office Refrigerators\",Barry French,293,457.81,208.16,68.02,Nunavut,Appliances,0.58\r\n3,\"Cardinal Slant-D� Ring Binder, Heavy Gauge Vinyl\",Barry French,293,46.71,8.69,2.99,Nunavut,Binders and Binder Accessories,0.39\r\n4,R380,Clay Rozendal,483,1198.97,195.99,3.99,Nunavut,Telephones and Communication,0.58\r\n5,Holmes HEPA Air Purifier,Carlos Soltero,515,30.94,21.78,5.94,Nunavut,Appliances,0.5\r\n6,G.E. Longer-Life Indoor Recessed Floodlight Bulbs,Carlos Soltero,515,4.43,6.64,4.95,Nunavut,Office Furnishings,0.37\r\n7,\"Angle-D Binders with Locking Rings, Label Holders\",Carl Jackson,613,-54.04,7.3,7.72,Nunavut,Binders and Binder Accessories,0.38\r\n8,\"SAFCO Mobile Desk Side File, Wire Frame\",Carl Jackson,613,127.70,42.76,6.22,Nunavut,Storage & Organization,\r\n9,\"SAFCO Commercial Wire Shelving, Black\",Monica Federle,643,-695.26,138.14,35,Nunavut,Storage & Organization,\r\n10,Xerox 198,Dorothy Badders,678,-226.36,4.98,8.33,Nunavut,Paper,0.38\r\n\n</attachment>"
},
{
"type": "text",
"text": "\n@dust what's in the data? "
}
]
},
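The shape of that message can be reproduced with a small helper. This is only a sketch inferred from the example above: the `ContentFragment` interface and the function names are hypothetical, not the actual code behind renderConversationForModel.

```typescript
// Hypothetical shapes inferred from the rendered example; the real
// types in the codebase may differ.
interface ContentFragment {
  title: string;
  contentType: string;
  content: string;
}

interface TextPart {
  type: "text";
  text: string;
}

interface UserMessage {
  role: "user";
  name: string;
  content: TextPart[];
}

// Wrap the raw file content in an <attachment> tag, mirroring the example.
// Note: no attribute escaping is done here, which is a simplification.
function renderAttachment(fragment: ContentFragment): string {
  return (
    `<attachment type="${fragment.contentType}" title="${fragment.title}">\n` +
    `${fragment.content}\n</attachment>`
  );
}

// Build the user message: the attachment first, then the user's own text.
function renderUserMessage(
  userName: string,
  fragment: ContentFragment,
  userText: string
): UserMessage {
  return {
    role: "user",
    name: userName,
    content: [
      { type: "text", text: renderAttachment(fragment) },
      { type: "text", text: userText },
    ],
  };
}
```

Because the whole file content is inlined in the first text part, the message size grows linearly with the file size.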
We aim to handle larger files for Visualization, moving beyond the current file size limitations.
Update renderConversationForModel:
Increase maximum upload size limit (implement under a feature flag initially)
Develop API route and SWR logic:
Collaboration: Sync with @lasryaric (returning Wednesday) on iframe parameter passing for full content access
@spolu WDYT?
Implement front-end logic to access this content for conversation rendering
What do you mean by this? model conversation rendering or something else?
(Note: this is somewhat tied with this project https://www.notion.so/dust-tt/Attachments-DataSource-Actions-b1ee705c834f4f1e997bc6c48c08a9b2?pvs=4)
The new file API endpoints allow us to post-process files after they are uploaded + we now store content-type on the content fragments.
This likely means that we simply want special handling in renderConversationForModel based on the content-type and size of the file.
As part of this project we likely simply want to allow uploading CSV as a first step (no post-processing other than validating it) and maybe at a later point XLS (with post-processing to CSV). In renderConversationForModel we likely want to simply change the rendering of the content fragment dynamically, based on the content fragment's content type (and size indeed).
CSV (very small) => header + embed
CSV (large) => header + a few lines
Anything else => nothing changes for now
?
The one thing we want to put a bit of effort into is the code interfaces for this dynamic rendering, to make them a bit future-proof, but that's not even a real concern at this stage IMHO.
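A minimal sketch of what such an interface could look like: a registry of renderers keyed by content type, with a CSV renderer that embeds small files and truncates large ones to the header plus a few rows. The thresholds, the `FileFragment` shape and all function names are assumptions for illustration; nothing here is taken from the actual codebase.

```typescript
// Hypothetical fragment shape and thresholds (assumed values).
interface FileFragment {
  title: string;
  contentType: string;
  content: string;
}

const MAX_EMBED_CHARS = 4_000; // below this, embed the CSV in full
const PREVIEW_ROWS = 5; // rows kept after the header for large CSVs

type Renderer = (content: string) => string;

// Small CSV => header + full content. Large CSV => header + a few rows.
// Naive line splitting: quoted fields containing newlines are not handled.
function renderCsvForModel(
  content: string,
  maxEmbedChars: number = MAX_EMBED_CHARS,
  previewRows: number = PREVIEW_ROWS
): string {
  if (content.length <= maxEmbedChars) {
    return content;
  }
  const lines = content.split(/\r?\n/);
  const header = lines[0];
  const rows = lines.slice(1).filter((line) => line.length > 0);
  const preview = rows.slice(0, previewRows);
  const omitted = rows.length - preview.length;
  return [header, ...preview, `... (${omitted} more rows omitted)`].join("\n");
}

// New content types (e.g. XLS converted to CSV) would register here.
const renderers = new Map<string, Renderer>([
  ["text/csv", (content) => renderCsvForModel(content)],
]);

// Anything without a registered renderer is left unchanged for now.
function renderFragmentContent(fragment: FileFragment): string {
  const renderer = renderers.get(fragment.contentType);
  return renderer ? renderer(fragment.content) : fragment.content;
}
```

Keeping the per-type logic behind a single lookup means renderConversationForModel would only need one call site, whatever types are added later.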
> Implement front-end logic to access this content for conversation rendering
> What do you mean by this? model conversation rendering or something else?
I mean that we need to pass the full content of the CSV file to the iframe rendering the graph. The model only requires a subset of the document to generate the React component, but when rendering the React component in the iframe, it needs access to the full data to render it properly.
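One possible way to hand the full file to the iframe is the browser's `postMessage` API. The thread does not settle on a mechanism, so treat this purely as a sketch: the `FileContentMessage` shape, the function names and the origin used in the usage note are all hypothetical.

```typescript
// Hypothetical message shape; not an existing protocol in the codebase.
interface FileContentMessage {
  type: "fileContent";
  fileId: string;
  contentType: string;
  content: string; // full file content, not the truncated model preview
}

// Minimal structural type so the sketch does not depend on DOM typings.
// In a browser, a non-null iframe.contentWindow should satisfy it.
interface PostMessageTarget {
  postMessage(message: unknown, targetOrigin: string): void;
}

function buildFileContentMessage(
  fileId: string,
  contentType: string,
  content: string
): FileContentMessage {
  return { type: "fileContent", fileId, contentType, content };
}

// Send the full content to the iframe. Passing an explicit targetOrigin
// (rather than "*") limits which origin can receive the data.
function sendFileToIframe(
  target: PostMessageTarget,
  message: FileContentMessage,
  targetOrigin: string
): void {
  target.postMessage(message, targetOrigin);
}
```

In a page this might be called as `sendFileToIframe(iframe.contentWindow, message, "https://viz.example.com")` after checking that `contentWindow` is not null. The model would still only see the truncated preview.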
Right! This task and the way we present files to the visualization iframe can almost be two separate tasks 👍
Ok perfect thanks a lot 🙏🏻
I agree on the conclusion. Thanks.
Design doc here.
Since the code interpreter action can potentially output some new files, we need a way to store them in the conversation. We need to distinguish this type of file to present them differently to the user as "downloadable", and not render them to the main model. The proposed solution is to add a ContentFragmentType.type=action_output.
Updated task:
We want to allow uploading CSV as a first step (no post-processing other than validating it). Then we want to update renderConversationForModel to properly render ContentFragments according to their type and size: