dust-tt / dust


Ideation : R&D Burst #2572

Open philipperolet opened 10 months ago

philipperolet commented 10 months ago

:scientist: R&D burst

Hey team! Dust will soon run an R&D burst, a 1-week exploration (that I'd do, with assistance of course welcome) on a given topic (e.g. "How to improve model factuality" or "What's the best chunking strategy").

Why

:information_source: Forging informed convictions about key technical elements of our business. E.g. for model factuality, there is a debate between RAG and using finetuning for memory => we currently favor RAG; we want to be able to support this with research, so when a competitor comes along doing FT, we can back our claim (and if it turns out FT can do good things, it's quite important that we know about it too).

:european_castle: Opportunity for a moat If the burst outcome is conclusive--e.g. we find out that we can greatly improve factuality using one key research idea + smart engineering--we make a bet on 3 months of R&D. The product gets a golden boost and a strong differentiator vs potential competitors.

:speaker: Research marketing Just like speculative sampling when I joined, this allows us to raise awareness about Dust in one of the best possible ways (being in people's minds as "the experts"). In general, and related to the point about informed convictions, we want to show the world we are on top of topics that are key to our business (on top of = expert + key opinion leader).

Topic ideas

Brain dump, to be collectively completed:

At this time we have intuitions for most of those => we can turn them into experiment-backed convictions and ideally turn those convictions into features.

Input welcome

We will frame the burst more rigorously soon (topic decision issue, framing issue, etc.)--for now this is the ideation phase, gathering general ideas and feedback.

Thanks :bow:

spolu commented 10 months ago

Comments

Factuality: RAG vs memory via Finetuning (or use both)?

What do you mean by memory? Finetuning requires a dataset. What would it be? Certainly more than a week's work?
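
For reference, the RAG side of the comparison is roughly retrieve-then-prompt. A minimal sketch, where `embed` and `complete` are placeholder callables (not our actual stack) and the retrieval is naive cosine similarity:

```python
# Minimal RAG-for-memory sketch (hypothetical helpers, not Dust's actual stack).
# embed() maps text to a vector; complete() maps a prompt to a model answer.
import math
from typing import Callable, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rag_answer(
    question: str,
    documents: List[str],
    embed: Callable[[str], List[float]],
    complete: Callable[[str], str],
    k: int = 3,
) -> str:
    """Retrieve the k most similar documents and stuff them into the prompt."""
    q_vec = embed(question)
    ranked: List[Tuple[float, str]] = sorted(
        ((cosine(q_vec, embed(doc)), doc) for doc in documents),
        reverse=True,
    )
    context = "\n\n".join(doc for _, doc in ranked[:k])
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return complete(prompt)
```

The finetuning alternative would replace the retrieval step with a training set baked into the weights, which is exactly where the dataset question above bites.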

Factuality: survey techniques for detecting hallucinations

Looks a bit shallow. It's just an added step, but there's something about model calibration that could be interesting (see below).
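
One cheap version of that added step, sketched under assumptions: sample the model several times at non-zero temperature and use answer agreement as a rough confidence signal. `complete` is a hypothetical non-deterministic completion call, not a specific API.

```python
# Self-consistency check as a rough hallucination signal (sketch, not a vetted method).
# complete() is an assumed non-deterministic completion call (temperature > 0).
from collections import Counter
from typing import Callable, Tuple

def agreement_confidence(
    question: str,
    complete: Callable[[str], str],
    n_samples: int = 5,
) -> Tuple[str, float]:
    """Return the most frequent answer and the fraction of samples that agree with it."""
    answers = [complete(question).strip().lower() for _ in range(n_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n_samples

# Low agreement (e.g. < 0.6) could be flagged as a likely hallucination for review.
```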

Chunking: optimal length

Not sure I see what the experiment would be here. It relies on the existence of a benchmark, right? Would you use a public benchmark for that (which ones exist?)

Chunking: optimal strategy (overlap? link handling? )

Same remark here?
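
To make the experiment concrete for both chunking questions: the thing being swept is basically chunk size and overlap, then retrieval/answer quality is measured on whatever benchmark we trust. A minimal sketch with illustrative parameter values:

```python
# Fixed-size chunking with overlap (sketch); a chunking experiment would sweep
# chunk_size and overlap and score retrieval/answer quality on a benchmark.
from typing import List

def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> List[str]:
    """Split text into chunk_size-character chunks, each overlapping the previous by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

# Example sweep (benchmark scoring left abstract):
# for size in (256, 512, 1024):
#     for overlap in (0, 32, 128):
#         chunks = chunk_text(document, size, overlap)
#         ... index chunks, run the benchmark, record scores ...
```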

Reasoning: using CoT or ToT in assistants

If it's in the context of assistants, then it also relies on the existence of a benchmark that we trust, no?
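
Assuming such a benchmark exists, the CoT comparison itself is simple: same questions, with and without a step-by-step instruction, then compare accuracy. Sketch only; `complete`, the benchmark format, and the answer parsing are all assumptions.

```python
# Sketch: compare direct answers vs chain-of-thought answers on a trusted benchmark.
# complete() and the benchmark (list of {"question", "answer"}) are assumed, not Dust internals.
from typing import Callable, Dict, List

COT_SUFFIX = "\nThink step by step, then give the final answer on the last line."

def accuracy(
    benchmark: List[Dict[str, str]],
    complete: Callable[[str], str],
    use_cot: bool,
) -> float:
    correct = 0
    for item in benchmark:
        prompt = item["question"] + (COT_SUFFIX if use_cot else "")
        lines = complete(prompt).strip().splitlines() or [""]
        final = lines[-1].lower()  # crude: expected answer must appear on the last line
        correct += item["answer"].lower() in final
    return correct / len(benchmark)
```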

Reasoning: how to properly chain assistants

I think this is more product than "research"

Evaluation: how to assess answer quality

Looks like a requirement for many of the other ideas? Probably worth framing this a bit more because there are a lot of questions around it?
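
Since several topics depend on it, here is one hedged shape this could take: rubric-based grading with an LLM judge. The judge callable, the rubric wording, and the score parsing are all illustrative assumptions.

```python
# Sketch of rubric-based answer grading with an LLM judge (assumed judge() call, illustrative rubric).
from typing import Callable

RUBRIC = (
    "Score the ANSWER to the QUESTION from 1 (unusable) to 5 (fully correct and grounded "
    "in the CONTEXT). Reply with the digit only."
)

def grade_answer(
    question: str,
    context: str,
    answer: str,
    judge: Callable[[str], str],
) -> int:
    prompt = f"{RUBRIC}\n\nQUESTION: {question}\n\nCONTEXT: {context}\n\nANSWER: {answer}"
    reply = judge(prompt).strip()
    digits = [c for c in reply if c.isdigit()]
    return int(digits[0]) if digits else 0  # 0 means the judge output was unparsable
```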

Ideas

The two above kind of work together, in the sense that any benchmark we build will likely rely on the calibration of models...
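
On the calibration link: if the benchmark records a confidence per answer, expected calibration error is one standard way to quantify how trustworthy those confidences are. Sketch over hypothetical (confidence, correct) pairs, not tied to any particular model.

```python
# Standard binned expected calibration error over (confidence, correct) pairs (sketch only).
from typing import List, Tuple

def expected_calibration_error(results: List[Tuple[float, bool]], n_bins: int = 10) -> float:
    """results: (model confidence in [0, 1], whether the answer was actually correct)."""
    total = len(results)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        in_bin = [(c, ok) for c, ok in results if lo <= c < hi or (b == n_bins - 1 and c == 1.0)]
        if not in_bin:
            continue
        avg_conf = sum(c for c, _ in in_bin) / len(in_bin)
        acc = sum(ok for _, ok in in_bin) / len(in_bin)
        ece += (len(in_bin) / total) * abs(avg_conf - acc)
    return ece
```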

philipperolet commented 10 months ago

Thanks a lot for the input. The topics I dropped were rough leads, admittedly not framed yet, but I will give more detail on the ones I think are worthwhile when I get back to it. Your ideas are much more framed :) They would of course be a good fit directly, and the internal benchmark would certainly bring direct value.

That said, pausing on this topic for a bit and focusing on delivering Ahuna for now. Will get back to it soon.