dolthub / dolt

Dolt – Git for Data
Apache License 2.0
17.85k stars, 508 forks

feature: GPT-3 integration #808

Closed skybrian closed 3 years ago

skybrian commented 4 years ago

This is just a fun idea but it might drive interest in Dolt among computer programmers. (I'm not sure if an issue is the best way to get in touch; apologies if it's the wrong forum.)

There are a lot of people interested in trying out GPT-3, and currently the best way to do that, if you don't have API access, is AI Dungeon. However, since it's designed for game-playing, AI Dungeon has a lot of drawbacks from the standpoint of doing experiments and sharing the results. It makes it very easy to retry requests until you get a good result (cherry-picking), and you can edit the transcript at any time. The transcript also doesn't make it clear what was human input and what was AI output, and we don't know what AI Dungeon is actually submitting to the GPT-3 API.

Suppose there were an alternate website for users to manually try out GPT-3 that saved all your input and output in Dolt? It need not be all that elaborate to be fun to play with, and it would be a good way to experiment with ways of collecting data and sharing it, and make a fun demo for Dolt.

timsehn commented 4 years ago

Thanks for reaching out. This is as good a forum as any. We have also noticed the deep interest in GPT-3.

I like the idea of input/output pairs in Dolt. They're not really versioned unless you make the input the primary key, so that a diff is created whenever GPT-3 gives a different response. This would kind of catch your cherry-picking phenomenon. But so would an append-only DB. I'm trying to figure out how to best show off the features of Dolt beyond just a large set of input/output pairs gathered from GPT-3. But maybe that's enough...
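A minimal sketch of the two layouts compared above, for illustration only. It assumes Python's built-in `sqlite3` as a stand-in for a SQL database; SQLite has no commit history, so the "diff" that Dolt would record for the keyed table is not modeled here. The table names `keyed` and `appended` and the helper `record` are hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in (assumption: not Dolt).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Layout 1: the prompt is the primary key, so a new response overwrites the old one.
    CREATE TABLE keyed (
        prompt   TEXT PRIMARY KEY,
        response TEXT NOT NULL
    );
    -- Layout 2: append-only, every response is kept as its own row.
    CREATE TABLE appended (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,
        prompt   TEXT NOT NULL,
        response TEXT NOT NULL
    );
    """
)


def record(prompt, response):
    """Store one prompt/response pair in both layouts."""
    conn.execute(
        "INSERT OR REPLACE INTO keyed (prompt, response) VALUES (?, ?)",
        (prompt, response),
    )
    conn.execute(
        "INSERT INTO appended (prompt, response) VALUES (?, ?)",
        (prompt, response),
    )


# The same prompt is submitted twice and gets two different responses.
record("Once upon a time", "there was a dragon.")
record("Once upon a time", "there was a database.")

keyed_rows = conn.execute("SELECT prompt, response FROM keyed").fetchall()
appended_rows = conn.execute(
    "SELECT prompt, response FROM appended ORDER BY id"
).fetchall()
```

In the keyed layout only the latest response is visible in the table, so the earlier one would have to be recovered from version history; in the append-only layout both responses remain as rows.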

Let me talk with the folks here and see if this is a space we can chase. We have some tangential connections at OpenAI and I think they are mission-aligned with what we're trying to do. I'm sure we could get an API key.

skybrian commented 4 years ago

Okay, if they're interested, I have a few ideas about how to represent the data. But I am not a Dolt user yet (I just read your blog) so perhaps I misunderstood something.

Since the output of GPT-3 is nondeterministic (depending on the randomness setting), the user might want to run the same query multiple times to get an idea of what the response distribution is. It would be nice to save the responses as separate records (rather than just in history) and perhaps increment a counter each time the response occurs. The counter history would then record each time the response appeared. (A special counter datatype would be helpful here because a merge should add the counter changes from each branch. This would allow different researchers to gather data and combine datasets without having to do manual merges of counter updates.)
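To make the counter-merge idea concrete, here is a rough sketch of a three-way merge that adds each branch's counter changes relative to a common ancestor. This is only an illustration of the proposal: the function `merge_counters` and the `(prompt, response)` dictionary keys are hypothetical, and nothing here is an existing Dolt datatype or API.

```python
def merge_counters(base, ours, theirs):
    """Three-way merge of counters keyed by (prompt, response).

    Each branch's change relative to the common ancestor `base` is added
    to the ancestor value, so increments from both branches are kept.
    """
    merged = {}
    for key in set(base) | set(ours) | set(theirs):
        b = base.get(key, 0)
        ours_delta = ours.get(key, 0) - b
        theirs_delta = theirs.get(key, 0) - b
        merged[key] = b + ours_delta + theirs_delta
    return merged


# Common ancestor: response "A" to prompt "Q" has been seen twice.
base = {("Q", "A"): 2}
# One researcher saw "A" once more and a new response "B" once.
ours = {("Q", "A"): 3, ("Q", "B"): 1}
# Another researcher saw "A" three more times.
theirs = {("Q", "A"): 5}

merged = merge_counters(base, ours, theirs)
```

With this rule, `("Q", "A")` ends at 6 (2 + 1 + 3) and `("Q", "B")` at 1, without either researcher resolving a conflict by hand.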

Also, often GPT-3 is used for conversation. This can be thought of as a game where the human and computer take alternating turns. I'm guessing this results in multiple requests sent to the API where earlier requests are prefixes of later requests. It might be good to represent that relationship? The result is conceptually a tree, similar to the opening moves tree of a two-player game like chess or go.
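One way to picture the tree, as a sketch under assumptions: each conversation is a list of `(speaker, text)` turns, and conversations that share an opening share the same path from the root. The `Turn` class and the helpers `add_transcript` and `count_nodes` are hypothetical names for illustration, and whether the API requests really are prefixes of one another is the guess made above, not something confirmed here.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One node in the conversation tree: a single human or AI turn."""
    speaker: str
    text: str
    children: dict = field(default_factory=dict)  # (speaker, text) -> Turn


def add_transcript(root, turns):
    """Insert a conversation; shared prefixes reuse existing nodes."""
    node = root
    for speaker, text in turns:
        key = (speaker, text)
        if key not in node.children:
            node.children[key] = Turn(speaker, text)
        node = node.children[key]
    return node


def count_nodes(node):
    """Count this node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


root = Turn("root", "")
add_transcript(root, [
    ("human", "Hello"),
    ("ai", "Hi there"),
    ("human", "Tell me a story"),
    ("ai", "Once upon a time..."),
])
add_transcript(root, [
    ("human", "Hello"),
    ("ai", "Hi there"),
    ("human", "Tell me a joke"),
    ("ai", "Why did the chicken..."),
])

# The two conversations share their first two turns and then branch.
shared = root.children[("human", "Hello")].children[("ai", "Hi there")]
```

Here the two transcripts produce seven nodes in total (the root, two shared turns, and two turns on each branch), and the branching point is the node for "Hi there".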

timsehn commented 4 years ago

We like the idea enough to try and chase a license :-)

timsehn commented 4 years ago

[image]

I think we can start with an AI Dungeon scraper.

timsehn commented 4 years ago

You can follow along here: https://www.dolthub.com/repositories/Liquidata/ai-dungeon

skybrian commented 4 years ago

Someone else wrote an interesting UI: https://twitter.com/minimaxir/status/1288125135526875146

timsehn commented 4 years ago

Awesome. Will check it out.

Was making steady (though CEO part-time) progress on an AI Dungeon CLI that would preserve output to Dolt.

--Tim

On Wed, Jul 29, 2020 at 11:14 AM Brian Slesinsky notifications@github.com wrote:

Someone else wrote an interesting UI: https://twitter.com/minimaxir/status/1288125135526875146

skybrian commented 4 years ago

It seems that AI Dungeon probably won't work well for this:

https://mobile.twitter.com/nickwalton00/status/1289946861478936577

timsehn commented 4 years ago

I read that. Everything after the first prompt is GPT-3.

https://www.dolthub.com/repositories/Liquidata/ai-dungeon
https://github.com/liquidata-inc/ai-dungeon-scraper

Blog post coming later today.

timsehn commented 4 years ago

https://www.dolthub.com/blog/2020-08-12-ai-dungeon-scraper/

zachmu commented 3 years ago

I think we've pushed this about as far as we're going to for the time being. Please feel free to reopen this, or file a new issue, if there's more you want to see.