LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0

Convert writing prompts dataset to instruction dialog #646

Closed huu4ontocord closed 1 year ago

huu4ontocord commented 1 year ago

Use the prompts/story dataset from here: https://www.kaggle.com/datasets/ratthachat/writing-prompts. In addition to the prompts and story, augment with instructions such as “write a story about {prompt}, ending with the sentence {last_sentence}”. “write a story about {prompt}, where the beginning of the story is about {summary of the beginning part}”. “write a story about {prompt}, where the middle of the story is about {summary of the middle part}”. “write a story about {prompt}, where the end of the story is about {summary of the end part}”
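A minimal sketch of the first augmentation ("ending with the sentence {last_sentence}"). The helper names `last_sentence` and `ending_instruction` are hypothetical, and the sentence splitter is a naive punctuation-based assumption, not something the dataset or this issue specifies.

```python
import re


def last_sentence(story: str) -> str:
    """Return the final sentence of a story using a naive punctuation split."""
    # Split on whitespace that follows '.', '!' or '?'. The dataset text is
    # tokenized with spaces around punctuation (e.g. "opens ."), which this handles.
    parts = re.split(r"(?<=[.!?])\s+", story.strip())
    sentences = [p.strip() for p in parts if p.strip()]
    return sentences[-1] if sentences else ""


def ending_instruction(prompt: str, story: str) -> str:
    """Build the 'ending with the sentence' instruction for one prompt/story pair."""
    return f"Write a story about {prompt}, ending with the sentence: {last_sentence(story)}"
```

Abbreviations and quoted dialogue will confuse this splitter, so a proper sentence tokenizer would likely be a better choice for the real conversion.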

fabraz commented 1 year ago

Here are some samples from writing prompts:

id prompt story
1 [ WP ] When you die , you do n't go to the afterlife of you 're religion , you go to the afterlife of the religion whose tenets you followed most closely , knowingly or not . Thomas loves science fiction , and is pleased to find himself sitting by the park entrance with Arthur C. Clarke ’ s “ Fountains of Paradise ” open in his lap . He must have jogged there , he thinks to himself as he admires his brand new black-and-white Nikes . He stretches out in his black joggers and turns the page . “ But there was no substitute for reality , one should beware of imitations ” , he reads before shutting the book . Thomas ponders what he has read as he looks to the right ; not a single car can be seen . The street appears infinite in length and the buildings fade in to the distance with it . He stands and begins his first step down the street . His movement halts when he hears a young voice behind him , “ You look thirsty mister . Would you like some lemonade ? ” Thomas walks back past the park entrance and over to the lemonade stand , wondering how he had not noticed it before . It is beautiful , the entrance ; but the park is closed now . Thomas stares up at the gates in awe . Thomas is interrupted again by the child , “ $ 5.50 , please. ” Thomas looks at the counter , flustered . “ I ’ ll have the punch instead. ” As the child pours the purple drink in to the cup , Thomas reaches in his pocket finding a five dollar bill and three quarters . “ Keep the change ” , Thomas says as he picks up his drink . Thomas sips and the sky slowly dims . He feels his breath drawn away from him as a comet sails over the park entrance . And Heaven ’ s Gate opens .
2 [ CW ] [ PM ] Write your hero into a corner , and let me get them out . Bob dropped five of the Zeds , reloaded his Colt 45 , and ran up the stairs . He had someone currently upstairs , alerting Search and Rescue to find a place to land in this urban , industrial nightmare . They were currently in a truck depot , the places where goods would be transferred truck from truck . Already , some men defending the front door had been pulled in , causing the rest to fall back . The first , and only , line of physical defense , the hardened steel gates , created to stop robbers , were badly banged up , from the onslaught of fists against it . It was bad enough that the zombies managed to cram two at once inside the doorway , but losing the gates would mean that the horde would rush in . Hey ! '' Courtney rushed outside the communications office , her .22 rifle in hand . They 're at the trainstation , just a block from here ! '' It 's probably too late , mate . '' Bob said back , Just look at 'em ! '' The metal steps leading to the elevated walkway was a savior , only allowing one body to get in at a time . Unfortunately , our heroes had just fought their way here , from a few streets down . Seems easy ? Not when you have to take detours through heavily infested buildings because of blockades in the roads , or just the sheer number of walkers wouldn't 've allowed you to run through them . Bob 's equipped with a Colt 1911 .45 caliber pistol , excellent at punching through heads , but at the cost of heavy kickback . Also due to it 's temptingness , Bob has used all but three 7-round magazines . He has a knife , but who the hell would be able to take anyone out with that ? Courtney has her 10/22 Ruger Takedown . Initially intended for long range hunting , the rifle particularly excels at going through targets cleanly . The only disadvantage is the lack of stopping power . They have a fully gassed up FedEx truck at their disposal . 
A few men inside , surrounded , but armed , are ready to go when you tell them where they need to go . Around 31 zombies have gotten in already , with god knows how much outside .
3 [ cw ] write about the strangest/scariest/saddest dream you 've ever had in less than 200 words . The night was as thick and terrifying as any I had ever seen before . All I could hear was the scream of the wind past my ears , the pounding of hooves , huffed horse breaths , and the pounding of my own heart . The woods were closeknit , and my path was barely visible , hidden under a thick layer of bracken . `` Faster , '' I whispered as I dug my heels in . Safety was close and yet so far away , calling to me . He would save me ; I knew it with all my heart . All I had to do was outrun the demons at my back first .

Just in case anyone wants the prompt tag description.

@ontocord, could you refine the issue details in light of the samples above, please?

huu4ontocord commented 1 year ago

Interesting how the [XX] tags are used. I wasn't thinking about those.

I was thinking of Instructions -> answers like "User: write me a story about {stripped_prompt} -> Rosey: Sure, here's a story about {stripped_prompt}: {story}" where stripped_prompt removes things like "write about" "in less than 200 words", etc.

And the inverse "User: What is this story about {story} -> Rosey: I think it's about {stripped_prompt}"
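The forward and inverse pairs described above could be sketched roughly as follows. The function names and the exact stripping rules are assumptions for illustration; only the "User"/"Rosey" wording and the two example phrases to strip ("write about", "in less than 200 words") come from this thread.

```python
import re

# Leading bracketed tags such as "[ WP ]" or "[ CW ] [ PM ]".
TAG_RE = re.compile(r"^\s*(\[\s*\w+\s*\]\s*)+")


def strip_prompt(raw: str) -> str:
    """Remove tags and a couple of boilerplate phrases from a raw prompt."""
    text = TAG_RE.sub("", raw)
    # Drop a leading "write about" / "write me a story about".
    text = re.sub(r"^\s*write (me )?(a story )?about\s+", "", text, flags=re.IGNORECASE)
    # Drop length constraints like "in less than 200 words".
    text = re.sub(r"\s*\bin less than \d+ words\s*", " ", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", text).strip()
    return text.rstrip(" .")


def forward_dialog(raw_prompt: str, story: str) -> dict:
    """Instruction -> story pair."""
    topic = strip_prompt(raw_prompt)
    return {
        "User": f"Write me a story about {topic}",
        "Rosey": f"Sure, here's a story about {topic}: {story}",
    }


def inverse_dialog(raw_prompt: str, story: str) -> dict:
    """Story -> topic pair."""
    topic = strip_prompt(raw_prompt)
    return {
        "User": f"What is this story about? {story}",
        "Rosey": f"I think it's about {topic}",
    }
```

Many prompts in the dataset will need more stripping rules than these two, so the regexes are a starting point rather than a complete list.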

You could also do summarization of longer stories into 4 or 5 pointed sentences and ask for an outline. Or you could give an outline and ask Rosey to fill in the story.

For the prompt tag, you could add constraints to the prompts based on the tag. So for [RF], you could add to the end of the actual instruction: this story could {have happened before or should be able to happen in the real world to unknown people. Not what you think could happen in the future.}
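One way to sketch the tag-based constraint idea: read the bracketed tags off the raw prompt and append a constraint sentence for each known tag. The `TAG_CONSTRAINTS` table and both function names are hypothetical; only the [RF] wording is taken from the comment above, and any other tags would need their own entries.

```python
import re

# Hypothetical mapping from tag to an extra constraint sentence.
# Only the RF entry is based on this thread; extend as needed.
TAG_CONSTRAINTS = {
    "RF": (
        "This story could have happened before or should be able to happen "
        "in the real world to unknown people, not what you think could "
        "happen in the future."
    ),
}


def extract_tags(raw_prompt: str) -> list:
    """Return upper-cased tags such as 'WP' or 'RF' found in the raw prompt."""
    return [t.upper() for t in re.findall(r"\[\s*(\w+)\s*\]", raw_prompt)]


def add_tag_constraints(instruction: str, raw_prompt: str) -> str:
    """Append the constraint sentence for every recognized tag."""
    extras = [TAG_CONSTRAINTS[t] for t in extract_tags(raw_prompt) if t in TAG_CONSTRAINTS]
    return " ".join([instruction.rstrip()] + extras)
```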

Lmk if you need more input.

huu4ontocord commented 1 year ago

Also these instructions: “write a story about {prompt}, ending with the sentence {last_sentence}”. “write a story about {prompt}, where the beginning of the story is about {summary of the beginning part}”. “write a story about {prompt}, where the middle of the story is about {summary of the middle part}”. “write a story about {prompt}, where the end of the story is about {summary of the end part}”
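The beginning/middle/end variants could be sketched by splitting the story into rough thirds and summarizing each part. This assumes a naive sentence split and takes the summarizer as a plain callable (`summarize`), since no summarization model has been settled on here; all names are hypothetical.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive split on whitespace following '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def part_instructions(prompt: str, story: str, summarize: Callable[[str], str]) -> List[str]:
    """Build one instruction per story part (beginning, middle, end)."""
    sentences = split_sentences(story)
    n = len(sentences)
    bounds = {
        "beginning": (0, n // 3),
        "middle": (n // 3, 2 * n // 3),
        "end": (2 * n // 3, n),
    }
    instructions = []
    for part, (lo, hi) in bounds.items():
        chunk = " ".join(sentences[lo:hi])
        if chunk:  # very short stories may leave a part empty; skip it
            instructions.append(
                f"Write a story about {prompt}, where the {part} "
                f"of the story is about {summarize(chunk)}"
            )
    return instructions
```

Equal thirds by sentence count is only a convenient default; the narrative boundaries of a real story will rarely line up with it.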

fabraz commented 1 year ago

@ontocord, check out this Colab notebook to see whether I am on the right track.

You might be interested in the last cell.

huu4ontocord commented 1 year ago

Cool - and thank you for putting the [NSFW] flag there. That will help us when we turn on the filter. Also, I think some of the User prompts might be too long - maybe it was adding the story into the user prompt? Also, some of the User prompts say "None". You are on the right track! YEAH!
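A small sanity check along these lines could catch both problems (missing prompts that render as "None", and user turns that accidentally contain the story). The function name and the 100-word limit are arbitrary assumptions for illustration.

```python
from typing import Optional


def is_usable_user_prompt(text: Optional[str], max_words: int = 100) -> bool:
    """Reject missing prompts, the literal string 'None', and overly long prompts."""
    if text is None:
        return False
    cleaned = text.strip()
    if not cleaned or cleaned.lower() == "none":
        return False
    # A user turn far longer than a typical prompt probably has the story in it.
    return len(cleaned.split()) <= max_words
```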

fabraz commented 1 year ago

@ontocord, I'm getting terrible results with story summarization. I've tried facebook/bart-large-cnn and allenai/led-large-16384.

When you mentioned summarization, were you thinking about any particular model?

About the NSFW tag: fortunately, it was already in the raw data.

huu4ontocord commented 1 year ago

As a base case, you could do a sliding window of N sentences and run each window through t5-small or t5-base? Also, check out https://colab.research.google.com/drive/1nZx5LRjO61fYprFyqtrwPDLOis6ctR4p?authuser=1 and https://colab.research.google.com/gist/pszemraj/c7dce704516ee33de107e23a6613d613/textsum-summarize-text-files-example.ipynb. Also, cc me on Discord and I can point you to more resources for summarization.
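A rough sketch of the sliding-window base case: group the story into windows of N sentences and summarize each window separately. The window logic is plain Python; the summarizer is passed in as a callable so any model can be plugged in. The function names, default sizes, and the naive sentence split are all assumptions.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive split on whitespace following '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def sliding_windows(sentences: List[str], size: int, stride: int) -> List[str]:
    """Join consecutive sentences into windows of `size`, advancing by `stride`."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    windows = []
    for start in range(0, len(sentences), stride):
        windows.append(" ".join(sentences[start:start + size]))
        if start + size >= len(sentences):
            break  # the last window already reaches the end of the story
    return windows


def summarize_story(story: str, summarize: Callable[[str], str],
                    size: int = 5, stride: int = 5) -> List[str]:
    """Summarize each window and return the summaries in story order."""
    return [summarize(w) for w in sliding_windows(split_sentences(story), size, stride)]


# Possible summarizer, assuming the Hugging Face `transformers` package is installed:
#   from transformers import pipeline
#   t5 = pipeline("summarization", model="t5-small")
#   summarize = lambda text: t5(text)[0]["summary_text"]
```

Setting `stride` smaller than `size` gives overlapping windows, which may help keep context across window boundaries at the cost of more model calls.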

fabraz commented 1 year ago

Thanks for the advice. Check out the updates in the Colab notebook.

huu4ontocord commented 1 year ago

lmk if you pushed a PR?

andreaskoepf commented 1 year ago

Closing old data issue.