NaNoGenMo / 2019

National Novel Generation Month, 2019 edition.

The Collected Works of Writing Prompt Prompter and Writing Prompt Responder #32


iistickboy commented 4 years ago

I will generate a novel inspired by my favorite subreddit, Writing Prompts (https://www.reddit.com/r/WritingPrompts/wiki/user_guide). It will consist of a series of GPT-2-generated writing prompts and GPT-2-generated responses to those writing prompts, all grouped by weekly themes.

HackerNoon described the problem (https://medium.com/hackernoon/fiction-generator-post-mortem-comic-book-generation-9df847dd4ada) that has kept me from trying NaNoGenMo until now: “One of the open problems in the procedural generation of fiction is how to maintain reader interest at scale.” With writing prompts and responses, I can generate shorter (and possibly more interesting) sections within a larger NaNoGenMo manuscript.

OpenAI’s GPT-2 gave me access to a superpowered language model trained on a data set of 8 million webpages, all human-curated outbound links from Reddit. So much of the internet flows in and out of Reddit that it provides the scale of information needed to make a really sensible-sounding AI text generator.

Over at Reddit, the great u/disumbrationist (https://www.reddit.com/r/SubSimulatorGPT2/comments/btfhks/what_is_rsubsimulatorgpt2/) trained an army of bots to create an entire Subreddit Simulator populated entirely by GPT-2 language models that were trained on the most popular subreddits. Following that example, I will create two large data sets.

Thanks to Max Woolf (https://minimaxir.com/2019/09/howto-gpt2/), I learned how to fine-tune two different GPT-2 language models with Google Colab. Here are the two GPT-2 models I will train and fine-tune to generate my NaNoGenMo novel:

1- A Writing Prompt Prompter trained on thousands of writing prompts posted on r/WritingPrompts
2- A Writing Prompt Responder trained on thousands of responses to writing prompts on r/WritingPrompts.
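
Woolf's Colab notebook is built around his gpt-2-simple package, so the fine-tuning step for the Prompter boils down to something like the sketch below (the dataset filename and step count are placeholders, not my exact settings):

```python
import gpt_2_simple as gpt2

# Download the base 124M GPT-2 checkpoint once.
gpt2.download_gpt2(model_name="124M")

# Fine-tune on the scraped r/WritingPrompts titles; "prompter" is the run_name
# used later to load this checkpoint for generation.
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="writing_prompts.txt",  # placeholder filename
              model_name="124M",
              steps=1000,                     # placeholder step count
              run_name="prompter")
```

The Responder gets the same treatment on the scraped responses, under its own run_name.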

I think the great storytellers at Writing Prompts can provide some narrative structure to GPT-2's uncanny ability to write human-sounding text. I'll run every prompt and response through a plagiarism detector to make sure my bot doesn’t steal from its human teachers.
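
One lightweight way to automate that check is to flag any long verbatim word n-gram shared between a generated passage and the scraped training corpus. This is just a rough sketch of the idea, not necessarily the detector I'll end up using:

```python
def ngrams(text: str, n: int = 8) -> set:
    """Return the set of n-word shingles in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_copied(generated: str, training_corpus: str, n: int = 8) -> bool:
    """Flag a generated passage that shares any n-word run verbatim with the training data."""
    return bool(ngrams(generated, n) & ngrams(training_corpus, n))
```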

I developed this project while writing a magazine article about NaNoGenMo (https://www.publishersweekly.com/pw/by-topic/industry-news/publisher-news/article/81578-will-robots-make-the-next-big-bestsellers.html), getting advice from some great members of the community, especially Darius Kazemi (https://twitter.com/tinysubversions) and Janelle Shane (https://twitter.com/JanelleCShane).

enkiv2 commented 4 years ago

I did a similar thing with GPT2 in August ( https://medium.com/@enkiv2/interview-with-a-transformer-7fc60890f74c?source=friends_link&sk=8e69c3718cc12530f2a6ab00da5dd1e0 & https://medium.com/@enkiv2/interview-with-a-transformer-ii-4ed0a19e3b90?source=friends_link&sk=de3dbef024244c5abb4c6b91cbd4db3b ).

In my experience, GPT-2 (with OpenAI's training set) tends to extend writing prompts that are phrased as questions by listing other writing prompts, while prompts that are phrased as incomplete sentences tend to actually get completed. However, I used prompts from a couple of semi-standard sets (lists of common interview questions, the Proust questionnaire), so maybe the Reddit writing prompts won't be recognized as part of a standard list and you'll actually get answers?


iistickboy commented 4 years ago

Thanks for the note, I'll read your whole Medium article. I'm going to generate a prompt from Writing Prompt Prompter, then deliver the computer-generated prompt to the Writing Prompt Responder, so I hope I get some connection between prompt and response.
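
In gpt-2-simple terms, that hand-off just means using the Prompter's output as the Responder's prefix, roughly like this (run names, lengths, and temperatures below are placeholders):

```python
import gpt_2_simple as gpt2

# Generate a writing prompt with the fine-tuned Prompter model.
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name="prompter")
prompt = gpt2.generate(sess, run_name="prompter",
                       length=60, temperature=0.8,
                       return_as_list=True)[0]

# Reset the TensorFlow graph, then hand the generated prompt to the Responder as its prefix.
sess = gpt2.reset_session(sess)
gpt2.load_gpt2(sess, run_name="responder")
response = gpt2.generate(sess, run_name="responder",
                         prefix=prompt,
                         length=700, temperature=0.8,
                         return_as_list=True)[0]
```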

I also built <|startoftext|> and <|endoftext|> tokens into the Writing Prompt Responder dataset, so I'm hoping that will organize the responses a bit.
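
For context, the gpt-2-simple workflow uses <|startoftext|> and <|endoftext|> as story delimiters, and at generation time passing truncate="<|endoftext|>" to gpt2.generate() stops a sample at the first closing token. A rough sketch of the dataset-building step (variable and file names are placeholders):

```python
# responses: the scraped r/WritingPrompts response texts (placeholder data).
responses = ["First scraped response ...", "Second scraped response ..."]

# Write one delimited document per response so the model learns where a story begins and ends.
with open("responses.txt", "w", encoding="utf-8") as f:
    for text in responses:
        f.write(f"<|startoftext|>\n{text}\n<|endoftext|>\n")
```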

I'll keep you posted!

iistickboy commented 4 years ago

After 17 days of text generation, my two fine-tuned GPT-2 language models have generated a 51,422-word manuscript filled with computer-generated writing prompts and computer-generated responses to those prompts.

Read the whole SUBLIME TEXT manuscript at this link

Code for Writing Prompt Prompter (Jane Doe)

Code for Writing Prompt Responder (Mr. Output)

ReadMe link with all the steps I took for the project

The writing prompts were written by a fine-tuned GPT-2 language model I named “Jane Doe” after a character in one of the generated texts. The writing prompt responses were written by a fine-tuned GPT-2 language model I named “Mr. Output” after a character in another one of the generated texts.

I loved every minute of this coding and reading experience. If you enjoy these stories, email me your address at jasonboog [at] gmail [dot] com and I’ll snail mail you a handwritten copy of one of these stories in zine format.