NaNoGenMo / 2017

National Novel Generation Month, 2017 edition.
https://nanogenmo.github.io

Intent to participate [First lines of novels] #75

Open janelleshane opened 6 years ago

janelleshane commented 6 years ago

A tiny dataset produced mixed results in my first attempt to generate the first sentence of a novel: http://aiweirdness.com/post/167049313837/a-neural-network-tries-writing-the-first-sentence

Highlights:

The really big repositories I've found (Project Gutenberg, for example) are formatted inconsistently enough that they're difficult to scrape.
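For illustration, here's a minimal sketch of the kind of heuristic such a scraper needs. It assumes the standard `*** START OF ... ***` / `*** END OF ... ***` markers and a "first real paragraph" filter, both of which plenty of older Gutenberg files violate, which is exactly the problem:

```python
import re
from pathlib import Path

def first_line_of(gutenberg_txt):
    """Heuristically pull the first sentence of a Project Gutenberg plain-text file."""
    text = Path(gutenberg_txt).read_text(encoding="utf-8", errors="ignore")

    # Cut away the boilerplate header/footer if the markers are present.
    start = re.search(r"\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG.*?\*\*\*", text, re.I)
    end = re.search(r"\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG.*?\*\*\*", text, re.I)
    body = text[start.end() if start else 0 : end.start() if end else len(text)]

    # Skip leftover front matter (title page, chapter headings, blank lines) by
    # taking the first reasonably long paragraph that ends in sentence punctuation.
    for para in re.split(r"\n\s*\n", body):
        para = " ".join(para.split())
        if len(para) > 40 and para[0].isupper() and re.search(r"[.!?][\"']?$", para):
            # Return just the first sentence of that paragraph.
            return re.split(r"(?<=[.!?])\s+", para)[0]
    return None
```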

So now I'm crowdsourcing a larger dataset: https://docs.google.com/forms/d/e/1FAIpQLScod8P-kcLX98u6gT0rX6-20GwkDo_glz-okVVkrhr6KgQONQ/viewform. The form has been up for about 36 hours and already has 3532 submissions (not all unique). People are welcome to contribute through the form, or let me know if you have a smarter way to contribute a dataset.
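Since the submissions aren't all unique, a normalization-and-dedup pass along these lines would be one way to clean the export (the file name and column header below are placeholders for whatever the Google Form export actually uses):

```python
import csv
import unicodedata

def dedup_first_lines(csv_path="form_responses.csv", column="First line"):
    """Normalize and deduplicate crowdsourced first lines from a CSV export."""
    seen, unique = set(), []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            line = row.get(column, "").strip()
            # Fold unicode forms, curly quotes, and case so near-duplicates collapse.
            key = unicodedata.normalize("NFKC", line).lower()
            key = key.replace("\u201c", '"').replace("\u201d", '"').replace("\u2019", "'")
            if line and key not in seen:
                seen.add(key)
                unique.append(line)
    return unique
```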

At the end of the month, I'll try again with a hopefully much larger dataset, and post the results and dataset afterwards, as well as a link to whatever open-source package I end up using. It won't produce a full novel in the traditional sense, but I'll declare a moral victory if a human announces their admiration of one of the neural network's lines.

janelleshane commented 6 years ago

Marking this one complete! Big thanks to everyone who contributed to the dataset.

Writeup and highlights here: http://aiweirdness.com/post/168051907512/the-first-line-of-a-novel-by-an-improved-neural

I ended up using a syll-rnn (LSTM mode) to do the generation, which ran for about 16 hours on my MacBook. Syll-rnn seems to handle large datasets better than char-rnn, yet copes with a larger vocabulary than word-rnn. Here's the framework I used:

https://github.com/learningtitans/torch-rnn/blob/valle-syllables/doc/flags.md#preprocessing
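To illustrate what "syllables as tokens" means in practice, here's a rough sketch using the pyphen hyphenation dictionary as a stand-in syllabifier (the valle-syllables branch of torch-rnn does its own, more careful preprocessing; hyphenation points are only an approximation of syllable boundaries):

```python
import pyphen  # hyphenation dictionary used here as a stand-in syllabifier

dic = pyphen.Pyphen(lang="en_US")

def to_syllables(line):
    """Split a line into syllable-ish tokens, roughly what a syll-rnn trains on."""
    tokens = []
    for word in line.split():
        tokens.extend(dic.inserted(word).split("-"))
        tokens.append(" ")  # keep word boundaries as their own token
    return tokens

print(to_syllables("It is a truth universally acknowledged"))
# tokens like ['It', ' ', 'is', ' ', 'a', ' ', 'truth', ' ', 'uni', 'ver', 'sal', 'ly', ...]
```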

Sequence length was 40 syllables (based roughly on the number of syllables in "It is a truth universally acknowledged that a single man in possession of a good fortune must be in want of a wife."). LSTM size is 512 with 3 layers (based on what would fit on my computer; I'm running a 1064-size LSTM now, but it's taking a long time and it's not clear that the results will be any better).
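For reference, a rough PyTorch sketch of that architecture (the actual run used the Lua torch-rnn code linked above, not this, and the vocabulary size here is a placeholder):

```python
import torch
import torch.nn as nn

class SyllableLM(nn.Module):
    """LSTM language model over syllable tokens, mirroring the sizes described above."""
    def __init__(self, vocab_size, hidden_size=512, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, state=None):
        x = self.embed(tokens)           # (batch, seq_len, hidden)
        x, state = self.lstm(x, state)
        return self.out(x), state        # logits over the next syllable

SEQ_LEN = 40                             # about one Austen-length opening sentence
model = SyllableLM(vocab_size=5000)      # placeholder vocabulary size
batch = torch.randint(0, 5000, (32, SEQ_LEN))
logits, _ = model(batch)                 # (32, 40, 5000)
```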

140,000 words of output are available here. Unfortunately, due to a prank in the input data that I didn’t catch till after I trained the neural network, 37,000 of them are the word “sand”.

https://github.com/janelleshane/novel-first-lines-dataset/blob/master/output_checkpoint10000_temp0p6.txt
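A quick token-frequency check of the kind that would have caught the prank earlier, run against the output file linked above (the threshold for "suspicious" is left to the reader):

```python
import re
from collections import Counter

def top_tokens(path, n=10):
    """Print the most common tokens and their share of the corpus."""
    with open(path, encoding="utf-8") as f:
        words = re.findall(r"[a-z']+", f.read().lower())
    total = len(words)
    for word, count in Counter(words).most_common(n):
        print(f"{word:>12}  {count:7d}  ({100 * count / total:.1f}%)")

top_tokens("output_checkpoint10000_temp0p6.txt")
```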

Crowdsourced dataset available here: https://github.com/janelleshane/novel-first-lines-dataset

hugovk commented 6 years ago

(We're using issues as a sort of forum, so I'll re-open this to make it easier to find.)

Good stuff!

> Unfortunately, due to a prank in the input data that I didn’t catch till after I trained the neural network, 37,000 of them are the word “sand”.

I think the eternal sand is quite appropriate for NaNoGenMo!

As a way at the ground, and the cat could have been in the town and a shock and the type on the back of the pilsage and belched and the color of the great little person who was still and the imface of the decoction of the heat between the box against the three interesting seament and the eternal sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand ...

janelleshane commented 6 years ago

Thanks for clearing that up! And for adding the completed tag!

Yes, eternal sand. People have been making Star Wars jokes at me all day.