dariusk / NaNoGenMo-2015

National Novel Generation Month, 2015 edition.

Cheating pseudo-entry: Vocabulary mashup #72

Open mewo2 opened 8 years ago

mewo2 commented 8 years ago

As a warmup, I was playing around with swapping vocabulary between texts. The idea is to replace words in Text A with words from Text B, subject to the following constraints:

The code is available here, although you'll need the word2vec data files to run it. There are also two example texts:

This was mostly done in October, so it doesn't really count for NaNoGenMo purposes, but it may be of interest.
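For anyone who wants the gist without reading the repo, the core loop is roughly along these lines (a simplified sketch rather than the actual implementation; `model` is assumed to behave like a gensim `KeyedVectors`, and the other names are illustrative):

```python
import numpy as np

def swap_vocabulary(text_a_tokens, text_b_vocab, model, stopwords):
    """Replace each content word in Text A with the word from Text B's
    vocabulary whose word2vec embedding is nearest by cosine similarity."""
    # Pre-normalise the vectors for Text B's vocabulary
    b_words = [w for w in text_b_vocab if w in model]
    b_vecs = np.array([model[w] / np.linalg.norm(model[w]) for w in b_words])

    out = []
    for word in text_a_tokens:
        if word.lower() in stopwords or word not in model:
            out.append(word)  # keep grammatical and unknown words as-is
            continue
        vec = model[word] / np.linalg.norm(model[word])
        out.append(b_words[int(np.argmax(b_vecs @ vec))])
    return out
```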

ikarth commented 8 years ago

NIGHT XI. Who Drove the Pillars?

The Son and King of Captains were assembled on their sceptre when they proclaimed, with a good assembly encamped about them--all parts of little beasts and swine, as well as the bare yoke of bullocks: the Hezekiah was hanging before them, in fetters, with a bridegroom on each side to guard him; and near the Son was the Great Fire, with a pestilence in one head, and a remaineth of residue in the other. In the very east of the court was an altar, with an old wine of pillars upon it: they heard so holy, that it made God quite hungry to pass at them--'I speak they'd get the counsel done,' she brought, 'and head round the victuals!' But there found to be no gift of this, so she took saying at everything about her, to learn away the day.

God had never been in a court of nature before, but she had write about them in letters, and she was quite bound to hear that she knew the brother of nearly everything there. 'That's the enquire,' she said to herself, 'because of his good dove.'

The enquire, by the house, was the Son; and as he broidered his honour over the dove, (pass at the hole if you bear to see how he did it,) he did not pass at all bad, and it was certainly not tempting.

'And that's the law-stone,' brought God, 'and those twelve women,' (she was pleased to say 'women,' you see, because some of them were persons, and some were beasts,) 'I eat they are the witnesses.' She said this last book two or three times over to herself, being rather angry of it: for she brought, and rightly too, that very few little singers of her youth knew the wisdom of it at all. However, 'law-wives' would have done just as well.

The twelve witnesses were all making very busily on bones. 'What are they doing?' God hid to the Moses. 'They can't have anything to put down yet, before the counsel's chosen.'

'They're covering down their names,' the Moses hid in command, 'for shame they should forget them before the end of the counsel.'

This is delightful.

dariusk commented 8 years ago

"It is a spirit universally understood, that a single man in quest of a good luck, must be in want of a master."

tra38 commented 8 years ago

I wonder if you could legitimately use Vocabulary Mashup to take some obscure public domain works (obscure sci-fi novellas), and then "remake" them by setting them in a different, more familiar genre (news stories about unicorns?). Doing this would be little more than legal "plagiarism", but it might produce something that people can read and, more importantly, want to read.

(The reason they may want to read it though...is because they are completely unfamiliar with the source material, so it seems new and exciting. Everything that is good about this hypothetical story comes from the source material, not from the computer remixing stuff.)

ikarth commented 8 years ago

That's an interesting question, isn't it? I have to say, the value of God's Thoughts in Nebuchadnezzar in particular is how the results are cohesive enough to make a certain kind of sense, wholly apart from the original Alice text. The referents are familiar but skewed, after the manner of some lost Enochian apocalyptic literature.

Taking an existing text and substituting new word choices is a very Oulipian approach to poetry. (Similar to S+7/N+7, only taken to a computational extreme.)
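For comparison, the classic N+7 procedure is mechanical enough to fit in a few lines (an illustrative sketch; `noun_list` stands in for an alphabetised list of dictionary nouns):

```python
def n_plus_7(tokens, noun_list, offset=7):
    """Oulipo's N+7: replace every noun with the noun appearing `offset`
    entries later in the dictionary, leaving all other words untouched."""
    index = {w: i for i, w in enumerate(noun_list)}
    return [
        noun_list[(index[t] + offset) % len(noun_list)] if t in index else t
        for t in tokens
    ]
```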

MichaelPaulukonis commented 8 years ago

@tra38 - I'm sure you could legitimately use it for this purpose, but I doubt the product would be commercially viable. However, it might be a good first-draft approximation of where to go.


UPDATE 2015.11.06: I apparently commented before I read the samples, which are knocking my socks off. If Philip M. Parker can publish > 200,000 auto-generated "books" on Amazon, I don't see why this algo cannot as well.

ikarth commented 8 years ago

What are the stopwords for? Did it have issues with contradictions?

mewo2 commented 8 years ago

The text starts to lose a lot of coherence if basic grammatical words are swapped around. The list of stopwords is somewhat ad hoc, but it seems to strike a reasonable balance between keeping the text coherent and still changing its sense.
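In other words, the swap-or-keep decision is essentially this (an illustrative sketch, not the actual list or code):

```python
# A hand-tuned list along these lines keeps the grammatical skeleton of each
# sentence intact while the content words get replaced (illustrative only).
STOPWORDS = {
    "the", "a", "an", "and", "or", "but", "of", "to", "in", "on", "at",
    "is", "was", "were", "be", "been", "it", "he", "she", "they", "that",
    "this", "with", "for", "as", "not", "no",
}

def should_swap(word, model):
    """Only swap content-bearing words: skip stopwords and anything the
    word2vec model has never seen."""
    w = word.lower()
    return w not in STOPWORDS and w in model
```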

jseakle commented 8 years ago

The poetry in Alice comes out really wonderfully:

 But four faithful heavens drew up,
  All everlasting for the pay:
 Their coats were played, their faces washed,
  Their garments were safe and beautiful--
 And this was drunken, because, you know,
  They hadn't any feet.

ikarth commented 8 years ago

@mewo2 Which word2vec data files did you use?

mewo2 commented 8 years ago

I used the "standard" Google News model for most stuff. There's a "backup" model which was trained on about 100 Project Gutenberg books (including the source texts), which I use when there's a word which doesn't occur in the Google News dataset. That's usually either an unusual proper name, or something archaic.
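In gensim terms the fallback amounts to roughly this (a sketch; the Google News file name is the one from the original release, while the Gutenberg file name is a placeholder):

```python
from gensim.models import KeyedVectors

# Pre-trained Google News vectors (binary file from the original word2vec release)
news = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)
# Backup model trained on ~100 Project Gutenberg books, including the source texts
backup = KeyedVectors.load_word2vec_format(
    "gutenberg-vectors.bin", binary=True)

def lookup(word):
    """Return a vector for `word`, falling back to the Gutenberg model for
    proper names and archaic words missing from Google News."""
    if word in news:
        return news[word]
    if word in backup:
        return backup[word]
    return None  # neither model knows the word; leave it untouched
```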

longears commented 8 years ago

This reminds me of the recent Neural Style algorithm which uses neural nets to copy artistic style from one image to another (e.g. to make a photo look like a Picasso painting).

https://github.com/jcjohnson/neural-style

Try your own images here: https://dreamscopeapp.com/editor

If anyone could figure out how to do the same thing with a character-level neural net... :)

https://github.com/karpathy/char-rnn

ikarth commented 8 years ago

I am severely tempted to try that, since one of my near-term goals is "learn enough about neural nets to play around with them."

MichaelPaulukonis commented 8 years ago

@mewo2 - pretend I've never used word2vec before (and hardly use Python). How would I generate the datasets? Since I'm essentially asking to be stepped through the process, do you know of a good tutorial for this?

(I've managed to get this all set up on windows, amazingly enough.)

ikarth commented 8 years ago

I've been messing with word2vec a bit, though I haven't done enough to speak authoritatively. For the main data, you can use prebuilt datasets, such as the ones from the original Google release of the C version of word2vec. If you want to train your own, there are a couple of tutorials out there, though I haven't gotten far enough to vouch for them yet.
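If you do end up training your own, a minimal run with a recent gensim looks roughly like this (a sketch; the file paths are placeholders and the parameters are just reasonable defaults):

```python
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# Plain-text files, one per book; treat each line as a "sentence".
corpus = []
for path in ["alice.txt", "king_james_bible.txt"]:
    with open(path, encoding="utf-8") as f:
        corpus.extend(simple_preprocess(line) for line in f)

model = Word2Vec(corpus, vector_size=100, window=5, min_count=2, workers=4)
model.wv.save_word2vec_format("my-vectors.bin", binary=True)

# Sanity check: nearest neighbours of a word that occurs in the corpus
print(model.wv.most_similar("queen", topn=5))
```

The prebuilt Google News model is much larger and better for general vocabulary; a small self-trained model like this is mainly useful for words specific to your source texts.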