VincentToups commented 10 years ago

My project, Swann's Way Through The Night Land, generates a novel by using word2vec to construct vectors for all sentences in two public domain novels (The Night Land, by William Hope Hodgson, and Swann's Way, by Marcel Proust) and then replacing each sentence in the first with its closest match from the second.
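For illustration only, here is a minimal Python sketch of that replacement step (the project itself is written in Emacs Lisp). The toy two-dimensional vectors, the function names, and the use of cosine similarity as the meaning of "closest" are all assumptions made for the example; a real run would load vectors trained by word2vec.

```python
import math
import re

# Toy 2-d "word vectors", invented for this example. A real run would load
# vectors produced by word2vec instead.
TOY_VECTORS = {
    "night": (1.0, 0.0),
    "dark": (0.9, 0.1),
    "land": (0.7, 0.3),
    "tea": (0.0, 1.0),
    "cake": (0.1, 0.9),
    "memory": (0.2, 0.8),
}


def sentence_vector(sentence, vectors=TOY_VECTORS):
    """Sum the vectors of every known word in the sentence."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for word in re.findall(r"[a-z']+", sentence.lower()):
        if word in vectors:
            for i, x in enumerate(vectors[word]):
                total[i] += x
    return tuple(total)


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_sentence(target, candidates):
    """Return the candidate whose summed vector is most similar to target's."""
    target_vec = sentence_vector(target)
    return max(candidates, key=lambda c: cosine(target_vec, sentence_vector(c)))


def retell(first_sentences, second_sentences):
    """Replace each sentence of the first text with its closest match."""
    return [closest_sentence(s, second_sentences) for s in first_sentences]


print(retell(["The night was dark.", "A memory of tea."],
             ["I ate cake with tea.", "A dark night."]))
```

Each sentence of the second text is re-vectorized on every lookup here, which is fine for a toy but would want caching on two full novels.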

VincentToups commented 10 years ago

Read the generated novel here

cpressey commented 10 years ago

Interesting. I was thinking of trying something similar using py-editdist.

I tried reading The Night Land a year ago, but only managed to get halfway through, as you will perceive. (Haven't read Swann's Way though.)

It's definitely weird trying to "see through" to the original story while trying to read this!

VincentToups commented 10 years ago

I've been thinking of a variety of ways to build edit distance metrics on top of sentences. It turns out these things are pretty similar to my dissertation work, which was on, in part, embedding neural spike trains in appropriate metric spaces for automatic clustering.

At present I produce the vector for a particular sentence by summing up the vectors for individual words, but this doesn't really capture the fact that sentences are often as much about how one gets to the meaning as they are about the meaning itself. An edit distance metric on the word2vec vectors would preserve information about both the meaning of the sentence and the path dependence. Not sure that it would improve the results here dramatically, though.
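As a rough sketch of what such a metric could look like (in Python rather than Emacs Lisp, and not something the project implements): a Levenshtein-style dynamic program over sequences of word vectors. Using the Euclidean distance between vectors as the substitution cost and a fixed `gap_cost` for insertions and deletions are assumptions made for the example.

```python
import math


def vector_edit_distance(seq_a, seq_b, gap_cost=1.0):
    """Edit distance between two sequences of word vectors.

    Substituting one word for another costs the Euclidean distance between
    their vectors; inserting or deleting a word costs gap_cost. Both cost
    choices are assumptions made for this sketch.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * gap_cost
    for j in range(1, cols):
        dist[0][j] = j * gap_cost
    for i in range(1, rows):
        for j in range(1, cols):
            substitute = dist[i - 1][j - 1] + math.dist(seq_a[i - 1], seq_b[j - 1])
            delete = dist[i - 1][j] + gap_cost
            insert = dist[i][j - 1] + gap_cost
            dist[i][j] = min(substitute, delete, insert)
    return dist[-1][-1]


# Two "sentences" with the same words in opposite order: their summed
# vectors are identical, but the edit distance between them is not zero.
forward = [(1.0, 0.0), (0.0, 1.0)]
backward = [(0.0, 1.0), (1.0, 0.0)]
print(vector_edit_distance(forward, forward))
print(vector_edit_distance(forward, backward))
```

The reordered pair is the case the summed representation cannot distinguish, which is the path dependence described above.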

I am super busy this month, sadly, so I probably won't have time to really fiddle with this stuff.

VincentToups commented 10 years ago

Oh yeah, cpressey, try reading "The Night Land, A Story Retold" which is by James Stoddard. It is basically a rewrite of the original book to render it slightly more readable.

I also enthusiastically recommend the stories available here, some of which are absolutely great examples of science fiction, and all are set in The Night Land's setting.

Proust doesn't really need any recommendation, of course.

cpressey commented 10 years ago

@VincentToups Excellent, thanks for the links.

enkiv2 commented 10 years ago

This one is really quite neat. It reads like a fairly oblique human-written book. Are you doing edit distance based on the English translation of Du côté de chez Swann, or the original French?

On Wed Nov 12 2014 at 8:25:43 AM Chris Pressey wrote:

> @VincentToups Excellent, thanks for the links.


MichaelPaulukonis commented 10 years ago

`emacs lisp` ?!!?!?!

(setq impressed t)

VincentToups commented 10 years ago

Word2vec did all the heavy lifting, but I still resent (good-naturedly) the implication that Emacs Lisp isn't a real enough programming language. Emacs has built-in support for efficient access to and editing of large files, and passable tokenizers for words and sentences.

I started the project in Clojure, in fact, but changed when I realized how much easier it would be in Emacs Lisp.

VincentToups commented 10 years ago

And @enkiv2, I'm just using the English version of Swann's Way on Project Gutenberg. Check out the resources directory in the repo. The text is in there.

christiaanw commented 10 years ago

I couldn't directly see which novel you were retelling through the sentences of the other, so I Ctrl-F-ed for madeleine.

Could also have done that for Combray, or Swann, of course.

And it really gets me, because I'm trying to make sense of the gap in thought between these long sentences.

In a sense it reminds me of Say Anything: they also use sentence similarity metrics, but to pick the next sentence for a partly user-generated story.

VincentToups commented 10 years ago

@christiaanw I think this approach definitely leaves things to be desired. I might try using the King James Bible, which is closer stylistically to The Night Land, and might produce more interesting results.

Novels are such complexly interdependent things that it's hard for any generative approach to capture that level of correlation while simultaneously evidencing a superficial "arc" of story and character development. I would guess that simulation would be a much more fruitful approach to novel generation. It would also be more consistent with the idea of the novel as a "fictive dream," in which as much is revealed as is hidden.

I expect that a very simple set of simulation rules would be able to generate some basic interesting stories, but stretching that out to the length of a novel would probably stress the complexity of any reasonably sized simulation.

cpressey commented 10 years ago

@VincentToups Encouraged by your results, I decided to go ahead with a similar replacement approach, except at the word level instead of the sentence level, and using the (much simpler) Levenshtein edit distance metric. Just to see what it would be like.

Replacing the words of "The Masque of the Red Death" with words from "Don Quixote" resulted in -- what else? -- "The Basque of the Red Death".
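For illustration, word-level replacement of this kind might look like the Python sketch below. It assumes that "closest" means the smallest Levenshtein distance against the second text's vocabulary, with ties going to whichever vocabulary word comes first; the function names and the tiny vocabulary are made up for the example, and this is not the actual script.

```python
def levenshtein(a, b):
    """Classic Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_word(word, vocabulary):
    """Return the vocabulary word with the smallest edit distance to word."""
    return min(vocabulary, key=lambda candidate: levenshtein(word, candidate))


def replace_words(text, vocabulary):
    """Replace every whitespace-separated word with its closest match."""
    return " ".join(closest_word(word, vocabulary) for word in text.split())


# Hypothetical stand-in for the vocabulary of the second text.
vocabulary = ["The", "Basque", "of", "the", "Red", "Death"]
print(replace_words("The Masque of the Red Death", vocabulary))
```

Scanning the whole vocabulary for every word is quadratic in practice, so a full-length text would benefit from caching the result per distinct word.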

VincentToups commented 10 years ago

Woah! This is kind of amazing!

MichaelPaulukonis commented 10 years ago

> I still resent (good-naturedly) the implication that Emacs Lisp isn't a real enough programming language.

No implication was intended. I do all of my non-.NET work in Emacs.

While I've played with Emacs Lisp over the years, I've never gotten to the point where I could write anything serious with it. Much to my regret.

Now, JavaScript - that's something I can handle. AND I can use this knowledge at work. (:::sigh::: I've never worked anywhere that has another Emacs user.)

I'm curious to see if the new Guile Emacs will change the playing field for Emacs or Guile.

And then there is Elnode, which can even be run on Heroku. I've thought about setting up a Markov page with dissociated-press as the back-end, since that's such a .... weird implementation. Once I saw that Jamie Zawinski referred to the source code as obscure and impossible to understand [paraphrase, I can't find the source again], I knew I never had a chance of understanding it myself.

VincentToups commented 10 years ago

Emacs Lisp is a Lisp, which means that for projects where you want something comfortable and easy to use, it's great. I'd be happy if Guile Emacs improved the performance of my Emacs Lisp code, but I don't really care to use Scheme in Emacs - Emacs Lisp is good enough and I have an enormous amount of code written in it already.

Re NaNoGenMo, I regenerated the novel with a vector set generated from the two books, which seems to have improved the results slightly. I also generated a comical output with a "source novel" containing just a few negative and a few positive sentences:

I

MIRDATH THE BEAUTIFUL

I felt good.

It was horrible.

It was great. Then it was good. She felt good.

He felt good.

Then it was horrible.

Then it was great.

It was good.

I felt bad.

Then it was bad.

I felt good.

It was horrible.

It was great.

Then it was good.

She felt good.

He felt good.

Sadly, it is still pretty much random. I have thought a lot about this but I don't think there is an easy solution.

dariusk / NaNoGenMo-2014

Swann's Way Through The Night Land #91
