NaNoGenMo / 2017

National Novel Generation Month, 2017 edition.
https://nanogenmo.github.io

Frequency transforms of text #132

Open danuep opened 6 years ago

danuep commented 6 years ago

I didn't even get the idea until a couple of days ago, and mostly I'm hoping I can get this uploaded before midnight...

Reading @aparrish's comments in #23 about hoping to get a meaningful average novel got me thinking about the scales of variation in play, which led to wavelet transforms, which led to

Haar of Darkness

which is unfortunately 2000 words short of the 50,000-word minimum, so in honor of a brilliant woman of letters and a brilliant woman of numbers:

The Wavelets, a Daubechies transform of The Waves, by Virginia Woolf

[edit: now with correct link to The Wavelets]

danuep commented 6 years ago

(now that I've slept)

I'm grateful to @aparrish for sharing her word vectors generated from Project Gutenberg. I wouldn't have had the time to pull this together without that resource. If I had more time, I'd go back and be more content-aware about tokenizing the source texts: I split on spaces and at each non-letter character, even though the vector file contains entries for tokens like '--' and for contractions. Entertainingly enough, The Waves isn't in Project Gutenberg, so my lookup error log was a nice list of the words Woolf coined in that book. For those, I greedily matched valid sub-words starting from the beginning of the word.
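
For anyone following along, here is a minimal Python sketch of the tokenizing and sub-word fallback described above. It is not the code from this project: the names `tokenize` and `greedy_subwords` are hypothetical, and it assumes ASCII letters only, that non-letter characters are dropped rather than kept as tokens, and that "greedy" means trying the longest prefix first.

```python
import re

def tokenize(text):
    # Assumption: split at whitespace and at every non-letter character,
    # discarding the separators and any empty strings.
    return [t for t in re.split(r"[^A-Za-z]+", text) if t]

def greedy_subwords(word, vocab):
    """Break an unknown word into known pieces, scanning left to right."""
    pieces = []
    start = 0
    while start < len(word):
        # Assumption: try the longest candidate first, then shrink.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # Nothing in the vocabulary starts here; skip one character.
            start += 1
    return pieces

# Toy vocabulary standing in for the keys of the word-vector file.
vocab = {"sea", "green", "the", "risen"}
tokens = tokenize("risen--the seagreen")
split = greedy_subwords("seagreen", vocab)
```

Note that this kind of splitting turns a contraction like "don't" into two tokens, which is the sort of mismatch with the vector file mentioned above.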

I used JWave for the Haar and Daubechies transforms, and Annoy for the nearest-neighbor matching.
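
As a rough illustration of how those two pieces could fit together, here is a Python sketch under stated assumptions. It does not use JWave or Annoy: `haar_step` is a hand-written single level of the orthonormal Haar transform applied to a sequence of word vectors, and `nearest_word` is a brute-force cosine search standing in for an approximate nearest-neighbor index. Both names and the toy vectors are hypothetical, and the thread does not say exactly how the coefficients were mapped back to words.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform over a list of vectors.

    Pairs up neighbors; a trailing unpaired vector is ignored.
    """
    s = math.sqrt(2.0)
    approx, detail = [], []
    for i in range(0, len(signal) - 1, 2):
        a, b = signal[i], signal[i + 1]
        approx.append([(x + y) / s for x, y in zip(a, b)])  # scaled sum
        detail.append([(x - y) / s for x, y in zip(a, b)])  # scaled difference
    return approx, detail

def nearest_word(vec, vectors):
    """Brute-force cosine nearest neighbor (stand-in for an Annoy index)."""
    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return dot / norm if norm else 0.0
    return max(vectors, key=lambda w: cosine(vec, vectors[w]))

# Toy 2-d "word vectors"; real ones would come from the shared vector file.
vectors = {"sea": [1.0, 0.0], "sky": [0.0, 1.0], "wave": [1.0, 1.0]}
signal = [vectors["sea"], vectors["sky"]]
approx, detail = haar_step(signal)
coarse_word = nearest_word(approx[0], vectors)
```

In this toy case the coarse coefficient for "sea sky" points in the same direction as "wave", so that is the word it maps back to. A Daubechies transform would use longer filters than the two-tap Haar pair shown here.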

hugovk commented 6 years ago

🎈

Is the source available somewhere?

danuep commented 6 years ago

I'll put it up later today--was mostly rushing to meet the deadline (which I now see was UTC, not local, so oh well).

danuep commented 6 years ago

Scripts are up at https://github.com/danuep/nanogenmo2017