dariusk / NaNoGenMo-2015

National Novel Generation Month, 2015 edition.

"Where I'm From" poem & novel generator #49

Open marythought opened 8 years ago

marythought commented 8 years ago

Some considerations:

  • I'll be coding in ~~Ruby~~ JavaScript
  • I'd like to try using the Goodreads API / Google Books API (or something similar) in some way
  • Use text from Gutenberg or scrape from the internet? I've yet to try scraping, so that could be interesting (plus Gutenberg has a pretty strict anti-robot policy, so texts would need to be downloaded)
  • My husband's idea: find an appropriate sci-fi novel and replace all instances of "snake people" with "millennials" (I am not making this, but somebody should)

dariusk commented 8 years ago

he's such a card

hugovk commented 8 years ago

You can download CDs and DVDs of Project Gutenberg books here: https://www.gutenberg.org/wiki/Gutenberg:The_CD_and_DVD_Project

I didn't know there was a Google Books API; I'll have to check it out.

marythought commented 8 years ago

DAY ONE

In my teaching years, this poem was everywhere:

Where I'm From (George Ella Lyon)

I am from clothespins, from Clorox and carbon-tetrachloride. I am from the dirt under the back porch. (Black, glistening, it tasted like beets.) I am from the forsythia bush the Dutch elm whose long-gone limbs I remember as if they were my own.

I'm from fudge and eyeglasses, from Imogene and Alafair. I'm from the know-it-alls and the pass-it-ons, from Perk up! and Pipe down! I'm from He restoreth my soul with a cottonball lamb and ten verses I can say myself.

I'm from Artemus and Billie's Branch, fried corn and strong coffee. From the finger my grandfather lost to the auger, the eye my father shut to keep his sight.

Under my bed was a dress box spilling old pictures, a sift of lost faces to drift beneath my dreams. I am from those moments-- snapped before I budded -- leaf-fall from the family tree.

For my first trick, I'll be working on a poem generator (I know I know, we're building a novel, stay tuned ok) to identify the parts of speech at work here and generate new "I'm From" poems that mimic parts of speech and important sound patterns. This should be good practice in working with natural language processors in order to generate poem-length memoir-esque bits of text -- which I can then use as the base for further novel expansions.

marythought commented 8 years ago

Not a bad start! I got RiTa loaded and working, so that's a huge step in the right direction. Next I think I need to find some word banks / corpora for specific parts of the poem (example: nature words). RiTa's proper nouns are kind of cringe-y, but I'll run it more times and see if I need to substitute something else there. FYI for anyone getting started with RiTa, here's a list of the part-of-speech abbreviations:

[Screenshot: list of RiTa part-of-speech abbreviations]

dariusk commented 8 years ago

Yes! One of my to-do items is to make a pull request to make the PoS list more prominent in the RiTa documentation...

marythought commented 8 years ago

DAY TWO

I spent a few hours this evening working on linking up random choices from custom word lists. I forked Darius's corpora repo linked in the NaNoGenMo resources and also found some good word lists on the internet for what I am looking for. Fun fact: as a middle school English teacher, I loved word lists, or "word pools" we would sometimes call them. The walls of my classroom were plastered with posters of color words, verbs, adjectives, sensory words, etc. (until mandatory testing took over the entire Spring and they had to be covered up).

Sticking with the corpora format, the word lists are in JSON. JavaScript isn't my first programming language, so I had to google "how do I link a local JSON file to my javascript" and Y'ALL this should be a lot easier, doncha think? I did not want to involve html files or ajax requests (eeek) or jQuery (no!), at least not yet, so I cheated by just making my word list files .js files and then requiring them.
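For anyone hitting the same wall: in Node, a local JSON file can be loaded without renaming it to .js. The sketch below shows two ways, assuming a Node environment; the file name and the `colors` key are made up for illustration, and the word list is written to a temp directory only so the example runs on its own.

```javascript
// Sketch: load a JSON word list in Node without renaming it to .js.
// The file name and the "colors" key are hypothetical.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a tiny corpora-style word list so the example is self-contained.
const listPath = path.join(os.tmpdir(), 'where-im-from-colors.json');
fs.writeFileSync(listPath, JSON.stringify({ colors: ['cerise', 'navy blue', 'tan'] }));

// Option 1: require() parses .json files directly (and caches the result).
const viaRequire = require(listPath);

// Option 2: read the file and parse it yourself (no caching).
const viaFs = JSON.parse(fs.readFileSync(listPath, 'utf8'));

// Pick a random entry from a list; rng is injectable to make testing easy.
function sample(list, rng = Math.random) {
  return list[Math.floor(rng() * list.length)];
}

console.log(sample(viaRequire.colors), sample(viaFs.colors));
```

Neither option needs HTML, ajax, or jQuery; both only work server-side (or through a bundler), not in a plain browser script.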

Like this: Fear the Repo

Shush, You, I'll DRY it up later.

I'm pretty happy with how it's shaping up. I love using RiTa to control syllable length.

[Screenshot, 2015-11-02]

As a reminder, the source poem is here.

I'm hoping to finish assembling the poem tomorrow, then I can figure out where I want to take it from there.

marythought commented 8 years ago

DAY THIRD -- oh it is very late make that DAY FORTH

Just checking in with some sample output. I wasn't happy with the trees and bushes lists available to me, so I'm just inventing some instead. :D Done through second stanza, two to go!

Names list is 1,000 randomly generated names from list of random names -- if anyone wants to be added, I'm happy to add you! (ps repo is here)

I am from nightclubs, from Mart and fundamentalism. I am from the aisle under the common room. (Navy blue, feminine, it smelled like cranberry.) I am from the tulip spruce the yellow corkbark birch whose diverse caps I remember as if they were my own.

I'm from parsnip and statistics, from Wendie and Marcelle, I'm from the slam poets and the smart-alecks, from 'well done' and 'what'! I'm from 'He was born with a gift of laughter' with a corrosive porcupine and four I can say myself.

marythought commented 8 years ago

DAY FOUR (FOR REAL)

We have a completed poem!

Where I'm From

I am from birthdays, from Big Mac and collectivization. I am from the trot under the storm cellar. (Cerise, thirdquarter, it tasted like jackfruit.) I am from the stinking cottonwood the tan lilac yew whose emerald hares I remember as if they were my own.

I'm from celery and byproducts, from Fransisca and Scottie. I'm from the slam poets and the mean girls, from 'hallelujah' and 'just kidding'! I'm from 'Call me Ishmael' 'It does not matter how slowly you go so long as you do not stop' and four pamphlets I can say myself.

I'm from South Gate and Beaverton, shredded banana squash and cooling smoothie. From the neck my stepsister sewed in a football game, the thumb my mum trailed to keep their smell.

Above my tea cart was a aft box holding soft frictions, a sift of lost faces to drift around my dreams. I am from those moments-- brooded before I dabbled-- leaf-fall from the family tree.

For the next step I can go one of (at least) two ways:

MichaelPaulukonis commented 8 years ago

I like this.

marythought commented 8 years ago

DAY ... TEN?

Ok, after taking some time off to learn all the data structures and algorithms (or not learn, as the case may be), I needed a quick win so I came back to this and was able to publish a version of the poem generator!

Where I'm From

It's not very fancy, and probably breaks all the Node/Express rules (I am a very proficient Ruby on Rails developer seriously you should hire me), but it meets the prime objective of generating a new poem on demand.

I like this so much I am not sure how to translate it into a novel... but let's not call it "done" yet, because I'm going to sleep on that.

I found a couple of open-source texts that work well for "memoir" style (Anne of Green Gables is the frontrunner), so I played with using RiTa to markov it up. My idea was to start with the base text, and then see if there's any way to prioritize the keywords generated in the 'Where I'm From' poem (so it would be a poem followed by a short vignette featuring terms mentioned in that poem, and then more in that pattern).

[Screenshot, 2015-11-10]

It's interesting, but it isn't very readable in paragraph form. So I think I need to consider another method for text generation. Which puts me back at the starting line. :)
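For reference, the general idea behind "markov it up" can be sketched in a few lines. This is a generic word-level Markov chain, not RiTa's implementation; the function names `buildChain` and `generate` and the tiny source text are assumptions for illustration.

```javascript
// Generic word-level Markov chain sketch (not RiTa's implementation).

// Map each run of n words to the list of words seen right after it.
function buildChain(text, n = 2) {
  const words = text.split(/\s+/).filter(Boolean);
  const chain = new Map();
  for (let i = 0; i + n < words.length; i++) {
    const key = words.slice(i, i + n).join(' ');
    if (!chain.has(key)) chain.set(key, []);
    chain.get(key).push(words[i + n]);
  }
  return chain;
}

// Extend `start` (which must contain exactly n words) up to maxWords words.
function generate(chain, start, maxWords, rng = Math.random) {
  const out = start.split(' ');
  const n = out.length;
  while (out.length < maxWords) {
    const followers = chain.get(out.slice(-n).join(' '));
    if (!followers) break; // dead end: this run never had a successor
    out.push(followers[Math.floor(rng() * followers.length)]);
  }
  return out.join(' ');
}

const source = 'I am from clothespins. I am from the dirt. I am from the family tree.';
const chain = buildChain(source, 2);
console.log(generate(chain, 'I am', 12));
```

Because each word depends only on the previous n words, the output stays locally plausible but drifts over longer stretches, which fits the "isn't very readable in paragraph form" observation.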

Maybe I'll just write more poems...? #NaPoGenMo! I'm not 100% invested in the novel form, at least not for my first experiment this year, but I'm shooting to adhere to the 50,000 word count...

cpressey commented 8 years ago

+1 NaAnOfGreGaGenMo!

Ah, if only I wasn't already overcommitted...

(There really is a NaPoGenMo too btw, but it's held in April.)

marythought commented 8 years ago

DAY ELEVENTHEN

Some quick text to share: I'm playing with RiTa's RiLexicon to find near-replacement words for a classic poem (again with the poems!!! she just won't stop...). My goal here is to generate output that is clearly recognizable but sounds bananas.

You might be curious: what is the difference between RiTa's RiLexicon methods similarBySound(), similarByLetter(), similarBySoundAndLetter(), and rhymes()? So glad you asked... let's take a look at each of these at play! Each method returns an array of matches, so the computer chooses a random match (or the original word) each time.
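The "random match or the original word" step might look roughly like this. It is only a sketch: `findMatches` is a hypothetical stand-in for a lexicon lookup such as similarBySound(), backed here by a hard-coded table instead of RiTa, and `replaceWord` / `replaceLine` are names invented for the example.

```javascript
// Sketch of "pick a random match, or keep the original word".
// findMatches is a hypothetical stand-in for a real lexicon lookup.
const fakeLexicon = {
  roads: ['reeds', 'rods', 'loads'],
  wood: ['good', 'would'],
};

function findMatches(word) {
  return fakeLexicon[word.toLowerCase()] || [];
}

function replaceWord(word, rng = Math.random) {
  // The original word is part of the pool, so it can survive.
  const pool = [word, ...findMatches(word)];
  return pool[Math.floor(rng() * pool.length)];
}

function replaceLine(line, rng = Math.random) {
  // Replace runs of letters only, leaving punctuation and spacing alone.
  return line.replace(/[A-Za-z']+/g, (w) => replaceWord(w, rng));
}

console.log(replaceLine('Two roads diverged in a yellow wood,'));
```

Words with no matches always come back unchanged, since the pool then holds only the original.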

Similar by Sound

Compares the phonemes of the input word (using a version of the min-edit distance algorithm) to each word in the lexicon, returning the set of closest matches.

Two reeds divert in a yell good, And soggy I could not trammel berth And be one travels, pong I staid And cooked dean one as far as I curd To where it burnt in the undergrowth;

Similar by Letter

Compares the characters of the input string (using a version of the min-edit distance algorithm) to each word in the lexicon, returning the set of closest matches.

Two loads diverged in a fellow wood, And sorry I could not travel booth And be one traveler, song I stood And cooked dawn one as far as I mould To where it bet in the undergrowth;
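To make "min-edit distance" concrete, here is a minimal sketch of the classic Levenshtein distance plus a closest-match lookup over a tiny hand-picked word list. The docs only say "a version of" the algorithm, so this should not be read as RiTa's exact code; `editDistance`, `closestByLetter`, and `tinyLexicon` are illustrative names.

```javascript
// Classic min-edit (Levenshtein) distance between two sequences.
function editDistance(a, b) {
  // prev[j] = distance between the first i-1 items of a and the first j items of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the words in `lexicon` closest to `word`, excluding the word itself.
function closestByLetter(word, lexicon) {
  let best = Infinity;
  let matches = [];
  for (const candidate of lexicon) {
    if (candidate === word) continue;
    const d = editDistance(word, candidate);
    if (d < best) {
      best = d;
      matches = [candidate];
    } else if (d === best) {
      matches.push(candidate);
    }
  }
  return matches;
}

const tinyLexicon = ['loads', 'toads', 'reeds', 'rods', 'yellow', 'fellow'];
console.log(closestByLetter('roads', tinyLexicon)); // [ 'loads', 'toads', 'rods' ]
```

The same `editDistance` works on arrays of phonemes instead of strings of letters, which is one plausible way to think about the by-sound variant.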

Similar by Sound and Letter

First calls similarBySound(), then filters the result set by the algorithm used in similarByLetter();

Two rods diverge in a bellow good, And sorry I could not travel bath And be one traveled, pong I stood And cooked doan one as far as I could To where it vent in the underwrote;

Rhyme

Two words are considered as rhyming if their final stressed vowel and all following phonemes are identical.

Two episodes diverged in a mellow likelihood, And safari I could not travel both And be one both, strong I sainthood And overlooked clown one as far as I withstood To where it dissent in the undergrowth;
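That rhyme rule can be sketched directly. The phoneme transcriptions below are hand-written in an ARPAbet-like style and are assumptions for illustration, as is the choice to count only primary stress (a trailing "1") as "stressed"; a real lexicon may treat secondary stress differently.

```javascript
// Hand-written, ARPAbet-style transcriptions (illustrative assumptions).
// A trailing "1" marks primary stress.
const phonemes = {
  wood: ['W', 'UH1', 'D'],
  stood: ['S', 'T', 'UH1', 'D'],
  withstood: ['W', 'IH0', 'TH', 'S', 'T', 'UH1', 'D'],
  road: ['R', 'OW1', 'D'],
};

// Everything from the last primary-stressed vowel to the end of the word.
function rhymePart(phones) {
  let last = -1;
  phones.forEach((p, i) => {
    if (p.endsWith('1')) last = i;
  });
  return last === -1 ? null : phones.slice(last).join(' ');
}

// Two different words rhyme if their rhyme parts are identical.
function rhymes(a, b) {
  const pa = rhymePart(phonemes[a] || []);
  const pb = rhymePart(phonemes[b] || []);
  return a !== b && pa !== null && pa === pb;
}

console.log(rhymes('wood', 'withstood')); // true
console.log(rhymes('wood', 'road'));      // false
```

Nothing before the stressed vowel is compared, which is why rhyme matches can be much longer or shorter than the source word.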

Verdict

I hadn't tried by letter before this little exercise (thinking the sound would be more important) but I actually like that output the best, here. It does seem to be keeping the sound and rhythm of the word as well. Linguistical coincidence? Edit-distance magick?

Rhyme clearly deviates the most from the source text -- this could be fun to play with for replacing end words (or generating new rhyme words) but I won't use it in this "replace nearly every word" exercise.

Just for fun: Alliteration

Finds alliterations by comparing the phonemes of the input string to those of each word in the lexicon

Two razor diverged in a abuse wings, And scratch I could not fanatic both And be one entitled, rebuilding I ceases And consolidates deathbed one as far as I consul To where it billionaires in the injuries

^^Yikes, that's dark, RiTa! I won't be using this but watch this:

Two [roads organizational] diverged in a [yellow impugning] [wood whittle], And [sorry confessing] I could not [travel teapot] [both both] And be one [traveler tumbler], [long inflict] I [stood autistic] And [looked apologized] [down down] one as far as I [could clawed] To where it [bent bittersweet] in the [undergrowth incinerators];

These are the word pairs it's claiming for alliteration. Some are truly weird. I feel like this would need some human editing if you were to use it in text generation, or else I might just throw out anything that doesn't start with the same letter as the base word (those all seem to work well!).
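The "throw out anything that doesn't start with the same letter" idea is a one-line filter. A small sketch, using some of the pairs listed above as data (`sameFirstLetter` is a made-up helper name):

```javascript
// Base word / candidate pairs copied from the alliteration output above.
const pairs = [
  ['roads', 'organizational'],
  ['yellow', 'impugning'],
  ['wood', 'whittle'],
  ['sorry', 'confessing'],
  ['travel', 'teapot'],
  ['traveler', 'tumbler'],
  ['bent', 'bittersweet'],
  ['undergrowth', 'incinerators'],
];

// Keep a pair only if both words start with the same letter (case-insensitive).
function sameFirstLetter([base, candidate]) {
  return base[0].toLowerCase() === candidate[0].toLowerCase();
}

const kept = pairs.filter(sameFirstLetter);
console.log(kept.map((pair) => pair.join(' / ')));
```

This is cruder than comparing phonemes (it would drop a pair like "knight" and "nurse"), but it matches the rule of thumb described here.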

Signing off for now. I'm going to keep working on Bob Frost, then see what else I can do in RiTa.

marythought commented 8 years ago

Bonus: here's the whole poem w/ Similar By Sound replacements:

[Screenshot: the whole poem with Similar By Sound replacements]

marythought commented 8 years ago

DAY THE LAST

After debating what to do with my poor poem-that-is-not-a-novel, I decided to go ahead and use RiTa's Markov functionality, but use it on the poem as source material. What results is an epic memoir poem that doesn't have much plot but generates some interesting language. Not bad for a first attempt!

My #NaNovGenMo2015 Submission

And here is the source code

How I made it:

  1. Generate a new "Where I'm From" poem over and over and save it in a source text variable until it reaches 50,000 words.
  2. Feed that source text to RiTa's Markov functionality and generate 5,000 sentences in an array (I started with 1,000 and that didn't seem like enough)
  3. Make new empty text variable for the output
  4. Until the output text reaches 50,000 words:
    • generate new poem and add it
    • generate between 0-40 random lines from the markoved sentences
    • generate between 0-20 sampled lines from a new generated poem
    • generate between 0-40 lines of markoved (again)
    • rinse and repeat
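The assembly loop in steps 3 and 4 can be sketched as below. This is not the actual source code: `generatePoem` (returning an array of lines) and `markovSentences` (the array from step 2) are passed in as stand-ins, and all helper names are invented for the example.

```javascript
function countWords(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

// Random integer in [0, max], inclusive.
function randInt(max, rng = Math.random) {
  return Math.floor(rng() * (max + 1));
}

// Take `count` random items (repeats allowed) from `items`.
function sampleLines(items, count, rng = Math.random) {
  const picked = [];
  for (let i = 0; i < count && items.length > 0; i++) {
    picked.push(items[Math.floor(rng() * items.length)]);
  }
  return picked;
}

// Assumes generatePoem() always returns at least one non-empty line,
// otherwise the loop could never reach the target.
function assemble({ generatePoem, markovSentences, targetWords, rng = Math.random }) {
  const parts = [];
  let words = 0;
  const add = (lines) => {
    for (const line of lines) {
      parts.push(line);
      words += countWords(line);
    }
  };
  while (words < targetWords) {
    add(generatePoem());                                      // a whole new poem
    add(sampleLines(markovSentences, randInt(40, rng), rng)); // 0-40 markoved lines
    add(sampleLines(generatePoem(), randInt(20, rng), rng));  // 0-20 lines of another poem
    add(sampleLines(markovSentences, randInt(40, rng), rng)); // 0-40 markoved lines, again
  }
  return parts.join('\n');
}

// Stand-in inputs so the sketch runs on its own.
const demoPoem = () => ['I am from birthdays,', 'from Big Mac and collectivization.'];
const demoSentences = ['I am from the family tree.', "I'm from celery and byproducts."];
const text = assemble({ generatePoem: demoPoem, markovSentences: demoSentences, targetWords: 200 });
console.log(countWords(text));
```

A plain `while` loop like this keeps the call stack flat no matter how many rounds it takes to reach the word count.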

I was going to serve up the results through Express and Node just like with my poem generator, but as soon as I got close, I ran into a 'Maximum call stack size exceeded' error. So, eff that. Markdown it is! An interesting aspect of markdown is that it doesn't preserve all the line breaks. I played with this and ultimately decided that I liked the paragraphs/prose poem format for such a long text document, so I left it alone (for a formatted version, see my earlier attempt which does preserve line breaks). I did discover that RiTa will occasionally generate language I wouldn't want to use in an app, so I'm curious if anyone (Darius) has already made a filter for this.

This was fun! I still have Bob Frost to play with, and coincidentally a little project I'm working on called "Walk or Not" fits well with my Ritafied poem. I learned a bunch about natural language processing this month and feel much more comfortable working with RiTa and JavaScript.

Questions or comments? I will answer what I can... if I do it again, I'll be purposeful about chapter headings or something that can break up the 50,000 words to help the flow. At this point, though, I can tinker no more.

Thanks for the opportunity and see you next year!

hugovk commented 8 years ago

Have a completed label!


> I did discover that RiTA will occasionally generate language I wouldn't want to use in an app, so I'm curious if anyone (Darius) has already made a filter for this.

These are mainly aimed at bots, but should still be generally useful.

Here's a JavaScript, Python, Ruby and PHP word filter: https://github.com/dariusk/wordfilter

Here's a headline filter: https://github.com/molly/CyberPrefixer/blob/master/offensive.py

Tips on transphobic joke detection: http://tinysubversions.com/notes/transphobic-joke-detection/

Some lists of bad words:

  • https://github.com/shutterstock/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
  • https://gist.github.com/ryanlewis/a37739d710ccdb4b406d
  • http://www.bannedwordlist.com/lists/swearWords.txt

[Inactive] muted Twitter topics: https://github.com/sjml/bot-innocence

Some general etiquette things:

  • http://tinysubversions.com/2013/03/basic-twitter-bot-etiquette/
  • http://www.crummy.com/2013/11/27/0
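As a rough idea of how such a filter slots into a generator, here is a minimal sketch of a substring blocklist with retry. It is not the API of any of the libraries linked above; the `blocklist` contents and the `isBlocked` / `generateClean` names are placeholders.

```javascript
// Hypothetical placeholder blocklist; real lists are much longer.
const blocklist = ['darn', 'heck'];

// True if any blocked term appears anywhere in the text
// (case-insensitive substring match).
function isBlocked(text, list = blocklist) {
  const lower = text.toLowerCase();
  return list.some((term) => lower.includes(term.toLowerCase()));
}

// Keep generating until a sentence passes, giving up after maxTries.
function generateClean(generateSentence, maxTries = 20) {
  for (let i = 0; i < maxTries; i++) {
    const sentence = generateSentence();
    if (!isBlocked(sentence)) return sentence;
  }
  return null; // caller decides what to do when nothing passes
}

// Stand-in generator that returns a blocked sentence first, then a clean one.
const queue = ['Oh HECK no', 'I am from birthdays'];
console.log(generateClean(() => queue.shift()));
```

Substring matching errs on the side of rejecting too much (innocent words that happen to contain a blocked term get dropped), which is usually the safer trade-off for unattended generators.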