marythought opened 8 years ago
he's such a card
You can download CDs and DVDs of Project Gutenberg books here: https://www.gutenberg.org/wiki/Gutenberg:The_CD_and_DVD_Project
I didn't know there is a Google Books API, I'll have to check it.
In my teaching years, this poem was everywhere:
Where I'm From (George Ella Lyon)
I am from clothespins, from Clorox and carbon-tetrachloride. I am from the dirt under the back porch. (Black, glistening, it tasted like beets.) I am from the forsythia bush the Dutch elm whose long-gone limbs I remember as if they were my own.
I'm from fudge and eyeglasses, from Imogene and Alafair. I'm from the know-it-alls and the pass-it-ons, from Perk up! and Pipe down! I'm from He restoreth my soul with a cottonball lamb and ten verses I can say myself.
I'm from Artemus and Billie's Branch, fried corn and strong coffee. From the finger my grandfather lost to the auger, the eye my father shut to keep his sight.
Under my bed was a dress box spilling old pictures, a sift of lost faces to drift beneath my dreams. I am from those moments-- snapped before I budded -- leaf-fall from the family tree.
For my first trick, I'll be working on a poem generator (I know I know, we're building a novel, stay tuned ok) to identify the parts of speech at work here and generate new "I'm From" poems that mimic parts of speech and important sound patterns. This should be good practice in working with natural language processors in order to generate poem-length memoir-esque bits of text -- which I can then use as the base for further novel expansions.
Not a bad start! I got RiTa loaded and working, so that's a huge step in the right direction. Next I think I need to find some word banks / corpora for specific parts of the poem (example: nature words). Rita's proper nouns are kind of cringe-y but I'll run it more times and see if I need to substitute something else there. FYI for anyone getting started with Rita, here's a list of the parts of speech abbreviations:
Yes! One of my to-do items is to make a pull request to make the PoS list more prominent in the RiTa documentation...
I spent a few hours this evening working on linking up random choices from custom word lists. I forked Darius's corpora repo linked in the NaNoGenMo resources and also found some good word lists on the internet for what I am looking for. Fun fact: as a middle school English teacher, I loved word lists, or "word pools" we would sometimes call them. The walls of my classroom were plastered with posters of color words, verbs, adjectives, sensory words, etc. (until mandatory testing took over the entire Spring and they had to be covered up).
Sticking with the corpora format, the word lists are in JSON. JavaScript isn't my first programming language, so I had to google "how do I link a local JSON file to my javascript" and Y'ALL this should be a lot easier, doncha think? I did not want to involve html files or ajax requests (eeek) or jQuery (no!), at least not yet, so I cheated by just making my word list files .js files and then requiring them.
Like this: Fear the Repo
Shush, You, I'll DRY it up later.
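For anyone following along, here is a minimal sketch of the "word lists as plain modules, then pick at random" idea described above. It is not the code from the repo: the names `wordLists`, `pick`, and `fillTemplate` and the `{slot}` template syntax are made up for illustration, and the lists are inlined so the snippet runs on its own. (As far as I know, Node's CommonJS `require()` will also load a path ending in `.json` directly, which may avoid the rename to `.js`.)

```javascript
// Hypothetical in-memory word lists, shaped loosely like the corpora JSON files.
// In a real project each list could live in its own file and be pulled in with require().
const wordLists = {
  colors: ["cerise", "navy blue", "tan", "emerald"],
  fruits: ["jackfruit", "cranberry", "banana squash"],
};

// Pick one entry from a list. `rng` defaults to Math.random and can be swapped out for testing.
function pick(list, rng = Math.random) {
  return list[Math.floor(rng() * list.length)];
}

// Replace every {name} slot with a random word from lists[name].
// Slots with no matching list are left untouched.
function fillTemplate(template, lists, rng = Math.random) {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    lists[name] ? pick(lists[name], rng) : match
  );
}

console.log(fillTemplate("({colors}, it tasted like {fruits}.)", wordLists));
```

Passing the random source in as a parameter is just a convenience so the output can be pinned down when checking the code.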
I'm pretty happy with how it's shaping up, and I love using RiTa to control syllable length.
As a reminder, the source poem is here.
I'm hoping to finish assembling the poem tomorrow, then I can figure out where I want to take it from there.
Just checking in with some sample output. I wasn't happy with the trees and bushes lists available to me, so I'm just inventing some instead. :D Done through second stanza, two to go!
Names list is 1,000 randomly generated names from list of random names -- if anyone wants to be added, I'm happy to add you! (ps repo is here)
I am from nightclubs, from Mart and fundamentalism. I am from the aisle under the common room. (Navy blue, feminine, it smelled like cranberry.) I am from the tulip spruce the yellow corkbark birch whose diverse caps I remember as if they were my own.
I'm from parsnip and statistics, from Wendie and Marcelle, I'm from the slam poets and the smart-alecks, from 'well done' and 'what'! I'm from 'He was born with a gift of laughter' with a corrosive porcupine and four I can say myself.
We have a completed poem!
Where I'm From
I am from birthdays, from Big Mac and collectivization. I am from the trot under the storm cellar. (Cerise, thirdquarter, it tasted like jackfruit.) I am from the stinking cottonwood the tan lilac yew whose emerald hares I remember as if they were my own.
I'm from celery and byproducts, from Fransisca and Scottie. I'm from the slam poets and the mean girls, from 'hallelujah' and 'just kidding'! I'm from 'Call me Ishmael' 'It does not matter how slowly you go so long as you do not stop' and four pamphlets I can say myself.
I'm from South Gate and Beaverton, shredded banana squash and cooling smoothie. From the neck my stepsister sewed in a football game, the thumb my mum trailed to keep their smell.
Above my tea cart was a aft box holding soft frictions, a sift of lost faces to drift around my dreams. I am from those moments-- brooded before I dabbled-- leaf-fall from the family tree.
For the next step I can go one of (at least) two ways:
I like this.
Ok, after taking some time off to learn all the data structures and algorithms (or not learn, as the case may be), I needed a quick win so I came back to this and was able to publish a version of the poem generator!
It's not very fancy, and probably breaks all the Node/Express rules (I am a very proficient Ruby on Rails developer seriously you should hire me), but it meets the prime objective of generating a new poem on demand.
I like this so much I am not sure how to translate it into a novel... but let's not call it "done" yet, because I'm going to sleep on that.
I found a couple open-source texts that work well for "memoir" style (Anne of Green Gables is the frontrunner), so I played with using RiTA to markov it up. My idea was to start with the base text, and then see if there's any way to prioritize the keywords generated in the 'Where I'm From' poem (so it would be a poem followed by short vignette featuring terms mentioned in that poem, and then more in that pattern).
It's interesting, but it isn't very readable in paragraph form. So I think I need to consider another method for text generation. Which puts me back at the starting line. :)
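For readers who haven't met Markov text generation before, here is a rough word-level sketch of the general technique. It is not RiTa's implementation and not the code used in this project; `buildModel` and `generate` are hypothetical names, and the tiny source string is borrowed from the poem only as sample input.

```javascript
// Build a word-level Markov model: each run of `order` words maps to the words seen right after it.
function buildModel(text, order = 2) {
  const words = text.split(/\s+/).filter(Boolean);
  const model = new Map();
  for (let i = 0; i + order < words.length; i++) {
    const key = words.slice(i, i + order).join(" ");
    if (!model.has(key)) model.set(key, []);
    model.get(key).push(words[i + order]);
  }
  return model;
}

// Walk the model from a starting key until maxWords is reached or no follower exists.
function generate(model, startKey, maxWords, rng = Math.random) {
  const out = startKey.split(" ");
  const order = out.length;
  while (out.length < maxWords) {
    const followers = model.get(out.slice(-order).join(" "));
    if (!followers) break; // dead end: this word sequence never continued in the source
    out.push(followers[Math.floor(rng() * followers.length)]);
  }
  return out.join(" ");
}

const source = "I am from the dirt under the back porch . I am from the forsythia bush";
const model = buildModel(source, 2);
console.log(generate(model, "I am", 12));
```

Because each step only looks at the last couple of words, the output tends to be locally plausible but wanders over longer stretches, which may be part of why the paragraphs are hard to read.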
Maybe I'll just write more poems...? #NaPoGenMo! I'm not 100% invested in the novel form, at least not for my first experiment this year, but I'm shooting to adhere to the 50,000 word count...
+1 NaAnOfGreGaGenMo!
Ah, if only I wasn't already overcommitted...
(There really is a NaPoGenMo too btw, but it's held in April.)
Some quick text to share, I'm playing with the RiTA RiLexicon to find near replacement words for a classic poem (again with the poems!!! she just won't stop...). My goal here is to generate output that is clearly recognizable, but sounds bananas.
You might be curious, what is the difference between Rita's RiLexicon methods similarBySound(), similarByLetter(), similarBySoundAndLetter(), and rhymes()? So glad you asked... let's take a look at each of these at play! Each method returns an array of matches, so the computer is choosing a random match (or the original word) each time.
similarBySound(): Compares the phonemes of the input word (using a version of the min-edit distance algorithm) to each word in the lexicon, returning the set of closest matches.
Two reeds divert in a yell good, And soggy I could not trammel berth And be one travels, pong I staid And cooked dean one as far as I curd To where it burnt in the undergrowth;
similarByLetter(): Compares the characters of the input string (using a version of the min-edit distance algorithm) to each word in the lexicon, returning the set of closest matches.
Two loads diverged in a fellow wood, And sorry I could not travel booth And be one traveler, song I stood And cooked dawn one as far as I mould To where it bet in the undergrowth;
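To make "min-edit distance" a little more concrete, here is a sketch of the letter-based comparison using the textbook Levenshtein distance. This is an assumption about the general approach, not RiTa's actual code (the docs only say "a version of" the algorithm); `editDistance`, `closestByLetter`, and the six-word `lexicon` are all hypothetical.

```javascript
// Classic Levenshtein (min-edit) distance between two strings, keeping one rolling row.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      // deletion, insertion, substitution (or match when cost is 0)
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return every lexicon word at the smallest non-zero distance from `word`.
function closestByLetter(word, lexicon) {
  let best = Infinity;
  let matches = [];
  for (const candidate of lexicon) {
    const d = editDistance(word, candidate);
    if (d === 0) continue; // skip the word itself
    if (d < best) {
      best = d;
      matches = [candidate];
    } else if (d === best) {
      matches.push(candidate);
    }
  }
  return matches;
}

// Tiny hypothetical lexicon; a real one would hold tens of thousands of words.
const lexicon = ["loads", "toads", "reeds", "rods", "wood", "episodes"];
console.log(closestByLetter("roads", lexicon));
```

The sound-based variant could presumably reuse the same distance function over arrays of phonemes instead of characters.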
similarBySoundAndLetter(): First calls similarBySound(), then filters the result set by the algorithm used in similarByLetter().
Two rods diverge in a bellow good, And sorry I could not travel bath And be one traveled, pong I stood And cooked doan one as far as I could To where it vent in the underwrote;
rhymes(): Two words are considered as rhyming if their final stressed vowel and all following phonemes are identical.
Two episodes diverged in a mellow likelihood, And safari I could not travel both And be one both, strong I sainthood And overlooked clown one as far as I withstood To where it dissent in the undergrowth;
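The rhyme rule above is easy to sketch if you have phonemes to work with. The snippet below is only an illustration: the `phonemes` table is hand-written in the CMU-dictionary style (a trailing digit marks vowel stress), it treats only primary stress (`1`) as "stressed", and `rhymeTail` and `isRhyme` are hypothetical helpers rather than anything from RiTa.

```javascript
// Hand-written, CMU-dictionary-style transcriptions. Illustrative only.
const phonemes = {
  wood: ["W", "UH1", "D"],
  could: ["K", "UH1", "D"],
  stood: ["S", "T", "UH1", "D"],
  both: ["B", "OW1", "TH"],
};

// Everything from the last primary-stressed vowel to the end of the word, or null if unknown.
function rhymeTail(word) {
  const p = phonemes[word];
  if (!p) return null;
  for (let i = p.length - 1; i >= 0; i--) {
    if (p[i].endsWith("1")) return p.slice(i).join(" ");
  }
  return null;
}

// Two different known words rhyme when their tails are identical.
function isRhyme(a, b) {
  const tail = rhymeTail(a);
  return a !== b && tail !== null && tail === rhymeTail(b);
}

console.log(isRhyme("wood", "could"), isRhyme("wood", "both"));
```

Since the rule ignores everything before the stressed vowel, long words can match short ones, which fits how far the rhymed version drifts from the source.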
I hadn't tried by letter before this little exercise (thinking the sound would be more important) but I actually like that output the best, here. It does seem to be keeping the sound and rhythm of the word as well. Linguistical coincidence? Edit-distance magick?
Rhyme clearly varies the most from the source text -- this could be fun to play with for replacing end words (or generating new rhyme words) but I won't use it in this "replace nearly every word" exercise.
alliterations(): Finds alliterations by comparing the phonemes of the input string to those of each word in the lexicon.
Two razor diverged in a abuse wings, And scratch I could not fanatic both And be one entitled, rebuilding I ceases And consolidates deathbed one as far as I consul To where it billionaires in the injuries
^^Yikes, that's dark, RiTA! I won't be using this but watch this:
Two [roads organizational] diverged in a [yellow impugning] [wood whittle], And [sorry confessing] I could not [travel teapot] [both both] And be one [traveler tumbler], [long inflict] I [stood autistic] And [looked apologized] [down down] one as far as I [could clawed] To where it [bent bittersweet] in the [undergrowth incinerators];
These are the word pairs it's claiming for alliteration. Some are truly weird. I feel like this would need some human editing if you were to use it in text generation, or else I might just throw out anything that doesn't start with the same letter as the base word (those all seem to work well!).
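The "throw out anything that doesn't start with the same letter" idea could look something like this. It is a sketch only: `sameInitial` is a hypothetical helper, and the candidate arrays stand in for whatever the alliteration lookup returns.

```javascript
// Keep only candidates whose first letter matches the base word's (case-insensitive).
function sameInitial(base, candidates) {
  if (!base) return [];
  const first = base[0].toLowerCase();
  return candidates.filter((w) => w.length > 0 && w[0].toLowerCase() === first);
}

// Candidates taken from the pairs above: some share the first letter, some do not.
console.log(sameInitial("wood", ["whittle", "wings"]));
console.log(sameInitial("roads", ["organizational", "razor"]));
```

This is a spelling check rather than a sound check, so it would also drop legitimate sound-alikes that are spelled differently (a "k" word for a hard "c", say).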
Signing off for now, I'm going to keep working on Bob Frost then see what else I can do in RiTA.
After debating what to do with my poor poem-that-is-not-a-novel I decided to go ahead and use Rita's markov functionality, but use it on the poem as source material. What results is an epic memoir poem that doesn't have much plot but generates some interesting language. Not bad for a first attempt!
And here is the source code
How I made it:
I was going to serve up the results through Express and Node just like with my poem generator, but as soon as I got close, I ran into a 'Maximum call stack size exceeded' error. So, eff that. Markdown it is! An interesting aspect of markdown is that it doesn't preserve all the line breaks. I played with this and ultimately decided that I liked the paragraphs/prose poem format for such a long text document, so I left it alone (for a formatted version, see my earlier attempt which does preserve line breaks). I did discover that RiTA will occasionally generate language I wouldn't want to use in an app, so I'm curious if anyone (Darius) has already made a filter for this.
This was fun! I still have Bob Frost to play with, and coincidentally a little project I'm working on called "Walk or Not" fits well with my Ritafied poem. I learned a bunch about natural language processing this month and feel much more comfortable working with RiTA and JavaScript.
Questions or comments? I will answer what I can... if I do it again, I'll be purposeful about chapter headings or something that can break up the 50,000 words to help the flow. At this point, though, I can tinker no more.
Thanks for the opportunity and see you next year!
Have a completed label!
I did discover that RiTA will occasionally generate language I wouldn't want to use in an app, so I'm curious if anyone (Darius) has already made a filter for this.
These are mainly aimed at bots, but should still be generally useful.
Here's a JavaScript, Python, Ruby and PHP word filter: https://github.com/dariusk/wordfilter
Here's a headline filter: https://github.com/molly/CyberPrefixer/blob/master/offensive.py
Tips on transphobic joke detection: http://tinysubversions.com/notes/transphobic-joke-detection/
Some lists of bad words: https://github.com/shutterstock/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words https://gist.github.com/ryanlewis/a37739d710ccdb4b406d http://www.bannedwordlist.com/lists/swearWords.txt
[Inactive] muted Twitter topics: https://github.com/sjml/bot-innocence
Some general etiquette things: http://tinysubversions.com/2013/03/basic-twitter-bot-etiquette/ http://www.crummy.com/2013/11/27/0
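If it helps to see the shape of such a filter, here is a bare-bones sketch. It does not use the wordfilter package or any of the lists linked above; `blocklist`, `isBlocked`, and `generateClean` are hypothetical, and the two blocklist entries are placeholders.

```javascript
// Placeholder blocklist; a real one would be loaded from a maintained list.
const blocklist = ["badword", "slur"];

// True if any blocklisted term appears anywhere in the text (case-insensitive substring match).
function isBlocked(text, terms = blocklist) {
  const lower = text.toLowerCase();
  return terms.some((term) => lower.includes(term.toLowerCase()));
}

// Keep calling generateFn until a candidate passes the filter, giving up after maxTries.
function generateClean(generateFn, maxTries = 10, terms = blocklist) {
  for (let i = 0; i < maxTries; i++) {
    const candidate = generateFn();
    if (!isBlocked(candidate, terms)) return candidate;
  }
  return null; // nothing acceptable was produced
}

console.log(isBlocked("A BadWord in the middle"), isBlocked("perfectly fine text"));
```

Substring matching errs on the side of rejecting too much (harmless words that happen to contain a blocked term get thrown out too), which is probably the right trade-off for generated text nobody reviews by hand.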
Some considerations:
I'll be coding in JavaScript (originally Ruby)
I'd like to try using the Goodreads API / Google Books API (or something similar) in some way
Use text from Gutenberg or scrape from internet? I've yet to try scraping so that could be interesting (plus Gutenberg has a pretty strict anti-robot policy so texts would need to be downloaded)
My husband's idea: find an appropriate sci-fi novel and replace all instances of "snake people" with "millennials" (I am not making this, but somebody should)