Open christiaanw opened 10 years ago
Ok, so I made this thing that runs through the WordNet ontology starting from a root noun and going over every hyponym recursively, dumping definitions and names into sentence templates. I started off with animal (after the Celestial Emporium of Benevolent Knowledge), but that does not yield quite enough words, so then I ran it for entity.
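A minimal sketch of the kind of traversal described here, not the actual script. It assumes objects that expose `hyponyms()`, `definition()` and `lemma_names()`, as NLTK's WordNet synsets do; `ToySynset`, `describe` and the example definitions are hypothetical names and data made up for illustration so the block runs without the WordNet corpus.

```python
class ToySynset:
    """Hypothetical stand-in with the three NLTK Synset methods used below."""

    def __init__(self, name, definition, hyponyms=()):
        self._name = name
        self._definition = definition
        self._hyponyms = list(hyponyms)

    def lemma_names(self):
        return [self._name]

    def definition(self):
        return self._definition

    def hyponyms(self):
        return self._hyponyms


def describe(synset, template="A {name} is {definition}.", seen=None):
    """Yield one sentence per synset, walking hyponyms depth-first."""
    if seen is None:
        seen = set()
    if synset in seen:  # a synset can be reachable through several parents
        return
    seen.add(synset)
    name = synset.lemma_names()[0].replace("_", " ")
    yield template.format(name=name, definition=synset.definition())
    for hyponym in synset.hyponyms():
        yield from describe(hyponym, template, seen)


# Toy ontology (invented data, not taken from WordNet).
dog = ToySynset("dog", "a domesticated canid")
pack_animal = ToySynset("pack_animal", "an animal used to carry loads")
mammal = ToySynset("mammal", "a warm-blooded vertebrate", [dog, pack_animal])
sentences = list(describe(mammal))
```

With NLTK and its WordNet corpus installed, passing `wn.synset("animal.n.01")` (from `nltk.corpus.wordnet`) to `describe` should work the same way. The fixed "A" in the template is deliberately naive and does not handle vowels or mass nouns.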
Might need some formatting into a nice PDF, but I am having issues with pandoc.
I'm not feeling done with NaNoGenMo, yet, though.
--updated comment after moving around with repositories
Interesting: animal stays almost animally all the way down, but entity quickly spreads out.
Seems to be a problem with capital vowels:
A packhorse is... An omnivore is... A Irish water spaniel is...
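One possible workaround for the capital-vowel case is to lowercase the first letter before deciding on the article. This is only a sketch: `indefinite_article` and `sentence_start` are hypothetical helpers, not functions from Pattern or NLTK, and the rule looks at spelling only, so it will still get words like "hour" or "unicorn" wrong.

```python
VOWELS = frozenset("aeiou")


def indefinite_article(phrase):
    """Return 'an' if the phrase starts with a vowel letter in either case, else 'a'."""
    first = phrase.strip()[:1].lower()
    return "an" if first in VOWELS else "a"


def sentence_start(name):
    """Build the opening of a template sentence with a capitalised article."""
    return f"{indefinite_article(name).capitalize()} {name} is..."


examples = [sentence_start(n) for n in ("packhorse", "omnivore", "Irish water spaniel")]
```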
And any chance to detect mass nouns?
A seawater is...
I hadn't noticed that the article choice doesn't work for capital vowels. I used out-of-the-box functions from Pattern and NLTK for everything, with some workarounds for the most obvious issues. I had noticed the mass nouns:
A matter is...
And I figured I could maybe infer whether to use the article from the use of articles in the usage examples for each synset, but that is not reliable enough. Also, some synset definitions lack an initial 'a/an':
A hearing dog is dog trained to assist the deaf by signaling the occurrence of certain sounds.
But just adding a/an might improperly catch mass nouns.
I'm satisfied with this for now, as I'm more interested in exploring meaning than getting the details of grammar in order for my entry for NaNoGenMo.
Forked the repo, moved the print-wordnet thingy into it and added some other things I'm working on. In extract-phrases I've hacked together the phrase extraction utility from patent-generator to extract two different kinds of text chunks to make a single huge sentence. It could be longer, but I had some kind of error with the Gutenberg header cleaning that halted the extraction process prematurely. It yields something quite similar in tone to @cpressey's poetic inventory.
Extract from the generated novel:
Consider a low man, or a hard-working brother, or a first princess, or an old kingfisher, or an ingenious master, or a stout woman, or a new tree, or a soft female, or a sixth man, or a heroic defender, or a slender-culmed grass, or an unfortunate companion, or an importunate person, or an own partner, or a black cook, or a third caller, or a head wife, or a christian monarch, or an unlucky friend, or
Lastly, in segmented-markov I'm trying to mess with a Markov language model based on Peter Norvig's letter n-gram counts to generate a weighted random string of characters, which then get pushed through his text segmentation functions, yielding stuff like:
st men ag gazon dfesses tura media ls of fork to texputoneculdesoumst on forded for tsr urns or misha m who ment tea tions t
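For readers unfamiliar with the idea, here is a rough sketch of weighted random generation from letter n-gram counts. It does not use Norvig's count files: the counts are built from a short toy string (taken from the extract above), and `build_model` and `generate` are hypothetical names. The output has no spaces, which is why a segmentation step is needed afterwards.

```python
import random
from collections import Counter, defaultdict


def build_model(text, order=2):
    """Map each run of `order` letters to a Counter of the letters that follow it."""
    letters = "".join(ch for ch in text.lower() if ch.isalpha())
    model = defaultdict(Counter)
    for i in range(len(letters) - order):
        model[letters[i:i + order]][letters[i + order]] += 1
    return model


def generate(model, length, seed=None):
    """Emit `length` letters, choosing each one in proportion to its count."""
    rng = random.Random(seed)
    contexts = sorted(model)
    context = rng.choice(contexts)
    order = len(context)
    out = list(context)
    while len(out) < length:
        counts = model.get(context)
        if not counts:  # context never seen with a follower: jump to a random one
            context = rng.choice(contexts)
            counts = model[context]
        nxt = rng.choices(list(counts), weights=list(counts.values()))[0]
        out.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(out)[:length]


SAMPLE = ("consider a low man or a hard working brother or a first princess "
          "or an old kingfisher or an ingenious master or a stout woman")
model = build_model(SAMPLE)
text = generate(model, 60, seed=2014)
```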
Python hits its recursion limit fast when segmenting the text with Norvig's code, so strings larger than about 200 chars will give a RuntimeError: maximum recursion depth exceeded. I might generate a 250,000-character string and pass it to the segmenter in 100-character chunks. Could also write a weighted monkey script banging out Cicero with it? Or it could be combined with checkerboard layout, perhaps? Maybe toss out the junk?
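The chunking idea could look roughly like this. `chunks` and `segment_in_chunks` are hypothetical helpers, and `segment` is assumed to be any function that takes a string and returns a list of words (such as Norvig's). Words straddling a chunk boundary get cut in two, which probably matters little for random strings.

```python
def chunks(text, size=100):
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def segment_in_chunks(text, segment, size=100):
    """Segment each chunk separately and concatenate the resulting word lists."""
    words = []
    for chunk in chunks(text, size):
        words.extend(segment(chunk))
    return words


# Stand-in segmenter for demonstration only: returns each chunk as one "word".
pieces = segment_in_chunks("abcdefg", lambda chunk: [chunk], size=3)
```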
Interesting stuff. Note that it is possible to increase Python's recursion depth limit, if you think it will help, at your own peril -- a quick web search returned this eye-opening article...
You could just convert it to a loop with an explicit stack. It would be a lot safer than screwing with the recursion limit, and you could even keep args in order by using a list of tuples as your stack.
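A sketch of what the explicit-stack version might look like. It is loosely modelled on a memoized unigram segmenter, not copied from Norvig's code: the counts are a toy stand-in, the unknown-word penalty and `max_word_len` are assumptions, and the stack holds plain start indices because that is the only argument needed here (tuples would work the same way if more state had to be carried).

```python
import math

# Toy unigram counts; a hypothetical stand-in for a real word-count file.
COUNTS = {"the": 100, "cat": 10, "sat": 10}
TOTAL = sum(COUNTS.values())


def word_logprob(word):
    """Log probability of a word; unknown words are penalised by their length."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    return math.log(10.0 / TOTAL) - len(word) * math.log(10.0)


def segment(text, max_word_len=20):
    """Most probable split of text into words, using a loop and an explicit stack."""
    n = len(text)
    # best[start] = (log probability of the best split of text[start:], end of its first word)
    best = {n: (0.0, n)}
    stack = [0]
    while stack:
        start = stack[-1]
        if start in best:  # already solved, possibly via another path
            stack.pop()
            continue
        ends = range(start + 1, min(n, start + max_word_len) + 1)
        pending = [end for end in ends if end not in best]
        if pending:  # solve the suffixes first, then come back to this start
            stack.extend(pending)
            continue
        best[start] = max(
            (word_logprob(text[start:end]) + best[end][0], end) for end in ends
        )
        stack.pop()
    # Follow the stored end positions to rebuild the word list.
    words, start = [], 0
    while start < n:
        end = best[start][1]
        words.append(text[start:end])
        start = end
    return words


short = segment("thecatsat")
long_result = segment("thecatsat" * 300)  # 2700 characters, no recursion involved
```

The `best` dictionary plays the role of the memo table, so the work done is the same as in the recursive version; only the call stack has been replaced by a list.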
On Wed Nov 12 2014 at 11:25:33 AM Chris Pressey notifications@github.com wrote:
Interesting stuff. Note that it is possible to increase Python's recursion depth limit, if you think it will help, at your own peril -- a quick web search returned this eye-opening article http://seriously.dontusethiscode.com/2013/04/14/setrecursionlimit.html ...
@enkiv2 Yes, rewriting it with an explicit stack would be proper engineering (but I'm all about the science this year you see, and I thought that article was a nice bit of, uh, Python science. shudder)
Plus there's always that certain faint dishearteningness that comes with making edits to third-party code, no matter how nice the code and/or the license. Should I send these upstream? Should I maintain a fork? Etc.
@christiaanw I'd be honoured if you (or anyone) could do something with checkerboard-layout; I have a few more ideas along those lines, but didn't want to do too many "optical" experiments because they seem... slightly out of sync with the rest of NaNoGenMo.
I know NaNoGenMo from scouring GitHub for useful Python code for a project I am (was?) working on. Given that there were some interesting contributions last year, such as In-Dialogue, The-Swallows and the NovelHarvesterBot, I'm thinking of a hack taking off from one of these approaches to get something interesting.