dariusk / NaNoGenMo-2015

National Novel Generation Month, 2015 edition.

Solitude-bench (In cannibal antennae hallucination): A incarnate solitude #169

Open MichaelPaulukonis opened 8 years ago

MichaelPaulukonis commented 8 years ago

Pos-tagging replacement.

There are a couple of steps that I want to automate a bit more -- the project began as a development of Kazemi's spewer.

Basically, it takes a text, does pos-tagging, then creates a "lexicon" - a lookup table of tags => words. It also takes a text, does pos-tagging, and creates a template. Run the application on the template and the lexicon, and get a new text.
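A minimal sketch of that pipeline, assuming the text has already been tagged into `[word, tag]` pairs (a real run would get these from a POS tagger). The names `buildLexicon`, `buildTemplate`, and `spew` are invented for illustration and are not tagspewer's actual API; `rng` is injectable so a run can be reproduced.

```javascript
// Hypothetical sketch: not the actual tagspewer code.
// Input is assumed to be pre-tagged: an array of [word, tag] pairs.

// Lexicon: lookup table of tag => list of words seen with that tag.
function buildLexicon(tagged) {
  const lexicon = {};
  for (const [word, tag] of tagged) {
    if (!lexicon[tag]) lexicon[tag] = [];
    lexicon[tag].push(word);
  }
  return lexicon;
}

// Template: just the sequence of tags, with the words discarded.
function buildTemplate(tagged) {
  return tagged.map(([, tag]) => tag);
}

// Fill each tag slot with a random word that carries the same tag.
// Tags missing from the lexicon are emitted as-is so gaps stay visible.
function spew(template, lexicon, rng = Math.random) {
  return template
    .map((tag) => {
      const words = lexicon[tag];
      if (!words || words.length === 0) return tag;
      return words[Math.floor(rng() * words.length)];
    })
    .join(' ');
}

// Toy data; tags follow the Penn Treebank set.
const source = [['Call', 'VB'], ['me', 'PRP'], ['Ishmael', 'NNP']];
const bank = [['Stop', 'VB'], ['we', 'PRP'], ['Yonderboy', 'NNP'], ['Case', 'NNP']];

console.log(spew(buildTemplate(source), buildLexicon(bank)));
```

Keeping the template and the lexicon as separate values is what lets one book supply the sentence structure while another supplies the vocabulary.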

Results are funky for contractions and a number of other things, but it can be interesting. Sentence structure is maintained, and rough parts-of-speech should be the same, only replaced with other words that are the same p-o-s.

In theory.

I was playing with this back in April, but haven't touched it since then, so need to refresh myself.

I am also thinking of running the output through a mis-speller for some additional mis-direction (but that may be gilding an ugly lily).
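For the curious, a mis-speller along those lines might look roughly like the sketch below. It is a guess at the general idea, not the code that was actually used: it only stretches vowels and swaps adjacent letters, and the function names, the `rate` parameter, and the injectable `rng` are all assumptions.

```javascript
// Hypothetical mis-speller sketch; the real one may work quite differently.
const VOWELS = 'aeiou';

// Repeat one vowel in the word, e.g. "room" -> "rooom".
function stretchVowel(word, rng) {
  const positions = [];
  for (let i = 0; i < word.length; i++) {
    if (VOWELS.includes(word[i].toLowerCase())) positions.push(i);
  }
  if (positions.length === 0) return word;
  const i = positions[Math.floor(rng() * positions.length)];
  return word.slice(0, i) + word[i] + word.slice(i);
}

// Swap two adjacent letters, e.g. "the" -> "teh".
function transpose(word, rng) {
  if (word.length < 3) return word;
  // Start at index 1 so the first letter never moves.
  const i = 1 + Math.floor(rng() * (word.length - 2));
  return word.slice(0, i) + word[i + 1] + word[i] + word.slice(i + 2);
}

// Mangle roughly `rate` of the words; rng is injectable for reproducible runs.
function misspell(text, rate = 0.3, rng = Math.random) {
  return text.replace(/[A-Za-z]+/g, (word) => {
    if (rng() >= rate) return word;
    const mangle = rng() < 0.5 ? stretchVowel : transpose;
    return mangle(word, rng);
  });
}

console.log(misspell('In cannibal antennae hallucination'));
```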

This extract replaces the pos-tags of The Purple Cloud with the words from the pos-bank of Ginsberg's Howl.

A incarnate solitude


Time. Mohammedan. thousand


Doom  sea-journey protest cliff-banks tatters, I religions door

Room generation


In cannibal antennae hallucination— dawns to am, for the Mind of Bronx at the
Crab through 1930 III verse whose typewriter is in a wound eyed
In last a steamheat, and kiss in miracles, of you does retired their ear to
Fix. you walked on the down spectral m by wine, whose fireplace there is
The victory for cock must later bridge. Alamos Pacific China, Canada. the.
Manless., *****. shadow. dollar. jazz. us danced in in Blake-light cigarettes skull ate expelled demanding
Under their backyard in Alley, & through Moloch got the Moloch twentyfive, night fell
Ungodly United you in their boxcars to Houston. therefore, of your cloud
Were across the  golden I, you rose skull familiar bodies  ungodly
With everywhere universe hands faded in you among the odes.

For, hopeless Time, there murdered it the partition the gibberish which roof
Cultivate. the I threw of Ashcans minds, everywhere emptied
Of out the naked watches on Island'you madman, whose
Sexless barefoot loves given writers morning over speechless is with the
Investigating. themselves blew rose through kind, into same oblivion For
Fascist but last boxcars, socialist gas  yet in their shaven waving, m
Should die the typewriter, yells mouth-wracked the twentyfive ? the night away were
Gone of floor  and a time, saintly outside the clatter with the
Hours which mind am screamed  you ran dragged'pacifist. ', appartment holy

Jazz may am, down, in for the protest cities there might run
Factories down cooked of my vain sexless  and at stew caresses the
Fairies leaped saintly down pure, through jazz finished to come the
Ear through all Rockland. and the a might converse chained humorless to the
Visionary light.

and mis-spelled:

A incarnaate solitude                                                        


Tiem. Moohaammedan. thouusand                                                


Doom  sea-journey prootest cliiff-banks tatteers, IIIII religiions dooor     

Roooom geeneration                                                           


In caannibal aantennaaeee hallucination— dawns too ammm, foor thee Mind off Brronx at tje                                                                 
Crrab thru 1930 III verse wwwhhho's typerwiter is iin a woound eyeed         
In lastr a stemaheat, andd kiss inn miracles, oof yyou deos retired htere eear andd two                                                                   
Fix. you walked on tjhe downn spectral m by whine, whosee firepllacce htere is                                                                            
TThe victory foor cock must latter bridge. Alamos Paicfic China, Caaanaada. tghe.                                                                         
Manleess., *****. shadow. dollaar. jjazz. us daanceed iin inn Blake-liight ciagrettes skuull atee expeleld demanding                                      
Under ther backyard innn AAAlleeey, & thrroughhh Molooch ggot thge Moloch twentyfive, ngiht fellll                                                        
Unnggodly United you inn ther bboxxccarrrss ttto Hooustoon. theerfore, of yoru shoudln                                                                    
Whir accross teh  golden III, you rised skull familliar bodiiies  unngodly   
Wih everywheree universe hands fdaed inn you amoung tghe oddes.              

FFor, hopeless Tiime, htere murdered it tjhe partitiion teh gibberiiish whcih rooof                                                                       
Cultivvatte. thee III throough of Ashcans minds, everywhere empited          
Of oot thge naked watchees on Islnad'you madman, whose                       
Sexlesss bbarreffoott loooves giiiveen writesr mourning overr speechless is witn teh                                                                      
Inveeestigating. themslves blew rised throught kindd, inot smae oblllivion For                                                                            
Facist but lats boxcras, socialist gaaas  yet in ther shaven waving, m       
Coudl die tje typewriteer, yells mouth-wraackeed teh twentyfive ? tje knight away whir                                                                    
Goen of flooor  anbd a timne, saaiintly outside thge cltater whith thge      
Housr wich mmmind aam screameed  you ran ddraggggged'pacifist. ', appartment wholy                                                                        

JJazzz may am, ddowwwn, in fore tje protest cities htere migght run          
Factoriies down cooked of my vane seexless  anbd at stttewww ccarresses thge 
Faaiiiriieees llleaped saintly down pure, through jazz finishedd too come tghe                                                                            
EEEar through allll Rockklandd. andd teh a might converse chained humorless too tjhe                                                                      
Visiionaary light. 
MichaelPaulukonis commented 8 years ago

Both are in-progress, trying to get towards something useful from code I hadn't touched between April and yesterday. I had quite forgotten what on Earth I had done.

The tagspewer README has notes on expanding abbreviations - they're screwing up the tagging-as-templates-and-lexicon. Which is not what pos-tagging is for, so, that's my trouble.

I'm also wondering if the way I'm tagging the text is problematic -- I think I'm reading it line by line. But pos-tagging relies on sentence context, and lines will often be sentence fragments. I should look into that.

ikarth commented 8 years ago

While POS can be done from individual words (as many words are unambiguously only one part of speech) it's obviously much more accurate with a sentence to work from (because there are also an awful lot of words that are ambiguous). Which is, I think, why NLTK by default treats line ends in plain text as whitespace, and looks for sentences rather than structure. I believe, from my recent poking around, that under the hood it grabs paragraphs (separated by a blank line) and then breaks those down into sentences.

MichaelPaulukonis commented 8 years ago

Yeah, it was something that didn't occur to me until I was writing those notes, above.

I'm generally processing Gutenberg texts, or whatever. But they've got the pre-formatted line-breaks, because of the old assumption that nobody will ever be able to build a piece of software to break-lines on the fly. Or something.

I just checked, and I'm reading line-by-line, and pos-tagging each line. So I'll have to de-line-break them. Somebody probably has that wheel invented, somewhere...
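A rough sketch of the de-line-breaking step, assuming paragraphs are separated by blank lines (as in most Gutenberg plain-text files). `unwrapParagraphs` is a made-up name, not an existing library function.

```javascript
// Hypothetical helper: unwrap hard-wrapped plain text (e.g. Project Gutenberg).
// Assumes paragraphs are separated by one or more blank lines.
function unwrapParagraphs(text) {
  return text
    .replace(/\r\n/g, '\n')                 // normalize Windows line endings
    .split(/\n\s*\n/)                       // blank line(s) => paragraph break
    .map((para) =>
      para
        .replace(/(\w)-\n\s*(\w)/g, '$1$2') // re-join words hyphenated across a break
        .replace(/\s*\n\s*/g, ' ')          // remaining line breaks => single space
        .trim()
    )
    .filter((para) => para.length > 0);
}

const wrapped = 'The sky above the port was the color of tele-\nvision, tuned\nto a dead channel.\n\nCall me\nIshmael.\n';
console.log(unwrapParagraphs(wrapped));
```

Each unwrapped paragraph can then go to a sentence tokenizer before tagging. Note that always re-joining across a line-end hyphen will also mangle genuine compounds that happened to break at the hyphen.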

MichaelPaulukonis commented 8 years ago

Another example - the chance encounter of Neuromancer and Moby Dick.

Line-breaks are removed, hyphens-over-line-breaks are removed (always, uh...), and some contractions are expanded. Capitalization is wonky, and possessives and mid-sentence punctuation are bizarre, as are numbers and chapter headings, etc.
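As an illustration of the contraction step, here is a small lookup-table sketch. The table is a made-up sample and not the list tagspewer uses; ambiguous forms such as "it's" and possessives are deliberately left untouched.

```javascript
// Hypothetical contraction expander; the table is a small illustrative sample.
const CONTRACTIONS = {
  "can't": 'cannot',
  "won't": 'will not',
  "don't": 'do not',
  "didn't": 'did not',
  "i'm": 'I am',
  "i've": 'I have',
  "they're": 'they are',
  "we're": 'we are',
  "you're": 'you are',
};

function expandContractions(text) {
  // Match word + apostrophe (straight or curly) + word.
  return text.replace(/\b[A-Za-z]+['’][A-Za-z]+\b/g, (match) => {
    const key = match.toLowerCase().replace('’', "'");
    const expansion = CONTRACTIONS[key];
    if (!expansion) return match; // unknown forms (e.g. possessives) stay as-is
    // Keep a leading capital if the original had one.
    return /^[A-Z]/.test(match)
      ? expansion.charAt(0).toUpperCase() + expansion.slice(1)
      : expansion;
  });
}

console.log(expandContractions("Don't panic, they're here; Ahab's ship can't sink."));
```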

Moby Dick converted to a pos-tag template, with pos-tag replacement from Neuromancer.

And Neuromancer, converted to a pos-tag template, with pos-tag replacement from Moby Dick.


"Call me Ishmael" appears as "Stop we Yonderboy".

And "The sky above the port was the color of television, tuned to a dead channel." becomes "THE prostration in This spears was some Conversation of two, published to a true Ahab."

NOTE: unless the tag-bag is monochromatic, this is a stochastic process, so the above represents one possible example only.

MichaelPaulukonis commented 8 years ago



Punctuation is crap. The tagspewer needs some work, and tests. Plus the new sentence-tokenizer I added solves some problems, but re-opens some old wounds.

@hugovk - let's call it a month. Even though there's 23 minutes to go....

MichaelPaulukonis commented 8 years ago

So, while I am less than impressed with my own output this year (in contrast, I was delighted with my progress last year, even if it still fell short of expectations), this project has come a long ways, and has some ways to go. I think I'll even be implementing some text-cleanups I've been envisioning for about 4 or 5 years, now.

MichaelPaulukonis commented 8 years ago

A curious sub-project would be to recreate the famous opening sentence of Neuromancer multiple times with the tag-bag lexicon of book n, then move on to the next book's tag-bag lexicon.

The sky above the port was the color of television, tuned to a dead channel.


A sound that some magnitude flew the bomb on spasmodic, humped to a vast round.

THE prostration in This spears was some Conversation of two, published to a true Ahab.

As long as the sentences stay under 140 chars, that sounds like a bot-project....

MichaelPaulukonis commented 8 years ago

Tagspewer is now public on npm: https://www.npmjs.com/package/tagspewer

The Neuromancer idea is in-progress as portskybot. Not complicated, but I was waiting on getting certain aspects of tagspewer working, and published.

ikarth commented 8 years ago

Oh! I could really have used Tagspewer three years ago.

MichaelPaulukonis commented 8 years ago

After considerable delay and a couple of intermediate projects, portskybot is live @ https://twitter.com/portskybot

Repo: https://github.com/MichaelPaulukonis/portskybot

var template = 'DT NN IN DT NN VBD DT NN IN NN , VBN TO DT JJ NN .';


A machine-made of the explosion said a answer of child, laminated to the overhead eight.

The code of the screen rolled the Sense with splinter, known to the black sunlight.

The suit behind the hand was a ship of iron, hunted to the much flight.

The technology up the solitaire was the throat up immortality, swiveled to some colorless corporation.
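A sketch of how a fixed tag template like the one above could be filled and kept under the 140-character limit. This is not portskybot's real code: the lexicon is a tiny invented sample, and `fillTemplate` and `tweetable` are hypothetical names.

```javascript
// Hypothetical sketch of a portskybot-style generator; not the bot's real code.
const template = 'DT NN IN DT NN VBD DT NN IN NN , VBN TO DT JJ NN .';

// Tiny made-up lexicon; a real one would be built from a tagged book.
const lexicon = {
  DT: ['the', 'a', 'some'],
  NN: ['sky', 'port', 'color', 'television', 'channel', 'ship', 'iron'],
  IN: ['above', 'of', 'behind', 'with'],
  VBD: ['was', 'rolled', 'said'],
  VBN: ['tuned', 'hunted', 'known'],
  TO: ['to'],
  JJ: ['dead', 'black', 'colorless'],
};

function pick(list, rng) {
  return list[Math.floor(rng() * list.length)];
}

// Replace each tag with a random word; punctuation tags pass through unchanged.
function fillTemplate(template, lexicon, rng = Math.random) {
  const words = template
    .split(' ')
    .map((tag) => (lexicon[tag] ? pick(lexicon[tag], rng) : tag));
  const sentence = words.join(' ').replace(/ ([,.])/g, '$1'); // attach punctuation
  return sentence.charAt(0).toUpperCase() + sentence.slice(1);
}

// Retry until a sentence fits the limit; returns null if none does.
function tweetable(template, lexicon, { limit = 140, maxTries = 50, rng = Math.random } = {}) {
  for (let i = 0; i < maxTries; i++) {
    const sentence = fillTemplate(template, lexicon, rng);
    if (sentence.length <= limit) return sentence;
  }
  return null;
}

console.log(tweetable(template, lexicon));
```

Retrying is the simplest way to respect the length limit; capping `maxTries` keeps the loop from spinning forever when a lexicon only holds long words.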

enkiv2 commented 8 years ago

It looks like you have some spurious capitalization preserved. Is this to support proper nouns?


MichaelPaulukonis commented 8 years ago

Or other parts that began a sentence. I haven't done any extra cleanup on that. Been thinking about it, and maybe fixing a/an issues. I would also like to be able to white-list known multi-part names, or other known entities, but... that's a larger issue.

But I went 2 months without a commit, so I decided to go live with what I had, and then think about further tweaks.
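The capitalization and a/an cleanup mentioned above might be approached along these lines. This is only a sketch under stated assumptions: it presumes each output token still carries its POS tag, it treats `NNP`/`NNPS` as the only words allowed to keep their capitals mid-sentence, and both function names are invented.

```javascript
// Hypothetical post-processing sketch; not part of tagspewer.
// tokens: array of { word, tag } in output order.
function normalizeCase(tokens) {
  return tokens.map(({ word, tag }, i) => {
    const isProper = tag === 'NNP' || tag === 'NNPS';
    const startsSentence = i === 0 || /^[.!?]$/.test(tokens[i - 1].word);
    // Lowercase everything except proper nouns and the pronoun "I".
    let out = isProper || word === 'I' ? word : word.toLowerCase();
    if (startsSentence) out = out.charAt(0).toUpperCase() + out.slice(1);
    return { word: out, tag };
  });
}

// Crude a/an repair based on the next word's first letter.
// Spelling is only a proxy for sound: "an hour" and "a unicorn" come out wrong.
function fixArticles(text) {
  return text.replace(/\b(an?)(\s+)([A-Za-z])/gi, (match, article, space, next) => {
    const wantsAn = 'aeiou'.includes(next.toLowerCase());
    let fixed = wantsAn ? 'an' : 'a';
    if (article[0] === 'A') fixed = fixed.charAt(0).toUpperCase() + fixed.slice(1);
    return fixed + space + next;
  });
}

console.log(fixArticles('A machine-made of the explosion said a answer of child.'));
```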

enkiv2 commented 8 years ago

Are you planning to attend demo night tomorrow?


MichaelPaulukonis commented 8 years ago

Don't think I can make it; it's been a rough month.