NaNoGenMo / 2016

National Novel Generation Month, 2016 edition.
https://nanogenmo.github.io

Running out of Markovs to chain #143

serin-delaunay opened this issue 7 years ago

serin-delaunay commented 7 years ago

IPython notebook: https://github.com/serin-delaunay/NaNoGenMo2016/blob/master/RunningOut.ipynb
Output (strict, 19,589 words): https://raw.githubusercontent.com/serin-delaunay/NaNoGenMo2016/master/output/markov.txt
Output (late, 53,020 words): https://raw.githubusercontent.com/serin-delaunay/NaNoGenMo2016/master/output/markov_v2.txt

Going to try a very simple project for the last day: an ngram-based Markov chain trained on a variety of books from Project Gutenberg, but with destructive output. The first time a rule is selected from the model, it is deleted and can't be used again. The novel should sound normal (for a Markov story) at the start, but gradually (or quickly) turn into something really odd.
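
A minimal sketch of the destructive idea, assuming a "rule" is one (context → next token) pair stored in a dictionary of lists. The names `build_model` and `generate_destructive` are hypothetical, not taken from the notebook.

```python
import random
from collections import defaultdict


def build_model(tokens, order=2):
    """Map each `order`-token context to the tokens that followed it.

    Repeats are kept, so common continuations are picked more often.
    """
    model = defaultdict(list)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        model[context].append(tokens[i + order])
    return model


def generate_destructive(model, start, max_tokens=100, rng=random):
    """Walk the chain, deleting each rule (context -> next token) after one use.

    `start` must contain exactly as many tokens as the model's context length.
    """
    order = len(start)
    output = list(start)
    for _ in range(max_tokens):
        context = tuple(output[-order:])
        choices = model.get(context)
        if not choices:
            break  # no unused rule for this context: the chain is stuck
        nxt = rng.choice(choices)
        # Destructive step: remove every copy of this rule so it can't recur.
        model[context] = [c for c in choices if c != nxt]
        output.append(nxt)
    return output
```

Because the model is mutated in place, a second call on the same model can only use the rules left over from the first, which is what makes the text drift further from the sources the longer it runs.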

I'll probably have to find some workarounds to make it reach 50,000 words without stopping, such as:

serin-delaunay commented 7 years ago

First attempt, using the following sources:

Rules were strictly deleted after one use. Ngrams were of length 2-5, alternating words and whitespace/punctuation. The output story only reached 3337 words, which were too heavily slanted towards Middle/Early Modern English; fewer line breaks might also be nice. Memory usage is almost a gigabyte.
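
One way to get tokens that alternate between words and whitespace/punctuation, and to keep a separate table for each context length from 2 to 5, is sketched below. It assumes "length" means the number of tokens in the context; `tokenize` and `build_models` are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text):
    # \w is Unicode-aware in Python 3, so letters like þ and ȝ count as word
    # characters. Greedy alternation yields alternating word / non-word runs.
    return re.findall(r"\w+|\W+", text)


def build_models(tokens, min_n=2, max_n=5):
    """Return {n: table}, where each table maps an n-token context to the
    list of tokens that followed it in the corpus."""
    models = {n: defaultdict(list) for n in range(min_n, max_n + 1)}
    for n, table in models.items():
        for i in range(len(tokens) - n):
            table[tuple(tokens[i:i + n])].append(tokens[i + n])
    return models
```

Keeping a full table for every length stores each position of the corpus up to four times, so memory grows quickly as more books are added.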

Sample from the start:

Leue --
but increase of. power! he gazed, he remembered going,' she flung himself, departed equally welcome; and giannotto but," protested the “side” of cristen folk.
--

'twas under!'

there sold. rer. ital. script.
(muratori: suppl. tartini) ii. 5 ] 
and, though,' said gentlemen not,
caused him!

he watz arȝe mony, 
sir gawayn, and
'tha'.

she translated was! yet hold-door
quickly, he, being laid,
  yielding at
nights are!'

Sample from the end:

at cannes or slayn;
and ordered
with boydekyns anoon;
right away,' he graunted, and coome agayn!
this requeste.
so beauteous widow was. i
crucified christ?" "ay,"
returned nello's seeing how.
as courteous as-tyt! bot þaȝ men./ if mellors.

'honour, but 'twill
be. but:--"no! no. no, leave clifford; the draught, and lach þyn awen.'
'þis kastel to þyn aunt.
is 'a tourneiynge;
for kings' and
'tis
serin-delaunay commented 7 years ago

Second attempt, using the following sources:

Rules were deleted with probability 0.99 after each use. Output was 7236 words, with nearly 1.7 gigabytes of memory usage. I think it would be a good idea to make the generator favour productions from long ngrams, so that when those are all inaccessible it can fall back to the shorter, more permissive ngrams.
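
Probabilistic deletion only changes the step where a rule is consumed. A minimal sketch, assuming the productions for one context are kept in a list; `use_rule` is a hypothetical helper.

```python
import random


def use_rule(choices, p_delete=0.99, rng=random):
    """Pick a production from `choices`; with probability p_delete, delete
    every copy of it from the list in place. Returns the production."""
    production = rng.choice(choices)
    # rng.random() is in [0, 1), so p_delete=1.0 always deletes, 0.0 never does.
    if rng.random() < p_delete:
        choices[:] = [c for c in choices if c != production]
    return production
```

With `p_delete=0.99`, the number of uses before a rule disappears is geometrically distributed with mean 1/0.99 ≈ 1.01, so the occasional reuse happens but deletion still dominates.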

Sample from the start:

' s sleight-of.

when jeeves to!' and galegantius the
/bakbitere./ after crowe,
and--as cm is't 'ave bothered, lady
jane, but--'

'and bete that. 'two strokes, and--'

'of sense
hard, but
inactive, and,
besides, gathered here (how strange weight hate: as depe of ; 
therefore each make." quod merlin, a, h, and shameful!
and snoring loudly.

Sample from the end:

'so roially.
this, ffor how? if fl, 
and toddling away;
ffor mannes foul presumpcioun!
ffor also.

this journey." the metynge. and, "wite alle.
"i gain or sounder þat place!' she peyneth hem,' set their
lovers, know andy ... and oynementes. and cryden, out! harrow! and smoke...'

'will disclose, seeing death,
and devotedly at yowre-self com agains accidie. for -- of bitterness.

she believed it.

writing rubbish?' asks her--i--i--i--
serin-delaunay commented 7 years ago

Unexpectedly short stories in buggy versions of the generator:

 windowedisposingbarelystynkyng?
How.

 '-," !
Binomial,  ; 
. [:
serin-delaunay commented 7 years ago

Third attempt, same sources. The program tries to find a rule for the most recent 5-gram, falling back to 4-grams, 3-grams, and 2-grams if it fails, and stops if none are found. It generated 4127 words, much more coherent than before. The text stops at "in jest." from Ross's Troilus, because nothing else in the source texts contains "jest".
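
The back-off from 5-grams down to 2-grams can be sketched as below, assuming one table per context length as in a `{n: {context: [next tokens]}}` dictionary and strict deletion. `next_token` is a hypothetical name.

```python
import random


def next_token(models, output, max_n=5, min_n=2, rng=random):
    """Back off from the longest context to the shortest.

    `models` maps n to a table of {n-token tuple: [next tokens]}.
    Returns None when no context length has an unused rule left.
    """
    for n in range(max_n, min_n - 1, -1):
        if len(output) < n:
            continue  # not enough history for this context length yet
        choices = models.get(n, {}).get(tuple(output[-n:]))
        if choices:
            token = rng.choice(choices)
            choices[:] = [c for c in choices if c != token]  # strict deletion
            return token
    return None
```

Trying the longest context first means the specific, coherent rules are consumed first, and the shorter, more permissive ones only take over once those are gone.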

Sample from the start:

Him
prisoner, and by good
luck i happened to
win, and devote
himself and away with
them, and graciously bade her a
thousand times, to giacomino's
house, where, while in this.'

and he gives, what thinks he is: were it nat for fere,
as she thought.

Sample from the end:

'oho, that's wanted by the, thow oughtest to be (as he began
piteously to, beseech her not weep and
bitterly bewail herself; but being minded to
abet? and again somewhat
rudely, and still something of a, namely cg, gd, and ef, the square equal to each
may seem best; for well thei wolden, and ther shall i
know what else is there
of discernment, worshipful my ladies, we held discourse
of the, pandare, i kan never seye nay.
what! quod this senatour repaireth with victorie and the
cardinals and many tymes; and by doing after this
i gabbe nat, so have ye right welcome, and let's
niver fight! i love him, but it hevyeth me whan i am." than seide amaunt, "ye haue with-outen loue is in mariage hony-sweete;
and for-thi, werk som-what haue i, myn uncle," quod she;
this thank have i
yearned to hold 
in jest, in dreams, in supplications, 
in jest.
serin-delaunay commented 7 years ago

Fourth attempt. I've altered the capitalisation behaviour so that the first alphabetic character (including thorns and yoghs) after a full stop, question mark, or exclamation mark is capitalised, and so is the word I (actually it isn't, but I don't have time to debug that). The proper names in the source texts are quite diverse, so I'm not sure it's worthwhile trying to catch those. I've added chapters, so that whenever the generator gets stuck and gives up, it can start again (still with the restricted rule set). I've decreased the probability of rule deletion to 0.95.
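
The capitalisation rule described above can be expressed with two regular expressions. This is a sketch under the assumption that "alphabetic" means any Unicode letter; `capitalise` is a hypothetical name, and it is not the notebook's code.

```python
import re

# A sentence end, then any run of non-letters (spaces, quotes, digits),
# then the first letter. [^\W\d_] matches exactly the Unicode letters.
_SENTENCE_START = re.compile(r"([.?!][\W\d_]*)([^\W\d_])")
# "i" as a whole word; also matches the "i" in "i've" or "i'm".
_LONE_I = re.compile(r"\bi\b")


def capitalise(text):
    # str.upper() maps þ to Þ and ȝ to Ȝ, so thorns and yoghs are covered.
    text = _SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).upper(), text)
    return _LONE_I.sub("I", text)
```

Because `\b` treats letters like þ as word characters, an "i" inside a word such as "þi" is left alone.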

The chapters get progressively shorter (but don't reach zero length), the process gets gradually slower, and it has real difficulty reaching 15,000 words. I think it needs a larger corpus.

Sample from the start:

 Rout þe raynez he tornez,
halled out at
florence one that honourably entreats you to-morrow!

Aeneas.
We know not even peter, though he
was free, i will, sweet queen.

Helen.
She shall have
what he says, too busy to pay
him out by
spinning what her new fere.

Song, it was you.

Sample from the last chapter before I interrupted the process:

 Schyndered þe bones,
and schrank þurȝ þe fryth and a love-affair,' she said. 'Charming! Charming! Sir john!'

And she reached it just can't! I hope there's the
sort. Wherefore she said:
oh, no! I've tried drawing with my consent be
buried like a rabbit came
so near that i, with hertely wyl they sworen and assenten
to al this; for god woot, ther lith no remedye.
Upon that oother
marchandise, that men sometimes frame messages in such high qualities merit not oblivion--was madonna
oretta's apt to convey him in, and they so, that the “!
serin-delaunay commented 7 years ago

Attempt 5. I've reduced the deletion probability to 0.9 and added the following source:

Output: https://raw.githubusercontent.com/serin-delaunay/NaNoGenMo2016/master/output/markov.txt

19,589 words according to Notepad++ (my code reported over 20,000, but whatever).

Sample from the start:


   ---   Chapter 1   ---   

Of £100 and a sense, asked to see
me.'

She leaned against the dogmatist can promise us. For me.'

Mother turns. 'I don't.
It's 'appen better never to
quit bologna, until he falls in love.
How were i, for
thy conseillyng, certes, my conseil al.
For sith i woot,
another seyde the kyng and queene
(though neptunus have deitee in the
     critique itself it rather than any
number. Now, just as she
says that i, woful wrecche and in
amity with god, being minded remain there. So,
the more men she might. But, though she were
a saucerful of tainted milk, but he
knew that god's will, and of
thee: wait and speak with all your
husbands, that, when from time to overflow all contradictory predicates, only
one can belong to, the wolves would have
thee lie to-night."
"With pleasure," returned the master gave the
word curved is superfluous. Now, when
they begin talking nasty then. But i've 'borrowed' a can of itself?
But this previous condition and conditioned in phenomena,
however small, without drawing it across at
her. At first
sight, having never seen one in, with their own.

Sample from the end:

   ---   Chapter 80   ---   

Fro
youre herte slyde.
What deyntee sholde a conseil hyde./ For salomon
seith, -- it is pathologically affected (by
sensuous impulses); it is square; 
therefore a binomial straight line"), no better employe, for thei roos at mydnyght.

Whan the boordes were vppe, than was gaudius and his. One of clifford roused her fair
companions for the?
serin-delaunay commented 7 years ago

Late output: https://raw.githubusercontent.com/serin-delaunay/NaNoGenMo2016/master/output/markov_v2.txt

Strictly speaking, NaNoGenMo is over, but I changed the method of choosing the starting token:

Originally I chose a random ngram in the Markov model, and chose a random production from that ngram's rule. Later I made a list of all available productions in the whole model. That was really expensive!

Now I make a set of every word encountered in the source text during parsing, and convert it to a list. To start a chapter, I choose one at random and delete it from the list. That makes the whole novel generation process muuuuuuuch faster. My code underestimates the number of words output, so I set its target word count to 55,000 and got 53,020 words in 762 chapters. The generation process took a matter of seconds. Parsing is still really slow, though.

Sample from the start:

   ---   Chapter 1   ---   

Underwrite in an unconnected and
rhapsodistic state, but is
dependent on the flat.

'What is me
that 'tis seldom indeed, when i gave her, he
said. 'If we floated like tobacco smoke of those. If a point,
because i love, ywis,
for in truth borrowed from experience--it is
certainly not soothing. I am, it's...'

'Oho, that's
working for money. You had dropped in somewhere to put the
utmost contempt of those, young as you
want.'

Sample from the end:

   ---   Chapter 757   ---   

Investing him with:--"what
means this, sir?" Quoth he.

   ---   Chapter 758   ---   

Abcm to the:

   ---   Chapter 759   ---   

Knoweth his penaunce
was queynt and al day,
and she kisses me on.'

'Cross your heart before; this follows naturally, according to?

   ---   Chapter 760   ---   

Swowned ther he hideth hym and spak namoore, but in, left behind in the;

   ---   Chapter 761   ---   

Joins in her.

For the big, hollow sandstone slab of the.

   ---   Chapter 762   ---   

Werreieth
troughe wityngly, and deffendeth his folie,
so?
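
The late run's chapter loop (every parsed word collected into a set and turned into a list, one word picked at random and removed to start each chapter, chapters generated until a target word count) can be sketched as follows. `write_novel`, `pop_random` and the `generate_chapter` callback are hypothetical names; the swap-and-pop removal is a choice made for this sketch to delete in O(1), not necessarily what the notebook does, and the header format is copied from the samples above.

```python
import random


def pop_random(items, rng=random):
    """Remove and return a random element in O(1) by swapping it to the end."""
    i = rng.randrange(len(items))
    items[i], items[-1] = items[-1], items[i]
    return items.pop()


def write_novel(start_words, generate_chapter, target_words, rng=random):
    """Generate chapters until `target_words` is reached or the pool is empty.

    start_words: list of distinct words; each is used to start at most one chapter.
    generate_chapter: hypothetical callable(start_word) -> chapter text,
        assumed to run until the chain gets stuck.
    """
    chapters = []
    word_count = 0
    while word_count < target_words and start_words:
        start = pop_random(start_words, rng)
        text = generate_chapter(start)
        chapters.append(f"   ---   Chapter {len(chapters) + 1}   ---   \n\n{text}")
        word_count += len(text.split())  # whitespace-separated word count
    return "\n\n".join(chapters), word_count
```

Iteration order of a Python set is not fixed across runs, so building the pool with `sorted(set(words))` and passing a seeded `random.Random` makes the output reproducible.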