NaNoGenMo / 2019

National Novel Generation Month, 2019 edition.

The GANterbury tales #79

Open hobg0blin opened 4 years ago

hobg0blin commented 4 years ago

I'm working on a collection of short stories built around a GPT-2 model fine-tuned on the original Middle English text of the Canterbury Tales. Some initial code and cleaned text are here, plus the cleaning script I used to strip out a bunch of footnotes, if that's useful to anyone. I'm running an initial fine-tuning pass now and will check back in here soon!

hobg0blin commented 4 years ago

I'm finished! A few different versions are in the repo. I initially trained for 20,000 steps and found that the model was heavily overfitted; 10,000-step and 15,000-step versions worked a bit better. To generate, I pulled a sentence from the original text as a seed, generated a few paragraphs from GPT-2, took the character count of the generated text, moved to that point in the original, and used the nearest sentence there as the next seed.
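The seeding loop described above can be sketched roughly like this. It is not the code from the repo: `generate` is a hypothetical hook standing in for whatever call produces text from the fine-tuned model, sentences are assumed to end at `.`, `!` or `?`, and "moving to that point" is read as advancing by the generated character count from the current position (wrapping at the end of the text). The post could also mean an absolute offset.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[Tuple[int, str]]:
    """Split text into (start_offset, sentence) pairs, assuming . ! ? end sentences."""
    sentences = []
    for match in re.finditer(r"[^.!?]+[.!?]*", text):
        raw = match.group()
        stripped = raw.strip()
        if stripped:
            # Offset of the sentence's first non-whitespace character.
            start = match.start() + (len(raw) - len(raw.lstrip()))
            sentences.append((start, stripped))
    return sentences


def nearest_sentence(sentences: List[Tuple[int, str]], offset: int) -> str:
    """Return the sentence whose start offset is closest to `offset`."""
    return min(sentences, key=lambda item: abs(item[0] - offset))[1]


def generate_collection(
    source: str,
    generate: Callable[[str], str],
    rounds: int,
    start_offset: int = 0,
) -> List[Tuple[str, str]]:
    """Seed from the source, generate, jump ahead by the output length, reseed."""
    sentences = split_sentences(source)
    if not sentences:
        raise ValueError("source text contains no sentences")
    position = start_offset
    results = []
    for _ in range(rounds):
        seed = nearest_sentence(sentences, position)
        passage = generate(seed)  # hypothetical hook for the fine-tuned model
        results.append((seed, passage))
        # Assumption: advance relative to the current position, wrapping at the end.
        position = (position + len(passage)) % len(source)
    return results
```

A call would look like `generate_collection(tales_text, sample_from_model, rounds=10)`, where `tales_text` and `sample_from_model` are hypothetical names for the cleaned text and a wrapper around the model. Keeping the model behind a plain callable makes the walk through the source easy to test with a stub before plugging in real generation.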

Sample output from the 10,000-step model:

This world is now ful tikel, sikerly;
We breken it al to-morn as a shadwe;
And ech of us bicomen otheres brother,
And ech of us spinners, and we xens,
And gerlandes, bothe day and night,
And wayter pater-nails, and earthes
In general, so that ther shal bityde.
And whan that we hadde a certeyn day
Hem Clens were y-voyded, so wormes hede,
And many a mery hir salte teres speke,
Til day bigan to springe was accorde.
And in his herte somwhat he solas,
Til day bigan to springe grete mirours raines
In al thatl we can demen by our wit.
Tho bigan he day by his wonder-teres
The whyte sheep, that Wool in wizard was.)
Tho bigan he day by his sleighte
The whyte hare, and by the holy confort
The man, charles, and eek the halfway prince,
Conforten, and voyded hir together.
Thus started is this lineage;
Men may devyne and devyse al conseil.

And so bifel, that in the toun of Rome,
The statue of the mother of god,
With bowe in honde, as he had seyn,
And arwes in honured harnesses,
And eek hir lemman, and hir posse,
And alle hir rythes and hir fayres,
And how they shul seyn "allas!"
Now changes heer of ensamples many oon,
And many a newe markisesse,
That shapen with-outen any recche or stryf.
Thus quyte Iogenes, olde wronges,
Of ech estaat and rudenesse,
And of lichets, and of avoutryes,
Of regnes, of regnes duetees,
Of honurable wordes, and of charitablenesse,
Of markisesse, and of baudes po,
Of contractes, and of werkinges,
Of preestes, and of procuringes,
Of chirche-reves, and of chirche-pieces,
Of chirche-spe, and of chirche-trees,
Of contractes, and of werkinges,
And eek of other art of instrumentes,
And in what manere, by whiche instruments,
If any persone order, or any wight,
To doon a thing, or elles make another.
It is hard to seken of trouthe and gladnesse;
Boras, brydel, brydel-leves, and cote-armures;
Keminations, and kemes, and cloothes, and torets,
And eek the appendices and the foot-long stele;
The shap, ther- as is the shadwe of deeth;
The eres of mortels, and the eres of eels;
Poudres, and eek the foulnesse of it,
That dreynte cutteth cisouns
After gif, and rots, and furnishins;
toggle, and turnips, and cloothes of lye;
Cots, and dukes, and bateles ful of plate;
Reliks, and chiknes, and saveles,
And eek the guttes caroles, and the rectels,
That drenchen in the roof of the pyrie,
Into the moone, and the dores, and the spares,
And the mannes shoo, it is so lowe;
Noon hyer was the faile, ne noght the lowe.
Beth war, I pray yow hertely; for if she fare,
She shal wel knowe that right as she,
Sin that it is the beste reed,
She shal reherce wel half a gode ende.
And if so be the game wente aright,
She shal seye right sooth, what that herde is.
For wel she seith, right as she that misconceyveth,
She that misconceyveth som freend or two,
And cheats hir-self, she cI crye 'allas! that I was so longe ordained
To long apprenticeship in sinne and in meschance!'
Ther I was bred, taughte me that I be old
To sone rype, and bad my-self for to knete!
Yet hadde I never hard enough, certayn,
To garden in the morweninge,
For to goon to the nexte citee.
I have ther-with to done, I may nat longe tarie,
To garden up-on the grounde,
And maken of our covent firmes three.
And therfore, sire, trusteth right longe in my tonge,
That I shal doon, I wol rewe on Thursday.
And wel I woot, Sathanas, if I have it in my might,
I wol shal make hem good men to venge thee.'

Even at this level, some sentences are lifted directly from the source text. Samples from checkpoints with fewer training steps show greater novelty, but they also threw in some modern things (at 1,000 steps, for example, regular winky faces, presumably because of Chaucer's love of semicolons). All in all, I think it's a convincing first pass at a NoGen, although I wish I'd been able to spend more time on it. If any Middle English heads happen by and have opinions on how well GPT-2 replicates it, please let me know!
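One rough way to put a number on how much gets lifted is to flag generated lines that match a line of the source after lowercasing and stripping punctuation. This is a sketch, not how the post measured anything, and the function names are hypothetical. It only catches whole-line copies; partial lifts would need something finer, such as n-gram overlap.

```python
import re
from typing import List, Tuple


def normalize(line: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace for loose matching."""
    return " ".join(re.sub(r"[^\w\s]", "", line.lower()).split())


def lifted_lines(generated: str, source: str) -> Tuple[List[str], float]:
    """Return generated lines found verbatim in the source, plus their share."""
    source_lines = {normalize(line) for line in source.splitlines()}
    source_lines.discard("")
    # Ignore blank or punctuation-only lines in the generated text.
    candidates = [line for line in generated.splitlines() if normalize(line)]
    lifted = [line for line in candidates if normalize(line) in source_lines]
    share = len(lifted) / len(candidates) if candidates else 0.0
    return lifted, share
```

Comparing the share across the 10K, 15K and 20K checkpoints would give a crude overfitting signal to set alongside reading the output.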