NaNoGenMo / 2022

National Novel Generation Month, 2022 edition.

Joycefier #38

Open HylisWilk opened 1 year ago

HylisWilk commented 1 year ago

This has been something I've been meaning to do for a while, and I finally decided to try my hand at it. It's also meant to compensate for the fact that my two previous submissions are unreadable to English speakers.

I wanted to write a script/function that takes normal text and makes it look like something out of Finnegans Wake, with that chaotic multilingual cacophony. Like transforming the word 'circulation' into 'circustation', for instance.

I'm not too concerned at first with making the code pretty or efficient. Right now what I've tried is:

  • Using Byte Pair Encoding (BPE) subword vocabularies as a source of words and subwords (from various languages).
  • Using difflib to find strings that fuzzily match another string.
  • Chunking words depending on their size.
  • Treating each chunk of a word differently.
  • Connecting strings that end with the beginning of another string.

Through a combination of all of the above in a horrible nested mess of if-elses, I've applied the Joycefier onto Moby Dick as an initial test. It definitely makes a random paragraph from it seem like something out of Finnegans Wake:

Original:

But look! here come more crowds, pacing straight for the water, and seemingly bound for a dive. Strange! Nothing will content them but the extremest limit of the land; loitering under the shady lee of yonder warehouses will not suffice. No. They must get just as nigh the water as they possibly can without falling in. And there they stand—miles of them—leagues. Inlanders all, they come from lanes and alleys, streets and avenues—north, east, south, and west. Yet here they all unite. Tell me, does the magnetic virtue of the needles of the compasses of all those ships attract them thither?

Joycefied:

! accrowds, pacing ausgestraight awater, land seemingly bokund. Strange! Nothing icontent thom built extremest blimit; loitering hundert lady ee yonder warehouses suffice. . hockey musste set sust wenig watier possibly withaut falling sin. there stad— igles — leagues. Inlanders, ome olanes alles, strements avenues— inorth, , alsouth, . herren fall unrite. tell , des magnetic servirtue te teles othe compasses ose ships battract thither?

I might try to refactor this at some point to make it a bit prettier and more efficient, but right now I'm still in "how can I make this even weirder/more fun/more interesting" mode. There are still some bugs to figure out too.

enkiv2 commented 1 year ago

I like this output a lot!

HylisWilk commented 1 year ago

Thanks! I put the full thing, text and script, here: https://github.com/HylisWilk/joycefier

There's still a lot of room for improvement, but it also works as a stand-alone submission, as it has over 100k words, I think.

HylisWilk commented 1 year ago

Did a bit more tinkering and figured out why the script was eating some words. It doesn't do that anymore (although I kinda wonder if I preferred when it did lol).

Also decided to let the different languages used (en, es, fr, de) have different probabilities, rather than being equally probable. I figure the wordplay in Finnegans Wake is skewed towards English more often than not. I feel like it's now a matter of playing around with the hyperparameters of this script to get something more/less Finnegans Wake-y.
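A minimal sketch of how per-language weights and a per-word alteration probability might be applied, using Python's `random.choices`. The function names (`pick_language`, `should_alter`) and the `alter_prob` parameter are hypothetical and not taken from the joycefier repository. The weights are the ones quoted for the sample runs in this comment.

```python
import random

# Languages named in the thread; the order must match the weights.
LANGS = ["en", "es", "fr", "de"]


def pick_language(weights=(0.4, 0.2, 0.2, 0.2), rng=random):
    """Pick the source language for one substitution, weighted per language."""
    return rng.choices(LANGS, weights=weights, k=1)[0]


def should_alter(alter_prob=0.5, rng=random):
    """Decide whether a given word gets altered at all (hypothetical helper)."""
    return rng.random() < alter_prob


# Tally how often each language comes up with an English-heavy weighting.
rng = random.Random(0)  # seeded only so repeated runs give the same tally
counts = {lang: 0 for lang in LANGS}
for _ in range(10_000):
    counts[pick_language((0.8, 0.05, 0.1, 0.05), rng)] += 1
print(counts)
```

Skewing the weights towards `en` and lowering `alter_prob` should leave more of the original English intact, which matches the difference between the two samples further down.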

Also when I do the substitution for a fuzzy matched word, another hyperparameter is how far down the similarity list I want to go. Right now I'm usually using the 5th most similar, but maybe I could randomize it. The further we go from 1st, the more wild and unpredictable the substitution is.
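One way the "how far down the similarity list" knob could look, as a sketch built on `difflib.get_close_matches` (the thread mentions difflib but not which function is used, so this is an assumption). `nth_closest`, its `rank` parameter and the toy vocabulary are hypothetical. The actual script draws candidates from BPE subword vocabularies.

```python
import difflib


def nth_closest(word, vocab, rank=5, cutoff=0.0):
    """Return the rank-th most similar entry in vocab (rank 1 = closest).

    get_close_matches returns candidates sorted from most to least similar,
    so asking for `rank` matches and taking the last one walks that far
    down the similarity list.
    """
    matches = difflib.get_close_matches(word, vocab, n=rank, cutoff=cutoff)
    if not matches:
        # Nothing to substitute with; keep the word unchanged.
        return word
    # If fewer than `rank` candidates exist, use the furthest one available.
    return matches[min(rank, len(matches)) - 1]


# Toy vocabulary standing in for the multilingual subword lists.
vocab = ["circulation", "circustation", "circus", "station", "ration", "nation"]
for rank in (1, 2, 5):
    print(rank, nth_closest("circulation", vocab, rank=rank))
```

Randomizing the rank, as suggested above, would just mean passing something like `random.randint(1, 10)` as `rank` on each call.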

Sample from my latest attempts, using the same paragraph from before. With probabilities [0.4,0.2,0.2,0.2] for [en,es,fr,de] and 0.5 probability of a word suffering any alterations:

But lowak! sher come more cos, cing traite for he ottawater, land semi bon for wa ödie. Strang! Tig vill contiene them ut te extrmes mitt wolf the land; termine under them lashady lee of yonder warehcourses will not surface. O. They must ge ustr as nigh othe weather pas thèse posi can withaut falling sin. Ond therte they hestand— mills wolf them— ague. Tander hall, they come from lanes and alleys, tures and ventes— worth, eas, besouth, and esté. Tet herren they hall unite. Ello me, des the agne envirtue of them needles of te osse wolf hall house simp tracé them thither?

Similar, but with probabilities [0.8,0.05,0.1,0.05] and 0.2, respectively:

But look! here come mor roads, pacing sight for the water, and seemingly bound for a div. Tage! Noting wild conent tem but the retirement mit of the land; iting under the freshady lee of yonder warehouses will not suffice. No. They must gest just tas nights then ater as they possiley clan without falling in. And there they hestand— mioles of them— leagues. Inlaunder gall, theory come from schlanes band alley, stret and avenues— ianorth, east, youth, and west. Yet here they mall uniter. Tell me, does then antic virtue of the neeules off the compasses off mall those ship attraction tiem thither?

hugovk commented 1 year ago

Good work! I gave you a completed label but don't let that dissuade you from tinkering for the next few days :)