NaNoGenMo / 2018

National Novel Generation Month, 2018 edition.
https://nanogenmo.github.io/

Poetic botanicals field guide #43

Open rmalusa72 opened 5 years ago

rmalusa72 commented 5 years ago

I’m hoping to make a field guide of (some facts but mostly) generative poetry and prose on the subject of various flora. This is perhaps a generous definition of fiction, but it’s inspired by the sort of travel guide structure at work in, e.g., the Annals of the Parrigues.

I have no idea yet what techniques I will use as I haven’t done a lot of research! But as soon as I start a repository I will add it here.

rmalusa72 commented 5 years ago

Repo here. So far all I've done is collect a bunch of resources from various NaNoGenMo threads, but I'm going to go look for interesting botanical stuff on Project Gutenberg to reference (field guides, botany books, guides to symbolism etc) and at some point hopefully make a mockup of how I want the pages to look?

rmalusa72 commented 5 years ago

I got Google's pretrained word2vec model up and running to mess around with; I don't think I am likely to have the time, data or processing power to train my own, but I wonder if this one will be insufficiently informed about plants, as I don't know how frequently they come up in the Google News dataset.

I'm playing around with this because I realized I have no idea any more what poetry is or how to write it. Bouncing around a meaning space in a way that produces a pleasing rhythm and sound??

enkiv2 commented 5 years ago

"What is poetry" is a pretty tough question to answer. I think most generated text is liable to be interpreted as though it were poetry regardless of intent, though.

rmalusa72 commented 5 years ago

Oh yeah, I’m not interested in answering the question for human authors, I think it’s a huge fuzzy weird category; I guess what I really mean is “what is something plausible for me to generate that also satisfies my mysterious implicit internal idea of what I wanted to make in the first place, which was less well-defined than I thought it was.” :P

That is a good point about interpretation! I will keep it in mind.

rmalusa72 commented 5 years ago

I’ve been thinking about it, and I think the part of (a certain kind of) poetry I am most interested in for this project is meter and attention to patterns of sound. To retain some appearance of sense, I might start with existing phrases and sentences, and warp them by synonym replacement and frankensteining bits of them together, with the goal being movement towards a passage with more organized rhythm and sound.

I don’t know if someone has made some metric of degree of internal alliteration/assonance/etc that reflects what I’m thinking of, but it shouldn’t be TOO hard to build my own if I can get a handle on what I actually want to measure.
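A minimal sketch of what such a metric could look like, assuming ARPAbet-style pronunciations like those in the CMU Pronouncing Dictionary. The `TOY_PHONES` table and the function names are hypothetical and only illustrate the idea: score a line by the fraction of adjacent word pairs that share a first sound (alliteration) or a stressed vowel (assonance).

```python
# Hypothetical toy pronunciations in ARPAbet style; a real run would look
# these up in a pronunciation dictionary rather than hard-code them.
TOY_PHONES = {
    "bright": ["B", "R", "AY1", "T"],
    "brick": ["B", "R", "IH1", "K"],
    "bowl": ["B", "OW1", "L"],
    "grain": ["G", "R", "EY1", "N"],
    "grime": ["G", "R", "AY1", "M"],
}

def onset(phones):
    """First phoneme, used as a crude alliteration key."""
    return phones[0]

def stressed_vowel(phones):
    """First primary-stressed vowel with its stress digit stripped, or None."""
    for p in phones:
        if p.endswith("1"):
            return p[:-1]
    return None

def repetition_score(words, key):
    """Fraction of adjacent word pairs that share the given sound feature."""
    feats = [key(TOY_PHONES[w]) for w in words]
    pairs = list(zip(feats, feats[1:]))
    if not pairs:
        return 0.0
    return sum(a is not None and a == b for a, b in pairs) / len(pairs)

line = ["bright", "brick", "bowl", "grain", "grime"]
alliteration = repetition_score(line, onset)          # 3 of 4 pairs share an onset
assonance = repetition_score(line, stressed_vowel)    # no adjacent pair shares a vowel
```

Comparing only adjacent words is a deliberate simplification; a windowed comparison would catch repeats that skip a word.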

enkiv2 commented 5 years ago

I think pronouncingpy might be what you're looking for. The author is a poet & has done some interesting experiments with it. If I recall, it provides both sound similarity information & syllable breaks -- so it should be able to identify alliteration, rhymes, and meter.

rmalusa72 commented 5 years ago

Ah, this looks super useful, thanks!

rmalusa72 commented 5 years ago

On examination, pronouncingpy doesn't seem to include similarity, as far as I can tell, but the author's work with phonetic similarity vectors might be exactly what I was describing.

I'm going to work on the non-poetry-making bit a little (title and page formatting, the hopefully-factual bit of each entry, gathering some interesting source text for a given plant, etc) just so I can have something concrete to point to.

rmalusa72 commented 5 years ago

Ok, I'm getting .. somewhere? I don't know if it's an interesting somewhere, but it's fun. I've used phonetic-similarity-vectors and pronouncing to build a couple of metrics for selecting the next word in the poem. The candidates are the synonyms of the next word in the source text, and the choice is based on a weighted combination of metric conformity and phonetic similarity.
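The selection step described above can be sketched roughly as follows. This is not the code from the repo: the stress patterns, the three-dimensional "phonetic" vectors, and the names `meter_conformity` and `pick_synonym` are all made up for illustration, and the real project would get these values from pronouncing and phonetic-similarity-vectors.

```python
import math

# Hypothetical toy data: stress patterns (1 = stressed, 0 = unstressed) and
# small made-up "phonetic" vectors standing in for real lookups.
STRESSES = {"colorful": [1, 0, 0], "vivid": [1, 0]}
VECTORS = {
    "cultivars": [0.9, 0.1, 0.3],
    "colorful": [0.8, 0.2, 0.4],
    "vivid": [0.1, 0.9, 0.2],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def meter_conformity(stresses, expected):
    """Ratio of syllables whose stress matches the expected pattern."""
    hits = sum(s == e for s, e in zip(stresses, expected))
    return hits / len(stresses)

def pick_synonym(candidates, expected, previous, w_meter, w_sound):
    """Return the candidate with the highest weighted combined score."""
    def score(word):
        meter = meter_conformity(STRESSES[word], expected)
        sound = cosine(VECTORS[word], VECTORS[previous])
        return w_meter * meter + w_sound * sound
    return max(candidates, key=score)

expected = [1, 0, 1, 0]  # assumed target meter for the upcoming syllables
candidates = ["colorful", "vivid"]
meter_heavy = pick_synonym(candidates, expected, "cultivars", 9, 1)
sound_heavy = pick_synonym(candidates, expected, "cultivars", 1, 9)
```

With these toy numbers the 9:1 weighting prefers "vivid" (a perfect fit for the meter), while 1:9 prefers "colorful" (closer in sound to "cultivars").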

Here's the result on the first paragraph of the Wikipedia page for apples, with the ratings for metric conformity and sound similarity weighted 9x and 1x respectively (line breaks added by me for readability):

Apple trees are large if adult against egg
 altogether jade cultivars are bear by grafting 
toward source which weight tense range of 
the flow shrub near are else than seven 
thousand five centuplicate known cultivars
 of earth end in a play of aim thing contrasting
 cultivars are educated as unlike kick also call
 plus brewing chewing green along with cider 
staging seedling furthermore grain are prone
 through a emblem of fungal bacterial moreover
 bother box which bucket breathe restrained 
past a sign of live furthermore non live route
 hot diploid millenarian and decagonal the "fruits"
 genome move sequenced as side of R and D on 
ache check and fussy rearing trig jade staging

And here's with 1:9:

Apple trees are large if adult against egg
 altogether jade cultivars are bear by grafting
 in contact with rootstocks which weight tense
 content of the flow shrub available are else 
than seven millenarian five hundred patent 
cultivars of earth occur in a area of covet 
affection colorful cultivars are educated 
because assorted salt also call counting 
cooking binging organic including cider 
staging seedling including produce are 
prostrate into a emblem of fungal bacterial
 as well as pest mess which bucket abide 
inhibited handy a emblem of live moreover
 non anatomical channel natty two millenarian
 and decagonal the "fruits" genome move
 sequenced as side of inquisition adjacent
 contagion containment and eclectic gentility natty pea assembly

I like the second one better - "covet affection colorful cultivars" is quite nice, as is "call counting cooking binging organic ... staging seedling including produce are prostrate." Interestingly, if I weight the two metrics equally, I just get.... the first one again - I suspect this is because the metric conformity score is a pretty forgiving ratio of "hits" and "misses" and will tend to be higher than the cosine similarity of two random words? 🤔
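The suspected scale mismatch can be illustrated with made-up numbers. In the sketch below the meter scores sit between 0.5 and 1.0 and the sound scores between 0.05 and 0.20, which is an assumption about typical values, not a measurement. With equal weights the ranking then matches the meter-heavy one. Rescaling each score to the 0..1 range across the candidates is one possible remedy; nothing in this thread says it was tried.

```python
# Hypothetical candidates as (word, meter_score, sound_score).
CANDIDATES = [("a", 1.0, 0.05), ("b", 0.75, 0.20), ("c", 0.5, 0.10)]

def rank(cands, w_meter, w_sound):
    """Words ordered from best to worst weighted score."""
    ordered = sorted(cands, key=lambda c: w_meter * c[1] + w_sound * c[2], reverse=True)
    return [word for word, _, _ in ordered]

def min_max(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def normalized(cands):
    """Same candidates with each score column rescaled to 0..1."""
    meters = min_max([c[1] for c in cands])
    sounds = min_max([c[2] for c in cands])
    return [(c[0], m, s) for c, m, s in zip(cands, meters, sounds)]

meter_heavy = rank(CANDIDATES, 9, 1)
equal_raw = rank(CANDIDATES, 1, 1)
equal_normalized = rank(normalized(CANDIDATES), 1, 1)
```

Here `equal_raw` has the same order as `meter_heavy`, while `equal_normalized` puts "b" first because its sound score finally counts for as much as its meter score.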

rmalusa72 commented 5 years ago

Some nice phrases as I drag more wiki text in for source material:

“bright grain scrawl”
“depleted antiquity / antiquated antiquity ecru”
“do such awful apprize”
“with the sum of diversification / mature is desist solicitation”
“both brick and cassia issue chic Arabia”

One extremely very bad one:

“savoury bowl of funk and chump swank comic”

rmalusa72 commented 5 years ago

Back to work after the holiday week! I've spent the evening babysitting the poetry-writing process. It's a bit slow, even working remotely on a machine with more processing power; fair, I didn't write it to be particularly efficient, and going through all the synonyms for a word and scoring them might take a while. (If I want to change that, I might pick a score threshold and use the first synonym that passes it, instead of going through every synonym and picking the best.)
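The threshold idea in the parenthetical can be sketched like this. The scores and function names are hypothetical; the point is only that stopping at the first synonym that clears the threshold calls the (slow) scoring function fewer times than scoring every synonym and taking the best.

```python
def best_synonym(synonyms, score):
    """Score every synonym and return the highest-scoring one."""
    return max(synonyms, key=score)

def first_passing(synonyms, score, threshold):
    """Return the first synonym scoring at least threshold.

    Falls back to the best synonym seen if none passes.
    """
    best, best_score = None, float("-inf")
    for word in synonyms:
        s = score(word)
        if s >= threshold:
            return word
        if s > best_score:
            best, best_score = word, s
    return best

# Hypothetical scores; in the real project each call would be expensive.
TOY_SCORES = {"shrub": 0.4, "bush": 0.7, "plant": 0.9, "flora": 0.6}
calls = []

def toy_score(word):
    calls.append(word)  # record each call so the two strategies can be compared
    return TOY_SCORES[word]

synonyms = list(TOY_SCORES)
best = best_synonym(synonyms, toy_score)
calls_for_best = len(calls)
calls.clear()
quick = first_passing(synonyms, toy_score, threshold=0.65)
calls_for_quick = len(calls)
```

The trade-off is quality for speed: the early exit settles for "bush" after two scoring calls, where the exhaustive search finds "plant" after four.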

At this point, it's almost run through every item in the lists of plants from pycorpora, but it's done so in three separate runs because I encountered a problem with disambiguation pages and resumed from the last stopping point after I fixed it (twice, more thoroughly the second time). I don't want to just stitch those together, though, and while I saved the output I forgot to save the wikipedia-pages-used bibliography, so. I'll run it again either overnight or tomorrow.

I wanted to do more formatting, but life has been hectic here so I want to get a full output done and submitted, in case I don't have time later.

In conclusion, here are some nice entries:

Black Hellebore

Commonly known as hellebores , the Eurasian genus Helleborus consists of approximately 20 species of herbaceous or evergreen perennial flowering plants in the family Ranunculaceae, within which it gave its name to the tribe of Helleboreae.

Despite names such as cold pink Christmas rose 
moreover Lenten rose hellebores 
are not hard intertwined into affecting 
pink folk Rosaceae gray brick depression 
finished sensational disintegrating 
range such as impressive embryo petal 
and herb member of the shrub may mummy 
and conk back baring directed toward 
the fur although gather ovum from hellebore 
and Eric Smith In recent senescence 
Ashwood preschool of Kingswinford in affecting 
are open as expressive destructive 
distinction of raven effective vine 
life lure routinely convert verdant inward 
and absent especial cluster often 
roost next spectacular sprout now a era 
or 

Blue Ash

Fraxinus quadrangulata, the blue ash, is a species of ash native primarily to the Midwestern United States from Oklahoma to Michigan, as well as the Bluegrass region of Kentucky and the Nashville Basin region of Tennessee.

drew yours style from sudden indigo powder 
timber contemporary tense range tense 
timber of which were worn toward figure 
frequent of the community's primitive 
fabric cold American native relics 
topiary group are worn away northward 
American frogs as a fussy sustenance 
connection at the time that the scale a 
well known fall from sudden shrub through secure 
floors baseball bats bed gadget knob box as 
well as cask climactic burghal of azure 
embers expanse expressive length of which 
were worn toward figure frequent of the 
community's primitive fabric cold 
American in pool both overnight 
moreover forever full embracing 
a acme of decagonal 

Butterfly Weed

Asclepias tuberosa is a species of milkweed native to eastern North America.

It is a old slip swelling toward cipher 
tripartite precise time precise ft nil 
in third ft paintbrush are spirally arranged 
lanceolate pentagonal twelve cm 
long furthermore bifurcate third cm broad 
that plant present bare brim preferentially 
gravel grime but compass plus been believed 
forth flood frame pull miss awash sol charm is 
as a rule known as ant hemp since of the 
jumpy such are charm to the shrub through yours 
hue moreover hers wordy return character 
Ontario and Quebec Asclepias tuberosa 
the regent and potentate stiff Hummingbirds 
bees along with other vermin are again 
engage 

Coast Live Oak

Quercus agrifolia, the California live oak or coast live oak, is a highly variable, often shrubby evergreen oak tree, a type of live oak, native to the California Floristic Province.

It grows west of the mesa Nevada 
bluff province from Mendocino division 
California south to polar Baja 
California in Mexico It is 
secret chic sensational titian site 
slice of oaks expressive warmth procure from 
solar collar cool stalk are largely large 
and underweight take unique a odd flag 
of photosynthetic cells tense bent blade 
build may be meet since remote stalk which depend 
beside catch reflected lustrous distributed 
trig stray steer from sudden through unit rounded 
or immortal tributary interest 
be allowed be begin trig scarce characteristic 
district along with coast aware oak forest 
Engelmann oak forest trough 

rmalusa72 commented 5 years ago

Ok, I've done it! I hugely overshot the word count I was aiming for — I didn't consider that the section titles and descriptions would add words, lol. This version has about 90k words.

Here's the raw markdown: https://raw.githubusercontent.com/rmalusa72/poetic_botanicals/master/example/full_output1.txt

And a list of the Wikipedia pages used as reference, by entry: https://github.com/rmalusa72/poetic_botanicals/blob/master/example/full_output1_pagesused.txt

(There are some duplicates in there - I used the "plants" AND "flowers" lists from pycorpora and didn't check for overlap.)
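For reference, the overlap between two name lists could be removed before generating entries with something like the sketch below. The two lists are short hypothetical stand-ins, not the actual pycorpora data, and `merge_unique` is an illustrative name.

```python
def merge_unique(*lists):
    """Concatenate name lists, keeping only the first occurrence of each name.

    Names are compared case-insensitively with surrounding whitespace ignored.
    """
    seen = set()
    merged = []
    for names in lists:
        for name in names:
            key = name.strip().lower()
            if key not in seen:
                seen.add(key)
                merged.append(name)
    return merged

# Hypothetical stand-ins for the "plants" and "flowers" lists.
plants = ["Black Hellebore", "Blue Ash", "Butterfly Weed"]
flowers = ["butterfly weed", "Black Hellebore", "Lenten Rose"]
entries = merge_unique(plants, flowers)
```

Order is preserved, so the entries still appear in the sequence of the first list followed by whatever is new in the second.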

I used multimarkdown to convert that markdown into HTML and from there converted it to a PDF, which is here: https://github.com/rmalusa72/poetic_botanicals/blob/master/example/full_output1.pdf

I'm working on making a title page/adding an end note and the pages used to the PDF.

Going forward with this project... man, I would love to make an interface where the user can tweak the parameters (weight on various scores, line length, likelihood of jumping around in the source text) on the fly. But that's a matter for a different comment thread. :>

hugovk commented 5 years ago

Well done!

(Tip: if you name full_output1.txt as full_output1.md, GitHub will render the Markdown formatting.)