Open rmalusa72 opened 6 years ago
Repo here. So far all I've done is collect a bunch of resources from various NaNoGenMo threads, but I'm going to go look for interesting botanical stuff on Project Gutenberg to reference (field guides, botany books, guides to symbolism etc) and at some point hopefully make a mockup of how I want the pages to look?
I got Google's pretrained word2vec model up and running to mess around with; I don't think I am likely to have the time, data or processing power to train my own, but I wonder if this one will be insufficiently informed about plants, as I don't know how frequently they come up in the Google News dataset.
I'm playing around with this because I realized I have no idea any more what poetry is or how to write it. Bouncing around a meaning space in a way that produces a pleasing rhythm and sound??
"What is poetry" is a pretty tough question to answer. I think most generated text is liable to be interpreted as though it were poetry regardless of intent, though.
Oh yeah, I’m not interested in answering the question for human authors, I think it’s a huge fuzzy weird category; I guess what I really mean is “what is something plausible for me to generate that also satisfies my mysterious implicit internal idea of what I wanted to make in the first place, which was less well-defined than I thought it was.” :P
That is a good point about interpretation! I will keep it in mind.
I’ve been thinking about it, and I think the part of (a certain kind of) poetry I am most interested in for this project is meter and attention to patterns of sound. To retain some appearance of sense, I might start with existing phrases and sentences, and warp them by synonym replacement and frankensteining bits of them together, with the goal being movement towards a passage with more organized rhythm and sound.
I don’t know if someone has made some metric of degree of internal alliteration/assonance/etc that reflects what I’m thinking of, but it shouldn’t be TOO hard to build my own if I can get a handle on what I actually want to measure.
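A rough sketch of what such a metric could look like, not an established measure: count how often neighbouring words share an opening consonant sound (alliteration) or a primary-stressed vowel (assonance). The pronunciations below are hand-typed in ARPAbet-style notation purely for illustration, and every name here (`TOY_PHONES`, `sound_pattern_score`, and so on) is hypothetical.

```python
# Toy pronunciations in ARPAbet-style notation. These are hand-entered
# assumptions for the example, not loaded from any dictionary.
TOY_PHONES = {
    "bright": ["B", "R", "AY1", "T"],
    "grain": ["G", "R", "EY1", "N"],
    "green": ["G", "R", "IY1", "N"],
    "scrawl": ["S", "K", "R", "AO1", "L"],
    "bees": ["B", "IY1", "Z"],
}


def is_vowel(phone):
    # In this notation vowels end in a stress digit (0, 1 or 2).
    return phone[-1].isdigit()


def onset(phones):
    """Return the first phone if it is a consonant, else None."""
    return phones[0] if phones and not is_vowel(phones[0]) else None


def stressed_vowel(phones):
    """Return the primary-stressed vowel without its digit, or None."""
    for phone in phones:
        if phone.endswith("1"):
            return phone[:-1]
    return None


def sound_pattern_score(words, phones=TOY_PHONES):
    """Fraction of adjacent word pairs that alliterate or share a stressed vowel."""
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    hits = 0
    for first, second in pairs:
        pa, pb = phones[first], phones[second]
        alliterates = onset(pa) is not None and onset(pa) == onset(pb)
        vowel = stressed_vowel(pa)
        assonant = vowel is not None and vowel == stressed_vowel(pb)
        if alliterates or assonant:
            hits += 1
    return hits / len(pairs)
```

Only adjacent pairs are compared here; a wider window, or partial credit for similar but not identical sounds, would be natural variations depending on what is actually worth measuring.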
I think pronouncingpy might be what you're looking for. The author is a poet & has done some interesting experiments with it. If I recall, it provides both sound similarity information & syllable breaks -- so it should be able to identify alliteration, rhymes, and meter.
Ah, this looks super useful, thanks!
On examination, it appears that pronouncingpy doesn't include similarity as far as I can tell, but that the author's work with phonetic similarity vectors might be exactly what I was describing.
I'm going to work on the non-poetry-making bit a little (title and page formatting, the hopefully-factual bit of each entry, gathering some interesting source text for a given plant, etc) just so I can have something concrete to point to.
Ok, I'm getting .. somewhere? I don't know if it's an interesting somewhere, but it's fun. I've used phonetic-similarity-vectors and pronouncing to build a couple metrics for selecting the next word in the poem, from a list of synonyms of the next word in the source text, based on a weighted combination of metric conformity and phonetic similarity.
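For readers who want the shape of that selection step, here is a minimal sketch under stated assumptions. It is not the code from the repo: the iambic target pattern, the function names, and the two-dimensional "sound vectors" are all invented for illustration, and the real project draws its stress patterns and vectors from pronouncing and phonetic-similarity-vectors rather than from hand-typed data.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def meter_conformity(stresses, position, pattern="01"):
    """Share of a word's syllables whose stress fits the repeating pattern.

    `stresses` is a digit string such as "10" (one digit per syllable, "0" for
    unstressed, anything else for stressed); `position` is the syllable index
    in the line where the word would start. The iambic "01" is an assumption.
    """
    if not stresses:
        return 0.0
    hits = 0
    for i, s in enumerate(stresses):
        wanted_stressed = pattern[(position + i) % len(pattern)] == "1"
        if (s != "0") == wanted_stressed:
            hits += 1
    return hits / len(stresses)


def pick_next_word(candidates, prev_vector, position, w_meter=9.0, w_sound=1.0):
    """Return the candidate with the highest weighted score.

    `candidates` maps word -> (stress string, sound vector).
    """
    def score(word):
        stresses, vector = candidates[word]
        return (w_meter * meter_conformity(stresses, position)
                + w_sound * cosine(vector, prev_vector))
    return max(candidates, key=score)


# Toy data: the stress strings follow common pronunciations, the vectors are made up.
TOY_CANDIDATES = {
    "become": ("01", [0.0, 1.0]),
    "grafted": ("10", [1.0, 0.0]),
}
```

With a previous-word vector of `[1.0, 0.0]` at position 0, weights of 9 and 1 favour "become" (it fits the meter), while weights of 1 and 9 favour "grafted" (it sounds closer to the previous word).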
Here's the result on the first paragraph of the Wikipedia page for apples, with the ratings for metric conformity and sound similarity weighted 9x and 1x respectively (line breaks added by me for readability):
Apple trees are large if adult against egg
altogether jade cultivars are bear by grafting
toward source which weight tense range of
the flow shrub near are else than seven
thousand five centuplicate known cultivars
of earth end in a play of aim thing contrasting
cultivars are educated as unlike kick also call
plus brewing chewing green along with cider
staging seedling furthermore grain are prone
through a emblem of fungal bacterial moreover
bother box which bucket breathe restrained
past a sign of live furthermore non live route
hot diploid millenarian and decagonal the "fruits"
genome move sequenced as side of R and D on
ache check and fussy rearing trig jade staging
And here's with 1:9:
Apple trees are large if adult against egg
altogether jade cultivars are bear by grafting
in contact with rootstocks which weight tense
content of the flow shrub available are else
than seven millenarian five hundred patent
cultivars of earth occur in a area of covet
affection colorful cultivars are educated
because assorted salt also call counting
cooking binging organic including cider
staging seedling including produce are
prostrate into a emblem of fungal bacterial
as well as pest mess which bucket abide
inhibited handy a emblem of live moreover
non anatomical channel natty two millenarian
and decagonal the "fruits" genome move
sequenced as side of inquisition adjacent
contagion containment and eclectic gentility natty pea assembly
I like the second one better - "covet affection colorful cultivars" is quite nice, as is "call counting cooking binging organic ... staging seedling including produce are prostrate." Interestingly, if I weight the two metrics equally, I just get.... the first one again - I suspect this is because the metric conformity score is a pretty forgiving ratio of "hits" and "misses" and will tend to be higher than the cosine similarity of two random words? 🤔
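A small worked example of that suspicion, with invented numbers: if the meter score ranges widely between 0 and 1 while the cosine similarities sit in a narrow low band, equal weights still let the meter score decide the winner. The scores and the `weighted_pick` helper below are hypothetical, not taken from the project.

```python
def weighted_pick(scores, w_meter, w_sound):
    """scores maps word -> (meter_score, sound_score); return the top-scoring word."""
    return max(scores, key=lambda w: w_meter * scores[w][0] + w_sound * scores[w][1])


# Invented numbers: meter scores spread across 0..1, cosine similarities bunched low.
TOY_SCORES = {
    "range": (1.0, 0.10),
    "content": (0.5, 0.30),
    "variety": (0.75, 0.05),
}

for weights in [(9, 1), (1, 1), (1, 9)]:
    print(weights, weighted_pick(TOY_SCORES, *weights))
```

Here 9:1 and 1:1 both choose "range", and only 1:9 flips the choice to "content", which matches the behaviour described above. If that is what is happening, putting both scores on a comparable scale before weighting would make the weights behave more intuitively.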
Some nice phrases as I drag more wiki text in for source material:
“bright grain scrawl”
“depleted antiquity / antiquated antiquity ecru”
“do such awful apprize”
“with the sum of diversification / mature is desist solicitation”
“both brick and cassia issue chic Arabia”
One extremely very bad one:
“savoury bowl of funk and chump swank comic”
Back to work after the holiday week! I've spent the evening babysitting the poetry-writing process. It's a bit slow, even working remotely on a machine with more processing power; fair, I didn't write it to be particularly efficient, and going through all the synonyms for a word and scoring them might take a while. (If I want to change that, I might pick a score threshold and use the first synonym that passes it, instead of going through every synonym and picking the best.)
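The threshold idea in the parenthesis could look something like this sketch. The function names and the toy scores are assumptions for illustration; `score` stands in for whatever combined metric is in use.

```python
def best_synonym(synonyms, score):
    """Exhaustive version: score every synonym and keep the highest."""
    return max(synonyms, key=score)


def first_good_synonym(synonyms, score, threshold):
    """Early-exit version: return the first synonym scoring at or above
    `threshold`; if none passes, fall back to the best one seen.
    Returns None for an empty list."""
    best, best_score = None, float("-inf")
    for word in synonyms:
        s = score(word)
        if s >= threshold:
            return word
        if s > best_score:
            best, best_score = word, s
    return best


# Toy scores, invented for the example.
TOY = {"big": 0.4, "large": 0.8, "huge": 0.95, "vast": 0.6}
print(best_synonym(list(TOY), TOY.get))             # scores all four words
print(first_good_synonym(list(TOY), TOY.get, 0.7))  # stops after two
```

The trade-off is that the early-exit version depends on the order of the synonym list, so it may settle for "large" where the exhaustive version would find "huge".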
At this point, it's almost run through every item in the lists of plants from pycorpora, but it's done so in three separate runs because I encountered a problem with disambiguation pages and resumed from the last stopping point after I fixed it (twice, more thoroughly the second time). I don't want to just stitch those together, though, and while I saved the output I forgot to save the wikipedia-pages-used bibliography, so. I'll run it again either overnight or tomorrow.
I wanted to do more formatting, but life has been hectic here so I want to get a full output done and submitted, in case I don't have time later.
In conclusion, here are some nice entries:
Despite names such as cold pink Christmas rose
moreover Lenten rose hellebores
are not hard intertwined into affecting
pink folk Rosaceae gray brick depression
finished sensational disintegrating
range such as impressive embryo petal
and herb member of the shrub may mummy
and conk back baring directed toward
the fur although gather ovum from hellebore
and Eric Smith In recent senescence
Ashwood preschool of Kingswinford in affecting
are open as expressive destructive
distinction of raven effective vine
life lure routinely convert verdant inward
and absent especial cluster often
roost next spectacular sprout now a era
or
drew yours style from sudden indigo powder
timber contemporary tense range tense
timber of which were worn toward figure
frequent of the community's primitive
fabric cold American native relics
topiary group are worn away northward
American frogs as a fussy sustenance
connection at the time that the scale a
well known fall from sudden shrub through secure
floors baseball bats bed gadget knob box as
well as cask climactic burghal of azure
embers expanse expressive length of which
were worn toward figure frequent of the
community's primitive fabric cold
American in pool both overnight
moreover forever full embracing
a acme of decagonal
It is a old slip swelling toward cipher
tripartite precise time precise ft nil
in third ft paintbrush are spirally arranged
lanceolate pentagonal twelve cm
long furthermore bifurcate third cm broad
that plant present bare brim preferentially
gravel grime but compass plus been believed
forth flood frame pull miss awash sol charm is
as a rule known as ant hemp since of the
jumpy such are charm to the shrub through yours
hue moreover hers wordy return character
Ontario and Quebec Asclepias tuberosa
the regent and potentate stiff Hummingbirds
bees along with other vermin are again
engage
It grows west of the mesa Nevada
bluff province from Mendocino division
California south to polar Baja
California in Mexico It is
secret chic sensational titian site
slice of oaks expressive warmth procure from
solar collar cool stalk are largely large
and underweight take unique a odd flag
of photosynthetic cells tense bent blade
build may be meet since remote stalk which depend
beside catch reflected lustrous distributed
trig stray steer from sudden through unit rounded
or immortal tributary interest
be allowed be begin trig scarce characteristic
district along with coast aware oak forest
Engelmann oak forest trough
Ok, I've done it! I hugely overshot the word count I was aiming for — I didn't consider that the section titles and descriptions would add words, lol. This version has about 90k words.
Here's the raw markdown: https://raw.githubusercontent.com/rmalusa72/poetic_botanicals/master/example/full_output1.txt And a list of the Wikipedia pages used as reference, by entry: https://github.com/rmalusa72/poetic_botanicals/blob/master/example/full_output1_pagesused.txt (There are some duplicates in there - I used the "plants" AND "flowers" lists from pycorpora and didn't check for overlap.)
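For a future run, the overlap could be removed before generating entries. A minimal sketch, assuming the two lists are already loaded as plain lists of names (the `merge_unique` helper and the sample names are hypothetical, and matching is only on case and surrounding whitespace):

```python
def merge_unique(*lists):
    """Concatenate lists of names, keeping the first occurrence of each
    (compared case-insensitively, ignoring surrounding whitespace)."""
    seen = set()
    merged = []
    for names in lists:
        for name in names:
            key = name.strip().casefold()
            if key not in seen:
                seen.add(key)
                merged.append(name)
    return merged


# Stand-in data, not the actual pycorpora lists.
plants = ["rose", "hellebore", "milkweed"]
flowers = ["Rose", "tulip", "hellebore "]
print(merge_unique(plants, flowers))
```

This would not catch the same species listed under two different common names; that would need a check against the Wikipedia page each name resolves to.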
I used multimarkdown to convert that markdown into HTML and from there converted it to a PDF, which is here: https://github.com/rmalusa72/poetic_botanicals/blob/master/example/full_output1.pdf I'm working on making a title page/adding an end note and the pages used to the PDF.
Going forward with this project... man, I would love to make an interface where the user can tweak the parameters (weight on various scores, line length, likelihood of jumping around in the source text) on the fly. But that's a matter for a different comment thread. :>
Well done!
(Tip: if you name full_output1.txt as full_output1.md, GitHub will render the Markdown formatting.)
I’m hoping to make a field guide of (some facts but mostly) generative poetry and prose on the subject of various flora. This is perhaps a generous definition of fiction, but it’s inspired by the sort of travel guide structure at work in, ex., the Annals of the Parrigues.
I have no idea yet what techniques I will use as I haven’t done a lot of research! But as soon as I start a repository I will add it here.