NaNoGenMo / 2016

National Novel Generation Month, 2016 edition.
https://nanogenmo.github.io

A dictionary of an imaginary rhyming slang #49

Open enkiv2 opened 7 years ago

enkiv2 commented 7 years ago

Inspired by #41: a dictionary of an imaginary rhyming slang that behaves a bit like Cockney rhyming slang. The plan is to take the Unix dictionary file, find all entries with multi-word rhymes in the CMU Pronouncing Dictionary, and add some entries that have gone through multiple iterations of this transformation (as an etymology). The results then get sorted by the rhyme rather than the original word and written out in a dictionary-style format.
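A rough sketch of that plan, for illustration only and not the code used for this entry. The names `build_entries`, `format_entry`, `TOY_RHYMES`, and `toy_find_rhymes` are hypothetical, the rhyme table is a hand-written stand-in for a real rhyme lookup, and treating "contains a space" as the test for a multi-word rhyme is an assumption. The etymology step (repeated transformations) is left out.

```python
def build_entries(words, find_rhymes):
    """Pair each word with its multi-word rhymes, sorted by the rhyme."""
    entries = []
    for word in words:
        for rhyme in find_rhymes(word):
            # Assumption: a multi-word rhyme is any rhyme containing a space.
            if " " in rhyme:
                entries.append((rhyme, word))
    # Tuples sort by the rhyme first, then by the original word.
    return sorted(entries)


def format_entry(rhyme, word):
    """Render one dictionary-style line, headed by the rhyme."""
    return f'{rhyme}: slang for "{word}"'


# Hypothetical toy table standing in for a real rhyme lookup.
TOY_RHYMES = {
    "stairs": ["pears", "apples and pears"],
    "look": ["book", "butcher's hook"],
}


def toy_find_rhymes(word):
    """Return the toy rhymes for a word, or an empty list if unknown."""
    return TOY_RHYMES.get(word, [])


entries = build_entries(["stairs", "look", "zebra"], toy_find_rhymes)
lines = [format_entry(rhyme, word) for rhyme, word in entries]
```

Passing the rhyme finder in as a function keeps the sorting and formatting independent of whichever rhyme source ends up being used.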

I'm not sure if this will produce enough entries to actually hit 50k words, but it's worth a try.

enkiv2 commented 7 years ago

A couple of samples:
https://github.com/enkiv2/misc/blob/master/nanogenmo-2016/rhyme-dict-sample.md
https://github.com/enkiv2/misc/blob/master/nanogenmo-2016/rhyme-dict-sample2.md
https://github.com/enkiv2/misc/blob/master/nanogenmo-2016/rhyme-dict-sample3.md

There are two problems with this project: using the CMU Pronouncing Dictionary to produce rhymes is super slow, and the dictionary contains relatively few multi-word phrases (most of which are either acronyms or proper names of some variety).

I've switched from using NLTK to using @aparrish's pronouncingpy module, which seems slightly faster & is a lot easier to use since it has its own rhyme-finding code (which is nicer than mine), but it still takes a day to generate output.
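One possible way to cut the lookup cost, assuming the slowness comes from rescanning the whole dictionary for every word (the post does not say where the time goes): group every entry by its rhyming part once, then answer each query from that index. This is a minimal sketch that does not use the pronouncing module. `rhyming_part` here is a simplified stand-in (everything from the last stressed vowel onward), `build_rhyme_index` and `rhymes_for` are hypothetical names, and `TOY_PRONUNCIATIONS` is a small hand-written sample in CMUdict-style ARPAbet rather than the real file.

```python
from collections import defaultdict

# Hand-written toy sample in CMUdict-style ARPAbet (not the real dictionary).
TOY_PRONUNCIATIONS = {
    "look": "L UH1 K",
    "book": "B UH1 K",
    "hook": "HH UH1 K",
    "stairs": "S T EH1 R Z",
    "pears": "P EH1 R Z",
}


def rhyming_part(phones):
    """Return the phones from the last stressed vowel to the end."""
    symbols = phones.split()
    for i in range(len(symbols) - 1, -1, -1):
        # Stressed vowels end in 1 (primary) or 2 (secondary).
        if symbols[i][-1] in "12":
            return " ".join(symbols[i:])
    return phones  # No stressed vowel found: fall back to the whole word.


def build_rhyme_index(pronunciations):
    """Group words by rhyming part in a single pass over the dictionary."""
    index = defaultdict(list)
    for word, phones in pronunciations.items():
        index[rhyming_part(phones)].append(word)
    return index


def rhymes_for(word, pronunciations, index):
    """Look up rhymes for a known word without rescanning the dictionary."""
    key = rhyming_part(pronunciations[word])
    return sorted(w for w in index[key] if w != word)


index = build_rhyme_index(TOY_PRONUNCIATIONS)
look_rhymes = rhymes_for("look", TOY_PRONUNCIATIONS, index)
```

The index is built once, so each later query is a dictionary lookup plus a sort of the matching group. Words with several pronunciations would need one index entry per pronunciation, which this sketch ignores.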