aparrish / pronouncingpy

A simple interface for the CMU pronouncing dictionary
BSD 3-Clause "New" or "Revised" License

[Question] How to associate pronunciation to senses/synsets/definitions #56

Open sevagh opened 4 years ago

sevagh commented 4 years ago

Hello - using this library (or cmudict directly, which holds the same information), we can get multiple pronunciations for a single word. Some of these correspond to different parts of speech (e.g. PRO-ject as a noun vs. pro-JECT as a verb). Others are homographs that share a part of speech (e.g. bow).
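For the noun/verb case, the difference shows up in the stress digits attached to the vowels. A minimal sketch of pulling out that pattern is below; `stress_pattern` is a hypothetical helper, and the two phone strings are typed by hand for illustration rather than looked up in cmudict. I believe `pronouncing.stresses` does the same job, but I haven't checked it against this exact input.

```python
def stress_pattern(phones: str) -> str:
    """Return the stress digits of an ARPAbet phone string, in order.

    Vowels carry a trailing digit (1 = primary, 2 = secondary, 0 = unstressed);
    consonants carry none, so keeping only the digits gives the pattern.
    """
    return "".join(ch for ch in phones if ch.isdigit())


# Illustrative phone strings (assumed, not verified against cmudict).
noun_like = "P R AA1 JH EH0 K T"  # PRO-ject
verb_like = "P R AH0 JH EH1 K T"  # pro-JECT

print(stress_pattern(noun_like))
print(stress_pattern(verb_like))
```

This only separates pronunciations that differ in stress. It does not help with bow, where both variants are a single stressed syllable.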

Here's an example of bow:

>>> import pronouncing
>>> pronouncing.phones_for_word('bow')
['B AW1', 'B OW1']
>>>
>>> from nltk.corpus import wordnet
>>> [ss.definition() for ss in wordnet.synsets('bow')[:2]]
['a knot with two loops and loose ends; used to tie shoelaces', 'a slightly curved piece of resilient wood with taut horsehair strands; used in playing certain stringed instruments']

Does anybody have suggestions for how I could map each pronunciation to the senses/synsets it belongs to?

One potential path I'm looking at is:

  1. Convert ARPAbet to IPA
  2. Look up definitions/senses by IPA (I don't know where yet)
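Step 1 could look roughly like the sketch below. The mapping table is one common ARPAbet-to-IPA convention (an assumption on my part: other tables render some vowels differently), `arpabet_to_ipa` is a hypothetical helper, and stress marks are dropped rather than converted to IPA stress symbols. Step 2 is still the open part.

```python
# One common ARPAbet -> IPA convention (assumed; other tables differ slightly).
ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ",
    "EH": "ɛ", "ER": "ɝ", "EY": "eɪ", "IH": "ɪ", "IY": "i", "OW": "oʊ",
    "OY": "ɔɪ", "UH": "ʊ", "UW": "u",
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "F": "f", "G": "ɡ",
    "HH": "h", "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n",
    "NG": "ŋ", "P": "p", "R": "ɹ", "S": "s", "SH": "ʃ", "T": "t",
    "TH": "θ", "V": "v", "W": "w", "Y": "j", "Z": "z", "ZH": "ʒ",
}

# Unstressed variants of two vowels are usually written with different symbols.
UNSTRESSED = {"AH": "ə", "ER": "ɚ"}


def arpabet_to_ipa(phones: str) -> str:
    """Convert a space-separated ARPAbet string such as 'B AW1' to IPA."""
    out = []
    for token in phones.split():
        # Split a trailing stress digit (0, 1, 2) off the phone symbol.
        stress = token[-1] if token[-1].isdigit() else ""
        base = token[:-1] if stress else token
        if stress == "0" and base in UNSTRESSED:
            out.append(UNSTRESSED[base])
        else:
            out.append(ARPABET_TO_IPA[base])  # KeyError on unknown symbols
    return "".join(out)


for phones in ["B AW1", "B OW1"]:
    print(phones, "->", arpabet_to_ipa(phones))
```

With the two pronunciations of bow from above, this gives distinct IPA strings that could then serve as lookup keys, assuming some resource indexes senses by IPA.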