digitallinguistics / data-format

The Data Format for Digital Linguistics (DaFoDiL)
https://format.digitallinguistics.io
MIT License

Utterance: add phonetic transcription #272

Closed dwhieb closed 5 years ago

dwhieb commented 5 years ago

The Utterance schema should have a string property called "phonetic", which users may use to provide a phonetic transcription of the utterance. This transcription must be in IPA, so the property is a plain string and may not hold multiple orthographies. Users should not include phonetic brackets ([ ]) in the data.
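As a rough illustration of the proposal above, here is a minimal Python sketch of what the property and a basic check might look like. Only the property name "phonetic" and its string type come from this issue; the description text, the `check_phonetic` helper, and the assumption that the property is optional are hypothetical.

```python
# Hypothetical schema fragment for the proposed property.
# Only the name "phonetic" and the string type come from the issue.
PHONETIC_PROPERTY = {
    "type": "string",
    "description": "Phonetic transcription of the utterance in IPA, without phonetic brackets.",
}


def check_phonetic(utterance: dict) -> list:
    """Return a list of problems with the "phonetic" value (empty if none)."""
    problems = []
    value = utterance.get("phonetic")

    # Assumption: the property is optional, so a missing value is fine.
    if value is None:
        return problems

    # The transcription is a single IPA string, not an object keyed by orthography.
    if not isinstance(value, str):
        problems.append('"phonetic" must be a string')
        return problems

    # Phonetic brackets should not be stored in the data.
    if "[" in value or "]" in value:
        problems.append('"phonetic" should not include phonetic brackets')

    return problems
```

For example, `check_phonetic({"phonetic": "[pɪt]"})` reports the bracket problem, while `check_phonetic({"phonetic": "pɪt"})` returns an empty list.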

dwhieb commented 5 years ago

I think it would be worthwhile to restrict the valid characters for this property to those in the International Phonetic Alphabet, using a large regular expression and the JSON Schema "pattern" keyword. However, this would be a fairly large task, so I think we should break it out into a separate issue if we do it. In fact, I think it would be a great resource if DLx had a small repo called IPA, which provided a complete list of valid IPA characters and their features. This repo might be nothing more than a JSON file.
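To make the idea concrete, here is a hedged Python sketch of how a JSON list of IPA characters could be turned into a "pattern" value. The JSON layout (the "character" and "features" keys), the five sample entries, and the helper names are all assumptions for illustration; a real list would need to cover the whole IPA chart. Note also that JSON Schema patterns follow the ECMA-262 regex dialect, so a pattern generated with Python's `re.escape` would need to be checked against that dialect before being used in the schema.

```python
import json
import re

# Hypothetical contents of an IPA JSON file: a tiny illustrative subset only.
IPA_JSON = """
[
  {"character": "p", "features": ["voiceless", "bilabial", "plosive"]},
  {"character": "t", "features": ["voiceless", "alveolar", "plosive"]},
  {"character": "ʃ", "features": ["voiceless", "postalveolar", "fricative"]},
  {"character": "ɪ", "features": ["near-close", "near-front", "unrounded", "vowel"]},
  {"character": "ˈ", "features": ["primary stress"]}
]
"""


def build_ipa_pattern(entries):
    """Build an anchored character-class pattern from the listed characters."""
    escaped = "".join(re.escape(entry["character"]) for entry in entries)
    # JSON Schema patterns are not implicitly anchored, so anchor explicitly.
    return "^[" + escaped + "]*$"


IPA_PATTERN = build_ipa_pattern(json.loads(IPA_JSON))

# Hypothetical schema fragment using the generated pattern.
PHONETIC_PROPERTY = {"type": "string", "pattern": IPA_PATTERN}


def is_valid_ipa(text):
    """Check that every character in the text is in the allowed list."""
    return re.fullmatch(IPA_PATTERN, text) is not None
```

With this subset, `is_valid_ipa("ˈpɪt")` passes, while `is_valid_ipa("[pɪt]")` and `is_valid_ipa("pit")` fail because the brackets and the plain "i" are not in the list.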