cldf-clts / clts-legacy

Cross-Linguistic Transcription Systems
Apache License 2.0

fix handling and accessibility of metadata #28

Closed LinguList closed 6 years ago

LinguList commented 7 years ago

We have no real problem retrieving the normal sounds, but metadata should probably also be conveniently retrievable, perhaps by loading it on request.

LinguList commented 6 years ago

I imagine a call structure in the form of:

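from pyclts.metadata import MetaData
from pyclts.clts import CLTS, translate

# a transcription system and a metadata set, each loaded by name
bipa = CLTS('bipa')
dolgo = MetaData('dolgopolsky')

# map a space-segmented BIPA string onto Dolgopolsky sound classes
translate('t o x t a', bipa, dolgo)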

So the metadata, or at least parts of it, could have a similar call/get structure: str(dolgo.get(sound)) would return the Dolgopolsky sound class, and if we did the same for Phoible, it would return the Phoible character. The difference between metadata and transcription systems would then be that metadata is a fixed set of characters, while transcription systems can generate new characters by combining diacritics with base sounds. Of course, metadata can carry more, e.g., an ID or a URL, but in most cases we would still assume that people assign a certain GRAPHEME to a given metadata point that is related to sounds, so str(metadata.get(sound)) would behave basically the same way in MetaData and CLTS.
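To make the fixed-set versus generative distinction concrete, here is a minimal, self-contained sketch. Everything in it is an assumption made for illustration: the classes `Grapheme`, `MetaData`, and `TranscriptionSystem`, their constructors, and the toy sound tables are hypothetical stand-ins and not the pyclts API. Only the `str(x.get(sound))` convention and the `translate(string, source, target)` call shape are taken from the proposal above.

```python
class Grapheme:
    """Hypothetical result object: str() yields the grapheme,
    while optional extras (ID, URL) can ride along."""

    def __init__(self, grapheme, id=None, url=None):
        self.grapheme = grapheme
        self.id = id
        self.url = url

    def __str__(self):
        return self.grapheme


class MetaData:
    """Fixed inventory: only sounds listed in the table can be looked up."""

    def __init__(self, name, table):
        self.name = name
        self._table = dict(table)

    def get(self, sound):
        # Raises KeyError for anything outside the fixed set.
        return Grapheme(self._table[sound])


class TranscriptionSystem:
    """Generative inventory: any base sound followed by known diacritics."""

    def __init__(self, name, bases, diacritics):
        self.name = name
        self._bases = set(bases)
        self._diacritics = set(diacritics)

    def get(self, sound):
        # Toy assumption: the first code point is the base, the rest are diacritics.
        base, marks = sound[0], sound[1:]
        if base in self._bases and all(m in self._diacritics for m in marks):
            return Grapheme(sound)
        raise KeyError(sound)


def translate(string, source, target):
    """Map each space-separated segment from source to target."""
    return ' '.join(
        str(target.get(str(source.get(segment))))
        for segment in string.split()
    )


# Toy data, for illustration only.
bipa = TranscriptionSystem('bipa', bases='toxa', diacritics='ʰː')
dolgo = MetaData('dolgopolsky', {'t': 'T', 'o': 'V', 'x': 'K', 'a': 'V'})

print(translate('t o x t a', bipa, dolgo))  # T V K T V
print(str(bipa.get('tʰ')))                  # generated from base + diacritic
```

In this sketch `bipa.get('tʰ')` succeeds because the system composes the sound from a base and a diacritic, whereas `dolgo.get('tʰ')` raises `KeyError` because the metadata set contains only the entries it was given.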

Given these two basic data types, transcription systems and databases, one could even think of changing the names in the code: