cldf-clts / clts-legacy

Cross-Linguistic Transcription Systems
Apache License 2.0
4 stars 3 forks source link

pyclts: Enable re-use of characters #128

Closed Anaphory closed 5 years ago

Anaphory commented 5 years ago

For Sounds, both transcription systems (bipa=pyclts.TranscriptionSystem('bipa')) and sound class models allow the equivalent of

bipa[bipa['e']]

but this breaks for other symbols. Changing this requires (a) exporting Symbol, and replacing Sound with Symbol in

https://github.com/cldf/clts/blob/e65d4aabf6d1b739e7d1f203564dbb66571ddf22/src/pyclts/transcriptionsystem.py#L279

and in the equivalent place in soundclasses.py.

Is there any reason not to do this?

LinguList commented 5 years ago

I wonder why this would be needed? Sorry, it's too early in the morning here, so I have problems seeing the point.

xrotwang commented 5 years ago

Hm. If this doesn't lead to anything weird with UnknownSound, I think there's no particular reason for this limitation.

Anaphory commented 5 years ago

My use case for this is writing code that is ignorant about the data type it gets. The code should work on sequences of strings as well as on the sequences of Symbols that fall out of bipa(segment_string).

If the segment string contains unknown segments or even just markers, I get

>>> asjp[bipa['+']]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/vol/home/kaipingga/devel/clts/src/pyclts/models.py", line 47, in __getitem__
    return self.resolve_sound(sound)
  File "/vol/home/kaipingga/devel/clts/src/pyclts/soundclasses.py", line 46, in resolve_sound
    sound = sound if isinstance(sound, Sound) else self.system[sound]
  File "/vol/home/kaipingga/devel/clts/src/pyclts/models.py", line 47, in __getitem__
    return self.resolve_sound(sound)
  File "/vol/home/kaipingga/devel/clts/src/pyclts/transcriptionsystem.py", line 278, in resolve_sound
    if set(string.split(' ')).intersection(set(list(self.sound_classes)+['diphthong',
AttributeError: 'Marker' object has no attribute 'split'

where the desired behaviour would be to output the same as for

>>> asjp['+']
'+'
LinguList commented 5 years ago

okay, that looks convincing enough. Can you propose a PR directly on that?

Anaphory commented 5 years ago

Yep, about to.