I propose a rather radical relaunch which, however, does not break the tests. We now distinguish:
TS, a transcription system as we know it
TD, as transcriptiondata
SC, as sound class systems (a mix of transcription system and transcription data, since they can generate unknown sounds)
Importantly, I also changed the way we produce a transcription dataset. We now start by putting a file into sources, in which we have to specify at least two columns: BIPA and GRAPHEME. BIPA serves one important purpose: if we add a sound in BIPA, we explicitly state that the sound in the respective TD should be interpreted as that BIPA sound. E.g., if you check sources/ruhlen.tsv, you find:
That is because we explicitly linked the two sounds.
From now on, we can manually re-link data in sources, and different versions may have more sophisticated links. Just as we can already define sounds explicitly in the transcription systems, we can now do the same in the transcription data.
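To make the role of the two columns concrete, here is a minimal sketch of how such a source file could be read. It is not the code from this PR: the function name `read_source` and the sample rows are invented for illustration (they are not taken from sources/ruhlen.tsv), and I assume that an empty BIPA cell means "no explicit link".

```python
import csv
import io


def read_source(handle):
    """Split a source table into explicit links and unlinked graphemes.

    Hypothetical helper: assumes a tab-separated file with at least the
    columns BIPA and GRAPHEME, where an empty BIPA cell means that the
    grapheme has not been linked by hand.
    """
    explicit, unlinked = {}, []
    for row in csv.DictReader(handle, delimiter="\t"):
        grapheme = row["GRAPHEME"]
        bipa = (row.get("BIPA") or "").strip()
        if bipa:
            # The author of the source file stated the interpretation.
            explicit[grapheme] = bipa
        else:
            # Left for automatic mapping (or for the "unlinked" class).
            unlinked.append(grapheme)
    return explicit, unlinked


# Invented sample rows, for illustration only.
sample = "BIPA\tGRAPHEME\np\tP\n\tX\n"
explicit, unlinked = read_source(io.StringIO(sample))
```

With this split, re-linking a sound by hand only means filling in or changing the BIPA cell in the source file.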
I also added a class "", which contains all the data that is not linked inside a given dataset, similar to Concepticon.
An open question is how to indicate the differences:
we have explicitly mapped a sound manually (č vs. tS in Ruhlen)
we have automatically mapped a sound, and this sound occurs regularly in our definitions of BIPA
we have automatically mapped a sound, but this sound does not occur regularly in BIPA, so it has the "generated" attribute set to "+"
I think we should distinguish these three levels, but I'm not yet sure how best to do so.
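One possible way to indicate the three levels, purely as a sketch for discussion: derive a status from whether the source file has an explicit BIPA entry and whether the resulting sound has "generated" set to "+". The names `LinkStatus` and `link_status` are hypothetical and do not exist in the code base.

```python
from enum import Enum


class LinkStatus(Enum):
    """Hypothetical labels for the three cases listed above."""

    EXPLICIT = "explicit"    # mapped by hand in the source file
    REGULAR = "regular"      # mapped automatically, sound is defined in BIPA
    GENERATED = "generated"  # mapped automatically, sound was generated


def link_status(explicit_bipa, generated):
    """Return the status of one grapheme-to-sound link.

    explicit_bipa: content of the BIPA cell in the source file ("" if empty)
    generated: value of the sound's "generated" attribute ("+" or "")
    """
    if explicit_bipa:
        return LinkStatus.EXPLICIT
    if generated == "+":
        return LinkStatus.GENERATED
    return LinkStatus.REGULAR
```

In this sketch a manual link takes precedence over the "generated" attribute; whether that is the right priority is part of the open question.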