Rhyme data for the Shijing is available in Baxter's version. It should be added to CDDB along with the conversion script, so that it can be corrected and retrieved later. We still need to decide how to represent the rhyme data consistently. Ideally, this will resemble the structure we already have, plus an additional character-data part in which we list each character together with annotations such as:
reading OCBS
reading MCH
reading PWY (and every other reading available to us, including Starostin and Schuessler)
Shijing rhyme occurrence
Karlgren group
It seems best to start from a character list (including ambiguous characters, so each entry needs an ID) and to add information as separate CSV files, just as we did for the other data. All character readings then go into one file, and a meta-table can be created automatically.
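A minimal Python sketch of this layout, assuming one CSV text per annotation type with hypothetical columns `ID` and `VALUE`, keyed by the IDs of a character list with hypothetical columns `ID` and `CHARACTER`. The function `build_meta_table` and all values are placeholders for illustration, not existing CDDB code. Repeated entries for the same ID (e.g. several Shijing rhyme occurrences) are joined with `;`.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def build_meta_table(characters_csv, annotation_csvs):
    """Join each annotation file onto the character list by ID.

    characters_csv: CSV text with columns ID and CHARACTER.
    annotation_csvs: mapping from annotation name to CSV text with columns ID and VALUE.
    Returns one dict per character ID; missing annotations are empty strings.
    """
    table = {
        row["ID"]: {"ID": row["ID"], "CHARACTER": row["CHARACTER"]}
        for row in read_rows(characters_csv)
    }
    for name, text in annotation_csvs.items():
        for entry in table.values():
            entry.setdefault(name, "")
        for row in read_rows(text):
            if row["ID"] not in table:
                raise KeyError(f"{name}: unknown character ID {row['ID']}")
            entry = table[row["ID"]]
            # Several values for one ID (e.g. multiple rhyme occurrences) are joined with ";".
            entry[name] = row["VALUE"] if not entry[name] else entry[name] + ";" + row["VALUE"]
    return list(table.values())


# Placeholder data: 行 appears twice because it has two readings, hence two IDs.
characters = "ID,CHARACTER\n1,行\n2,行\n3,風\n"
annotations = {
    "OCBS": "ID,VALUE\n1,ocbs-a\n2,ocbs-b\n",
    "SHIJING_RHYME": "ID,VALUE\n1,rhyme-x\n1,rhyme-y\n3,rhyme-z\n",
}
meta = build_meta_table(characters, annotations)
for entry in meta:
    print(entry)
```

Adding a new kind of annotation (say, Karlgren groups) then only means adding one more CSV file; the meta-table is rebuilt from the files rather than edited by hand.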
We then have the following structure:
words / lexemes: reconstructions and dialect readings
characters: reconstructions and dialect readings
sources: Shijing (currently our only source; possibly also Guangyun)
varieties: dialects and older stages
metadata
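The parts listed above could be sketched as plain tables that refer to each other by ID. The `Database` class, its field names, and the tuple layout of a reading are hypothetical; metadata is left out because it spans several varieties rather than belonging to a single item. The `dangling_references` check is one possible way to keep automatically generated tables consistent.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical in-memory version of the planned tables, each keyed by ID."""
    varieties: dict = field(default_factory=dict)   # dialects and older stages (incl. Old Chinese)
    sources: dict = field(default_factory=dict)     # e.g. Shijing, possibly Guangyun
    characters: dict = field(default_factory=dict)  # character ID -> glyph
    lexemes: dict = field(default_factory=dict)     # lexeme ID -> gloss
    # Each reading: (kind, item ID, variety ID, source ID, form),
    # where kind is "character" or "lexeme".
    readings: list = field(default_factory=list)

    def dangling_references(self):
        """Return all readings that point to an unknown item, variety, or source."""
        items = {"character": self.characters, "lexeme": self.lexemes}
        return [
            reading
            for reading in self.readings
            if reading[1] not in items.get(reading[0], {})
            or reading[2] not in self.varieties
            or reading[3] not in self.sources
        ]


db = Database(
    varieties={"V1": "Old Chinese"},
    sources={"S1": "Shijing"},
    characters={"1": "行"},
)
db.readings.append(("character", "1", "V1", "S1", "placeholder"))
db.readings.append(("character", "2", "V1", "S1", "placeholder"))  # unknown character ID
print(db.dangling_references())
```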
The core idea would be to treat Old Chinese as a language variety and to link every reading to a source, just as we do for modern dialects, for which we have multiple sources. We also distinguish transcriptions, keeping the IPA (or CLPA) form separate from the form given in the source. When adding fǎnqiè readings, we assign a fǎnqiè to a character and also assign it a source (e.g., Guangyun). Metadata such as mutual intelligibility scores needs to be modeled differently: it aggregates information across dialects, so we store it in the same place as the trees. Or we could say: we have
In this way, we can also store this information.
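Treating Old Chinese as one more variety, attaching every reading to a source, and keeping the source transcription apart from IPA (or CLPA) could be sketched as follows. All class and field names are hypothetical and the forms and score are placeholders; the only real datum is the Guangyun fǎnqiè of 東 (德紅切). Storing cross-variety scores by ordered variety pair, outside the reading records, is an assumption about how they could sit next to the trees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One reading of one character in one variety, as given by one source."""
    character_id: str
    variety: str      # Old Chinese is modeled as a variety like any dialect
    source: str       # the source that gives this reading
    source_form: str  # transcription exactly as printed in the source
    ipa: str          # normalized IPA (or CLPA) transcription


@dataclass(frozen=True)
class Fanqie:
    """A fanqie spelling assigned to a character and attributed to a source."""
    character_id: str
    upper: str   # 反切上字: supplies the initial
    lower: str   # 反切下字: supplies the final and tone
    source: str


class PairwiseScores:
    """Cross-variety data (e.g. mutual intelligibility), kept apart from readings.

    Keys are ordered (listener, speaker) pairs, since intelligibility need not be symmetric.
    """

    def __init__(self):
        self._scores = {}

    def set(self, listener, speaker, score):
        self._scores[(listener, speaker)] = score

    def get(self, listener, speaker):
        return self._scores.get((listener, speaker))

    def varieties(self):
        return sorted({variety for pair in self._scores for variety in pair})


# "C1" is a hypothetical ID for 東; forms are placeholders, not real reconstructions.
readings = [
    Reading("C1", "Old Chinese", "Source A", "form-a", "ipa-a"),
    Reading("C1", "Old Chinese", "Source B", "form-b", "ipa-b"),
]
fanqie = Fanqie("C1", "德", "紅", "Guangyun")  # 東: 德紅切

# Compare the readings different sources give for the same character and variety.
by_source = {
    r.source: r.ipa
    for r in readings
    if r.character_id == "C1" and r.variety == "Old Chinese"
}

scores = PairwiseScores()
scores.set("V1", "V2", 0.5)  # placeholder value, not real data
print(by_source, scores.varieties())
```

Because a fǎnqiè carries its own source, spellings from Guangyun and from any later source can coexist for the same character without overwriting each other.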