digling / cddb

Chinese Dialect Database
GNU General Public License v3.0

Add Rhyme data to CDDB and general organisation of CDDB #2

Open LinguList opened 8 years ago

LinguList commented 8 years ago

Rhyme data is available for Shijing in Baxter's version. This should be added to CDDB, along with the conversion script, so that it can be corrected and retrieved later. We also need to decide how to represent the rhyme data consistently. Ideally this will be similar to the structure we already had, but with an additional character data part in which we list one character together with additional annotations, such as:

Basically, it seems best to start from a character list (including ambiguities, so we'll need an ID) and to add information in a file-based manner using CSV, just as we did for other data. So all character readings get one file, and a meta-table can be created automatically.
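A minimal sketch of how such a meta-table might be derived, assuming one character list and one readings file. The column names (`ID`, `CHARACTER_ID`, `VARIETY`, `SOURCE`, `VALUE`), the helper names, and all cell values are hypothetical placeholders, not the actual CDDB layout:

```python
import csv
import io
from collections import Counter

# Hypothetical character list: the ID keeps ambiguous characters apart
# (here the same character appears twice with different IDs).
CHARACTERS_CSV = """ID,CHARACTER
1,CHAR_A
2,CHAR_A
3,CHAR_B
"""

# Hypothetical readings file: one row per reading, linked via CHARACTER_ID.
READINGS_CSV = """ID,CHARACTER_ID,VARIETY,SOURCE,VALUE
1,1,variety_x,source_a,placeholder_1
2,2,variety_x,source_a,placeholder_2
3,3,variety_x,source_a,placeholder_3
4,1,variety_y,source_b,placeholder_4
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def build_meta_table(characters, readings):
    """Summarise readings per (variety, source) pair."""
    known_ids = {row["ID"] for row in characters}
    reading_counts = Counter()
    character_sets = {}
    for row in readings:
        # Every reading must point at an entry of the character list.
        if row["CHARACTER_ID"] not in known_ids:
            raise ValueError("unknown character ID: " + row["CHARACTER_ID"])
        key = (row["VARIETY"], row["SOURCE"])
        reading_counts[key] += 1
        character_sets.setdefault(key, set()).add(row["CHARACTER_ID"])
    return [
        {
            "VARIETY": variety,
            "SOURCE": source,
            "READINGS": reading_counts[(variety, source)],
            "CHARACTERS": len(character_sets[(variety, source)]),
        }
        for (variety, source) in sorted(reading_counts)
    ]


meta_table = build_meta_table(read_rows(CHARACTERS_CSV), read_rows(READINGS_CSV))
for row in meta_table:
    print(row)
```

Because the meta-table is computed from the data files, it never has to be edited by hand and can be regenerated after every correction.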

We then have the following structure:

The core idea would be: Old Chinese is treated as a language variety, and readings are linked to a source, just as we do for modern dialects, where we have multiple sources. We further distinguish transcriptions by separating IPA (or CLPA) from the transcription given in the source. When adding fǎnqiè readings, we assign a fǎnqiè to a character and also assign it a source (e.g., Guangyun).

Meta-data such as mutual intelligibility scores needs to be modeled differently: it aggregates information across different dialects, so we store it in the same place where we keep the trees. In other words, we have

  1. distances
  2. phylogenies

In this way, we can also store this information.
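The model described in this comment could be sketched roughly as follows. This is only an illustration under assumptions: the class names, field names, and all values are hypothetical, and phylogenies are assumed to be kept as Newick strings, which the issue does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Reading:
    """One reading of a character in one variety, taken from one source."""
    character_id: int
    variety: str                # Old Chinese is a variety like any modern dialect
    source: str
    source_form: str            # transcription exactly as given in the source
    ipa: Optional[str] = None   # normalised IPA (or CLPA) form, if available


@dataclass(frozen=True)
class Fanqie:
    """A fanqie spelling assigned to a character, with its own source."""
    character_id: int
    spelling: str
    source: str


@dataclass
class CrossVarietyData:
    """Data that spans several varieties: distances and phylogenies."""
    distances: Dict[Tuple[str, str], float] = field(default_factory=dict)
    phylogenies: List[str] = field(default_factory=list)

    def add_distance(self, variety_a: str, variety_b: str, value: float) -> None:
        # Store each unordered pair once, under a sorted key.
        self.distances[tuple(sorted((variety_a, variety_b)))] = value

    def distance(self, variety_a: str, variety_b: str) -> float:
        return self.distances[tuple(sorted((variety_a, variety_b)))]


# Placeholder values only; none of these come from real data.
reading = Reading(1, "variety_x", "source_a", "placeholder_form")
fanqie = Fanqie(1, "placeholder_spelling", "source_c")
shared = CrossVarietyData()
shared.add_distance("variety_y", "variety_x", 0.5)
shared.phylogenies.append("(variety_x,variety_y);")
print(shared.distance("variety_x", "variety_y"))
```

Keeping the source transcription and the normalised form in separate fields means a correction to one never overwrites the other, and scores such as mutual intelligibility fit into `distances` without touching the per-character data.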