LinguList closed this issue 6 years ago
The easiest way to produce this data, it seems to me, is to add a `dump()` command to the commands of clts, which would create the relevant data tables in csv-format (ideally with proper csvw-metadata). To keep links, we would further need to add identifiers among the values, so:
I think I can easily start to draft the `dump()` command that keeps these inter-relationships in the data and also tests for consistency.
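A rough sketch of what such a `dump()` could do, purely as an illustration: the table names, column names, and sample rows below are assumptions made up for this example, not the actual clts data model. It writes two csv tables that are linked by identifiers and refuses to dump if a value points to a sound that does not exist.

```python
import csv
import io

# Hypothetical sample data; the real clts data structures will differ.
SOUNDS = [
    {"ID": "s1", "NAME": "voiceless bilabial stop consonant", "GRAPHEME": "p"},
    {"ID": "s2", "NAME": "voiced bilabial stop consonant", "GRAPHEME": "b"},
]
VALUES = [
    {"ID": "v1", "SOUND_ID": "s1", "DATASET": "example", "GRAPHEME": "p"},
    {"ID": "v2", "SOUND_ID": "s2", "DATASET": "example", "GRAPHEME": "b"},
]


def check_links(sounds, values):
    """Return the IDs of values whose SOUND_ID matches no known sound."""
    known = {row["ID"] for row in sounds}
    return [row["ID"] for row in values if row["SOUND_ID"] not in known]


def dump_table(rows, fieldnames):
    """Serialize rows as csv text, starting with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def dump(sounds, values):
    """Check consistency first, then map each file name to its csv text."""
    dangling = check_links(sounds, values)
    if dangling:
        raise ValueError("values without a matching sound: {}".format(dangling))
    return {
        "sounds.csv": dump_table(sounds, ["ID", "NAME", "GRAPHEME"]),
        "values.csv": dump_table(values, ["ID", "SOUND_ID", "DATASET", "GRAPHEME"]),
    }
```

Calling `dump(SOUNDS, VALUES)` returns the csv text per file name, so writing to disk stays a separate step and the consistency check can be tested on its own.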
Okay, thanks to the experiments with clld, I think I see the model a bit clearer now:
All in all, this seems to be the best way to proceed. The `values/` will be quite a few, as we have some 30 000 unique values for all data types together, but this view is not the most important one.
The code needs to be adjusted accordingly (the `__main__.py.dump`-command currently separates the values into three classes).
We're advancing on this, so we can close the issue for the time being.
I spent some time today trying to understand how CLLD works, and I succeeded at rendering some basic aspects, but I realized that at some point it will just be faster if I learn directly from @xrotwang how to do it, and that we need to decide on the basic underlying model, as well as the different views we want to have on the data.
Here's a really rough account of my current thinking:
`clts/parameters` to answer to our `bipa[sound].name`-schema, so that people can link to the resource
`TranscriptionData`, so each release should give a table with all agglomerated sounds interpreted correctly (according to our evaluation principles) with BIPA

The question is: how to proceed? This would also be amenable to csvw, as we could create csv-files with json metadata explaining the type of dataset, which would ease deployment in CLLD. The parameters-view should be minimal, I guess, showing only:
I would call this view the "Sounds" or "Sound Segments".
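To make the csvw idea a bit more concrete, here is a minimal sketch of json metadata describing two linked csv tables. The file names (`sounds.csv`, `graphemes.csv`) and all column names are assumptions chosen for illustration; only the general layout (`@context`, `tables`, `tableSchema`, `primaryKey`, `foreignKeys`) follows the W3C csvw metadata vocabulary.

```python
import json


def csvw_metadata():
    """Build csvw metadata for a hypothetical sounds and graphemes table."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "tables": [
            {
                # Hypothetical file name for the "Sounds" view.
                "url": "sounds.csv",
                "tableSchema": {
                    "columns": [
                        {"name": "ID", "datatype": "string"},
                        {"name": "NAME", "datatype": "string"},
                    ],
                    "primaryKey": "ID",
                },
            },
            {
                # Hypothetical file name for the graphemes listed as values.
                "url": "graphemes.csv",
                "tableSchema": {
                    "columns": [
                        {"name": "ID", "datatype": "string"},
                        {"name": "SOUND_ID", "datatype": "string"},
                        {"name": "DATASET", "datatype": "string"},
                        {"name": "GRAPHEME", "datatype": "string"},
                    ],
                    "primaryKey": "ID",
                    # Each grapheme row points to exactly one row in sounds.csv.
                    "foreignKeys": [
                        {
                            "columnReference": "SOUND_ID",
                            "reference": {
                                "resource": "sounds.csv",
                                "columnReference": "ID",
                            },
                        }
                    ],
                },
            },
        ],
    }


def metadata_as_json():
    """Serialize the metadata so it can be stored next to the csv files."""
    return json.dumps(csvw_metadata(), indent=2)
```

The foreign key is what keeps the link between a grapheme and its sound explicit, so a consumer does not have to guess how the two tables relate.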
Then we could list all "Graphemes" as values, where we show (this is licensed by the linguistic definition of grapheme, which allows a grapheme to contain multiple letters; see here):
Note that the mapping between "grapheme" and "name" per dataset is n to 1: each grapheme has one name, but the same name may have several graphemes in a given dataset.
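The n to 1 constraint can be checked mechanically. The following is a small sketch, assuming the data is available as `(dataset, grapheme, name)` triples; that row layout, the function name, and the sample rows are made up for illustration.

```python
from collections import defaultdict

# Illustrative rows: two graphemes share one name, which is allowed.
ROWS = [
    ("example", "ts", "voiceless alveolar affricate consonant"),
    ("example", "t͡s", "voiceless alveolar affricate consonant"),
    ("example", "p", "voiceless bilabial stop consonant"),
]


def conflicting_graphemes(rows):
    """Return graphemes that carry more than one name within a dataset.

    The result maps (dataset, grapheme) to the sorted list of names;
    an empty dict means the n to 1 mapping holds.
    """
    names = defaultdict(set)
    for dataset, grapheme, name in rows:
        names[(dataset, grapheme)].add(name)
    return {key: sorted(found) for key, found in names.items() if len(found) > 1}
```

Several graphemes pointing to the same name pass the check; only a grapheme with two different names in the same dataset is reported.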
Then, we would have the contributions, similar to concepticon, where we list what sources helped to create a given TS or TD. Here, I can think of the following items:
The value for a given Grapheme in a given dataset would list the additional data, like frequency of representation, URL, features, etc.