cldf-clts / clts

Cross-Linguistic Transcription Systems
https://clts.clld.org

Grimes (1959) set of sounds and features #10

Open LinguList opened 7 years ago

LinguList commented 7 years ago

This should be metadata, I suppose, as Grimes (1959) lists features for only some 40 sounds relevant for Romance languages. He also describes a metric for defining distances between sounds, which is interesting. Ideally we should add this data as well, but the question is in which form. Distances call for a matrix, but CSV is not really suited to matrices because of the large number of column names. JSON could handle this. In any case, this will require a custom script to create the data from Grimes's features, the links, some description, a CSV file with the sounds and our feature names, and the matrix in JSON. The question is again how to do this in the most consistent way. We could settle on a JSON metadata file that contains the additional information AND the table information for the CSV. @xrotwang, is this possible in the current CLDF spec?

LinguList commented 7 years ago

[Screenshot: bildschirmfoto_2017-09-13_10-18-54]

This is an example of the data. The feature-based distance is calculated by taking the difference between the feature vectors of two sounds, with the integers treated as numerical values.
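To make the vector-difference idea concrete, here is a minimal sketch. It assumes the distance is the sum of absolute differences between feature values, which may not be exactly the metric Grimes (1959) defines. The sounds, the three-feature layout, and the integer values are hypothetical placeholders, not data from the paper, and `feature_distance` and `distance_matrix` are names invented for this illustration.

```python
# Hypothetical placeholder feature vectors, NOT taken from Grimes (1959).
# Each sound maps to a list of integer feature values of equal length.
FEATURES = {
    "p": [0, 2, 0],
    "b": [0, 2, 1],
    "m": [1, 2, 1],
}


def feature_distance(a, b):
    """Sum of absolute differences between two integer feature vectors.

    Assumption: the thread only says the distance is based on "the
    difference between vectors"; the exact formula is not given there.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_matrix(features):
    """Return the sorted sound labels and a square matrix of distances."""
    sounds = sorted(features)
    matrix = [
        [feature_distance(features[s], features[t]) for t in sounds]
        for s in sounds
    ]
    return sounds, matrix


sounds, matrix = distance_matrix(FEATURES)
for sound, row in zip(sounds, matrix):
    print(sound, row)
```

With roughly 40 sounds this yields a 40x40 symmetric matrix with zeros on the diagonal, which is the object the rest of the thread discusses how to store.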

xrotwang commented 7 years ago

I'd say CSV is still the best choice for a matrix. 40x40 would certainly not be too large.
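As a rough illustration of the CSV option, the sketch below writes a small square matrix with one header row of sound labels and one label column, then reads it back. It uses only Python's standard `csv` module. The `SOUND` column name, the helper names, and the three-sound example values are assumptions made for this illustration, not part of CLTS or the CLDF spec.

```python
import csv
import io


def write_matrix(sounds, matrix, handle):
    """Write a square matrix as CSV: header row of labels, then one row per sound."""
    writer = csv.writer(handle)
    writer.writerow(["SOUND"] + list(sounds))  # "SOUND" is a hypothetical column name
    for sound, row in zip(sounds, matrix):
        writer.writerow([sound] + list(row))


def read_matrix(handle):
    """Read the CSV back into a nested dict: distances[row_sound][column_sound]."""
    reader = csv.reader(handle)
    header = next(reader)
    sounds = header[1:]
    distances = {}
    for row in reader:
        distances[row[0]] = {t: int(v) for t, v in zip(sounds, row[1:])}
    return distances


# Hypothetical placeholder values, not data from Grimes (1959).
sounds = ["b", "m", "p"]
matrix = [[0, 1, 1], [1, 0, 2], [1, 2, 0]]

buffer = io.StringIO()  # stands in for a file on disk
write_matrix(sounds, matrix, buffer)
buffer.seek(0)
distances = read_matrix(buffer)
print(distances["m"]["p"])
```

The column names are just the sound labels taken from the header row, so nothing in this layout requires listing 40 names by hand.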

LinguList commented 7 years ago

Alright, as long as I don't need to specify column names in the metadata JSON, this is okay. I assume this can be handled, right?

xrotwang commented 7 years ago

Yes, there's a default naming algorithm, and column types can also be declared for all columns at once.