cldf / cldf

CLDF: Cross-Linguistic Data Formats - the specification
https://cldf.clld.org
Apache License 2.0

Add a ParameterNetwork component #140

Closed: xrotwang closed this issue 6 months ago

xrotwang commented 1 year ago

While languages are often arranged in trees (a data type with Newick as the de facto default format), parameters (in particular the concepts of Wordlists) are often arranged in (semantic) networks for further analysis (e.g. in CLICS). (For StructureDatasets, such a network or graph could be useful to describe dependencies or hierarchies among features.)

Thus, CLDF could add a ParameterNetwork component: basically a table listing edges, with

- two parameter reference columns, source and target, specifying the nodes.
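To make the shape of such a component concrete, here is a minimal Python sketch that reads an edge table into an adjacency mapping. The column names ID, Source_Parameter_ID and Target_Parameter_ID, and the sample concept IDs, are hypothetical; the proposal only specifies two parameter reference columns, source and target.

```python
import csv
import io
from collections import defaultdict

# Hypothetical ParameterNetwork table: one row per edge, with two
# parameter reference columns pointing at rows of the ParameterTable.
EDGES_CSV = """ID,Source_Parameter_ID,Target_Parameter_ID
e1,hand,arm
e2,tree,wood
e3,arm,shoulder
"""


def read_edges(text):
    """Return a list of (source, target) parameter ID pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["Source_Parameter_ID"], row["Target_Parameter_ID"]) for row in reader]


def adjacency(edges):
    """Build an undirected adjacency mapping from parameter ID to neighbour IDs."""
    adj = defaultdict(set)
    for source, target in edges:
        adj[source].add(target)
        adj[target].add(source)
    return dict(adj)


if __name__ == "__main__":
    adj = adjacency(read_edges(EDGES_CSV))
    print(sorted(adj["arm"]))  # ['hand', 'shoulder']
```

The nodes themselves stay in the ParameterTable; the edge table only references them, which is what keeps the component generic.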

Examples:

xrotwang commented 1 year ago

One of the questions to answer here is whether a generic proposal is more useful than, e.g., a more targeted "ColexificationTable". Basically: Are people more likely to end up shoehorning other networks into a ColexificationTable, or are colexifications in a ParameterNetworkTable more likely to be not recognizable enough?
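For the colexification case, such edges can be derived from a wordlist: two concepts are linked when some language expresses both with the same form, and the edge can be weighted by the number of languages showing that colexification. The following Python sketch assumes a simplified list of (language, concept, form) triples with made-up data; it illustrates the idea and is not the CLICS implementation.

```python
from collections import defaultdict
from itertools import combinations

# Made-up wordlist rows: (Language_ID, Parameter_ID, Form).
FORMS = [
    ("lang1", "tree", "kayu"),
    ("lang1", "wood", "kayu"),
    ("lang2", "tree", "baum"),
    ("lang2", "wood", "holz"),
    ("lang3", "tree", "ki"),
    ("lang3", "wood", "ki"),
    ("lang3", "fire", "hi"),
]


def colexification_edges(forms):
    """Map each unordered concept pair to the set of languages colexifying it."""
    # Group concepts by identical form within each language.
    concepts_by_form = defaultdict(set)
    for language, concept, form in forms:
        concepts_by_form[(language, form)].add(concept)

    edges = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        # Every pair of concepts sharing a form in this language is an edge.
        for a, b in combinations(sorted(concepts), 2):
            edges[(a, b)].add(language)
    return dict(edges)


if __name__ == "__main__":
    for (a, b), languages in colexification_edges(FORMS).items():
        print(a, b, len(languages))  # tree wood 2
```

Each resulting pair could become one row of a ParameterNetwork table, with the language count as an edge weight.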

LinguList commented 1 year ago

There are other projects on co-expression, and their data would probably be better placed in a ParameterNetworkTable. So, if propagated well enough, the proposed level of abstraction should work perfectly.

We might call it "ParameterGraph", though?

xrotwang commented 1 year ago

We might call it "ParameterGraph", though?

Not sure. "Graph" seems to be used more for the pure data structure, while "network" denotes the object to be analysed? If so, I'd prefer "network", because CLDF is all about "connecting data with analysis".

SimonGreenhill commented 1 year ago

Great idea. Lots of data could be handled this way.

I think generic is better than specific, as the specifics can be dealt with at the application level.

A little bit worried about saving in a particular file format, but we do this for trees anyway (Nexus/Newick). However, could we just store this type of data as an extended ValueTable, e.g. with the columns:

node1,node2,attribute,value

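A long format like this (one row per edge attribute) can be pivoted back into edges carrying attribute dictionaries. A minimal Python sketch, using the four columns node1, node2, attribute and value suggested here with made-up rows:

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the long format: one row per (edge, attribute) pair.
LONG_CSV = """node1,node2,attribute,value
tree,wood,weight,12
tree,wood,type,colexification
hand,arm,weight,7
"""


def pivot_edges(text):
    """Collect all attribute/value pairs for each (node1, node2) edge."""
    edges = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        edges[(row["node1"], row["node2"])][row["attribute"]] = row["value"]
    return dict(edges)


if __name__ == "__main__":
    edges = pivot_edges(LONG_CSV)
    print(edges[("tree", "wood")])  # {'weight': '12', 'type': 'colexification'}
```

Note that all values remain strings in this layout; any typing (e.g. numeric weights) has to be handled separately.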
LinguList commented 1 year ago

In fact, that is the idea, @SimonGreenhill. If you look at what I did for CLICS4 (https://github.com/clics/clics4), just imagine that we turn the current StructureDataset into a proper ParameterGraph. GML is just an add-on, not a requirement.
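Since GML is only an optional add-on, it could be generated from the edge table when needed. Here is a small Python sketch that serializes (source, target, weight) triples to GML by hand, without assuming any particular graph library; the function name to_gml and the weight attribute are illustrative only.

```python
def to_gml(edges, directed=False):
    """Serialize (source, target, weight) triples as a GML string.

    Labels are assumed not to contain double quotes (no escaping is done).
    """
    # Assign consecutive integer IDs to node labels in order of first appearance.
    ids = {}
    for source, target, _weight in edges:
        for node in (source, target):
            ids.setdefault(node, len(ids))

    lines = ["graph [", f"  directed {int(directed)}"]
    for label, node_id in ids.items():
        lines += ["  node [", f"    id {node_id}", f'    label "{label}"', "  ]"]
    for source, target, weight in edges:
        lines += [
            "  edge [",
            f"    source {ids[source]}",
            f"    target {ids[target]}",
            f"    weight {weight}",
            "  ]",
        ]
    lines.append("]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_gml([("tree", "wood", 2), ("hand", "arm", 1)]))
```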

xrotwang commented 7 months ago

See related discussion here: https://github.com/concepticon/concepticon-data/issues/1338

xrotwang commented 7 months ago

In terms of naming, I'd stick with ParameterNetwork, because "Graph" might sound too exclusive, given that we also accept directed graphs, mixed graphs, etc.
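To hold directed, undirected and mixed graphs in one table, each edge could carry a flag saying whether it is directed. The directed flag below is a hypothetical illustration, not part of the proposal; the Python sketch only shows how a neighbour lookup would then respect edge direction.

```python
from collections import defaultdict

# Hypothetical edges: (source, target, directed). A mixed graph combines both kinds.
EDGES = [
    ("hand", "arm", False),    # undirected, e.g. a colexification
    ("dog", "animal", True),   # directed, e.g. a broader/narrower relation
]


def successors(edges):
    """Map each node to the nodes reachable from it in one step."""
    out = defaultdict(set)
    for source, target, directed in edges:
        out[source].add(target)
        if not directed:
            # Undirected edges can be traversed in both directions.
            out[target].add(source)
    return dict(out)


if __name__ == "__main__":
    succ = successors(EDGES)
    print(sorted(succ["arm"]), "animal" in succ)  # ['hand'] False
```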

In the Concepticon use case, we'd also have multiple networks in the ParameterNetwork component: one for each Concepticon concept relation, and some derived from networks in concept lists. These would probably be disambiguated by a contributionReference.
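With several networks in one component, each edge row would point to the network it belongs to. A minimal Python sketch, assuming a hypothetical Contribution_ID column standing in for the contributionReference and made-up network names, that splits the table into one edge list per network:

```python
import csv
import io
from collections import defaultdict

# Hypothetical edge rows; Contribution_ID identifies the network an edge belongs to.
EDGES_CSV = """ID,Source_Parameter_ID,Target_Parameter_ID,Contribution_ID
1,dog,animal,broader
2,oak,tree,broader
3,finger,hand,partOf
"""


def split_networks(text):
    """Group (source, target) pairs by the network they belong to."""
    networks = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        networks[row["Contribution_ID"]].append(
            (row["Source_Parameter_ID"], row["Target_Parameter_ID"])
        )
    return dict(networks)


if __name__ == "__main__":
    for name, edges in split_networks(EDGES_CSV).items():
        print(name, edges)
```

Keeping all networks in one table and separating them by reference avoids one component per relation type.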

LinguList commented 7 months ago

Yes, that seems important.