digitallinguistics / data-format

The Data Format for Digital Linguistics (DaFoDiL)
https://format.digitallinguistics.io
MIT License

support word sets #344

Closed: dwhieb closed this issue 4 years ago

dwhieb commented 5 years ago

Is this change related to a problem? Please describe.

There is currently no way in DaFoDiL to represent a set of words that are related across languages. Specifically, there is no way to capture the notion of cognate sets or borrowings.

Describe the solution you'd like

DaFoDiL should include a Word Set schema with a property such as wordsetType, whose possible values would include cognate and borrowing, among others. It should also have an items property (other names would work too) containing an array of database reference objects, so that each item can point to an entire lexeme.
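To make the proposal concrete, here is a minimal sketch in Python of the data structure described above: a word set with a wordsetType and an items array of database references. The class names and the database/id fields of the reference are hypothetical placeholders for illustration only; the actual DaFoDiL schemas may name and structure these differently.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatabaseReference:
    # Hypothetical fields: the real DaFoDiL database reference schema may differ.
    database: str  # name or URL of the database holding the item
    id: str        # identifier of the item (e.g. a whole lexeme) in that database


@dataclass
class WordSet:
    # wordsetType classifies the relation among the items, e.g. "cognate" or "borrowing".
    wordsetType: str
    # items point at the related words/lexemes rather than copying them.
    items: list[DatabaseReference] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts the nested DatabaseReference objects too.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


# Usage: a cognate set spanning two languages (database names are made up).
cognates = WordSet(
    wordsetType="cognate",
    items=[
        DatabaseReference(database="english-lexicon", id="water"),
        DatabaseReference(database="german-lexicon", id="Wasser"),
    ],
)
print(cognates.to_json())
```

Storing references instead of embedded copies means the same lexeme can belong to several word sets (say, a cognate set and a borrowing set) without duplicating its data.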

vadekh commented 5 years ago

Do we want wordsetType to have options for language families? For example: nouns across Germanic languages that appear to share phonetic components. If so, what should this type be called? @dwhieb

dwhieb commented 4 years ago

@vadekh Possibly! I'm not totally sure what you mean though. Can you elaborate a little more? How would this type be different from cognate sets? Is the idea that users should be able to more-or-less randomly select items from different languages and put them together into a set?

If so, I can definitely see lots of potential research use cases for this. For example, someone might be looking at all the words in a Swadesh list across various unrelated languages. Or they might, like you suggest, just be pulling together potential cognate sets.

I don't have any good ideas on what to name this type of word set though! @monicamacaulay, @hunterlockwood, @Calvin1119, any ideas?

monicamacaulay commented 4 years ago

otherwordset?

vadekh commented 4 years ago

otherwordset would work, paired with a note or description. As for my initial question: I guess those would still be cognates. I was thinking of groups of cognates that are more of a stretch: they look alike only if you know the patterns of phonetic evolution across a language family, and they don't necessarily sound similar.

dwhieb commented 4 years ago

@vadekh Ok so that sounds like a potential cognate set, which is definitely a useful thing to have. You want to look at lookalikes first to decide which ones you actually think are cognate.

I don't think we should add a value for potentialCognates or anything like that though. That seems too specific. I think that unless there's a really common and well-established use case (like cognate and borrowing), we shouldn't add any more values to this list.

That said, I think a catch-all category just called other would be useful.

So the possible values for this field, for now, until another clear use case comes up, are: cognate, borrowing, and other.
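A minimal sketch in Python of how a validator might restrict wordsetType to the values settled on in this thread (cognate, borrowing, and the catch-all other) while rejecting narrower ones such as potentialCognates. WordsetType and validate_wordset_type are hypothetical names used for illustration, not part of DaFoDiL.

```python
from enum import Enum


class WordsetType(str, Enum):
    # Hypothetical enum mirroring the values agreed on in this issue.
    COGNATE = "cognate"
    BORROWING = "borrowing"
    OTHER = "other"  # catch-all for any other kind of grouping


def validate_wordset_type(value: str) -> WordsetType:
    """Return the matching WordsetType, or raise ValueError for unknown values."""
    try:
        return WordsetType(value)  # looks the member up by its string value
    except ValueError:
        allowed = ", ".join(t.value for t in WordsetType)
        raise ValueError(
            f"invalid wordsetType {value!r}; expected one of: {allowed}"
        ) from None


print(validate_wordset_type("borrowing").value)  # prints "borrowing"

try:
    validate_wordset_type("potentialCognates")
except ValueError as err:
    print(err)  # invalid wordsetType 'potentialCognates'; expected one of: cognate, borrowing, other
```

In a JSON Schema, the same restriction would be expressed with the enum keyword on the wordsetType property, so adding a new value later means extending one list.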

xrotwang commented 4 years ago

You may want to look at how CLDF (Cross-Linguistic Data Formats) handles cognates in Wordlists.

HughP commented 4 years ago

Or you may want to look at how FLEx (FieldWorks Language Explorer) creates word relationships and the taxonomies it uses: https://software.sil.org/fieldworks/