cldf-datasets / mattercariban

Creative Commons Attribution 4.0 International

Specify exact swadesh list used for the lexical data #4

Open xrotwang opened 4 years ago

xrotwang commented 4 years ago

It seems raw/cariban_swadesh_list.csv uses Swadesh-1960-100a, but with the differences listed below (lines marked "<" come from the Cariban list, lines marked ">" from Swadesh-1960-100a). Most of these should be easy to explain: there are a couple of re-orderings, and four terms are missing from the Cariban data. But what to do with "son" vs. "person"?

3c3
< 2,you
---
> 2,thou
7,8c7,8
< 6,what
< 7,who
---
> 6,who
> 7,what
19c19
< 18,son
---
> 18,person
27a28
> 27,bark
32,33c33,34
< 32,egg
< 33,grease/fat
---
> 32,grease
> 33,egg
51c52
< 51,breast
---
> 51,breasts
66a68,70
> 67,lie
> 68,sit
> 69,stand
93,94c97,98
< 96,good
< 97,new
---
> 96,new
> 97,good
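
To separate the re-orderings from the real differences, the differing rows can be compared as sets rather than line by line. The following is only a rough sketch: the two lists contain just the rows that show up in the diff above, and the `ALIASES` table (treating "thou", "grease/fat" and "breasts" as spelling variants) is an assumption made for illustration, not something defined in the dataset.

```python
# Rows that differ according to the diff above; all other rows are identical.
cariban = ["you", "what", "who", "son", "egg", "grease/fat", "breast", "good", "new"]
swadesh = ["thou", "who", "what", "person", "bark", "grease", "egg", "breasts",
           "lie", "sit", "stand", "new", "good"]

# Assumed gloss equivalences (hypothetical, for illustration only).
ALIASES = {"thou": "you", "grease/fat": "grease", "breasts": "breast"}


def normalize(glosses):
    """Map gloss variants onto one spelling and drop the ordering."""
    return {ALIASES.get(gloss, gloss) for gloss in glosses}


def compare(left, right):
    """Return the glosses found only in `left` and only in `right`."""
    left_set, right_set = normalize(left), normalize(right)
    return sorted(left_set - right_set), sorted(right_set - left_set)


only_cariban, only_swadesh = compare(cariban, swadesh)
print(only_cariban)   # ['son']
print(only_swadesh)   # ['bark', 'lie', 'person', 'sit', 'stand']
```

Under these assumptions the re-orderings and spelling variants drop out, leaving the four missing terms plus the "son" vs. "person" pair.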

fmatter commented 4 years ago

I largely relied on Meira & Franchetto 2005 (https://www.jstor.org/stable/10.1086/491633?seq=1#metadata_info_tab_contents), which uses a Swadesh list tailored to Cariban languages.

I will shortly cleanse the Swadesh list of some more meanings that are not particularly useful for comparative purposes in Cariban languages, like 'to swim', 'to fly', or various color terms.

Does it need to adhere to a particular one? Is it not possible to map individual meanings to Concepticon entries?

xrotwang commented 4 years ago

Yes, we could map individual terms, but if possible, we should reuse an existing list. The only thing not straightforward is "son" vs. "person". If this is not an error, we'll have to resort to a custom concept list.

fmatter commented 4 years ago

It definitely is 'son' in the article I linked above, but there is no explanation as to what specific Swadesh list was used or modified. A potential reason might be that the word for 'person' is often equal to the autodenomination, which means that many forms will not be cognate at all. This is not the case for 'son', though.

LinguList commented 4 years ago

Okay, this then means that we make a new Swadesh list, if the Swadesh list used here is not standard. I think it is generally better, as it is also scientifically more accurate to say: this study is 99% Swadesh-100 in Concepticon, but has this one thing different. And making an official concept list on Concepticon is not difficult.

fmatter commented 4 years ago

OK! Like I said, there’s some sorting out I still need to do, so the items might change a little bit.

xrotwang commented 4 years ago

Just noticed that there are two new concepts in the lexical data: "snake" and "three" - corroborating that the best way forward would be a separate concept list in Concepticon for this resource - largely overlapping with Swadesh 100.

fmatter commented 4 years ago

Yes, I've started adding replacements for some words such as posture verbs or color terms, which are nonexistent or morphologically complex and thus vary widely across the family, making them of little use for my phylogenetic classification. I can leave the terms I won't be using in this list -- but a custom Swadesh list would be the way to go, in any case.

LinguList commented 4 years ago

@fmatter, for submitting this to Concepticon, can I ask you to have a look at https://calc.hypotheses.org/2225, where this is nicely described? We'd need some published reference, though: either you put the list on Zenodo, if the data are not yet published, or you could also -- if you want to -- write a very small blog post for our blog at calc.hypotheses.org, where you say why the concept list differs from the standard Swadesh list. In any case, we'd then have a nice additional concept list for Concepticon.

fmatter commented 3 years ago

@LinguList Can I cite my PhD thesis for this? (not published yet, but turned in…)

xrotwang commented 3 years ago

For the Swadesh list?

fmatter commented 3 years ago

For a new concepticon entry -- my list isn't Swadesh-1960-100a.

xrotwang commented 3 years ago

Ah, yes. Perfect.

LinguList commented 3 years ago

Yes, we can then later update the list in Concepticon and you should get back to us to do so, once your thesis is published in some way.

fmatter commented 3 years ago

I'm all set up for a pull request to the concepticon repo, as per the tutorial you linked, but I think I'd need to be added as a contributor on the repo in order to push?

xrotwang commented 3 years ago

@fmatter submitting PRs from your fork of the repo shouldn't require any privileges on this repo.

LinguList commented 3 years ago

I'd just like to mention that we can ALSO do this slightly differently: we leave the "concepts.tsv" in place until the thesis is officially out and then add it to Concepticon. We have been doing this with some other datasets with new concept lists as well, as it would allow us to reference the dataset without making too many changes in Concepticon once the data has appeared. For CLDF it doesn't matter; the only difference may be the review that you would receive from our team.
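
For orientation, such a "concepts.tsv" could be produced along the following lines. This is a hedged sketch, not the layout this dataset actually uses: the column names are an assumption modeled on common Concepticon-style concept lists, the three rows are taken from the diff at the top of this thread, and the Concepticon columns are deliberately left empty because no mapping has been agreed on here.

```python
import csv
import io

# Numbers and glosses as they appear in the diff earlier in this thread.
ROWS = [(2, "you"), (6, "what"), (18, "son")]

# Assumed column layout (hypothetical; adjust to the dataset's real header).
HEADER = ["NUMBER", "ENGLISH", "CONCEPTICON_ID", "CONCEPTICON_GLOSS"]


def write_concepts(rows, handle):
    """Write a tab-separated concept list with unmapped Concepticon columns."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for number, gloss in rows:
        # Mapping columns stay empty until the concepts are linked.
        writer.writerow([number, gloss, "", ""])


buffer = io.StringIO()
write_concepts(ROWS, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

Writing to an in-memory buffer keeps the example self-contained; passing a file opened with `newline=""` instead would write the same content to disk.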

One more point: you can also consider presenting the concept list in a small blog post, and we can quote the blog post, example here. This would require a ~ one-page summary of why you chose your list.

LinguList commented 3 years ago

Concept list for this example is here.

fmatter commented 3 years ago

> I'd just like to mention that we can ALSO do this slightly differently: we leave the "concepts.tsv" in place until the thesis is officially out and then add it to Concepticon. We have been doing this with some other datasets with new concept lists as well, as it would allow us to reference the dataset without making too many changes in Concepticon once the data has appeared. For CLDF it doesn't matter; the only difference may be the review that you would receive from our team.
>
> One more point: you can also consider presenting the concept list in a small blog post, and we can quote the blog post, example here. This would require a ~ one-page summary of why you chose your list.

Right, so the concept list would be added on Concepticon, but its "source" would be the blog post instead of the thesis? That I can do.

fmatter commented 3 years ago

> @fmatter submitting PRs from your fork of the repo shouldn't require any privileges on this repo.

I hadn't forked it, thanks 🤦‍♂️

fmatter commented 3 years ago

> One more point: you can also consider presenting the concept list in a small blog post, and we can quote the blog post, example here. This would require a ~ one-page summary of why you chose your list.

In what format? Does Markdown work?

LinguList commented 3 years ago

For a blog post at calc.hypotheses.org, you'd need an account, which I could make for you, and you'd write in WordPress (where you can also write source code in HTML or paste from a doc file). I also write in Markdown, convert it to HTML, and paste that in. I'd also review the contribution to make sure it conforms to the style, etc.

fmatter commented 3 years ago

> For a blog post at calc.hypotheses.org, you'd need an account, which I could make for you, and you'd write in WordPress (where you can also write source code in HTML or paste from a doc file). I also write in Markdown, convert it to HTML, and paste that in. I'd also review the contribution to make sure it conforms to the style, etc.

OK, writing it in markdown.