concepticon / concepticon-data

The curation repository for the data behind Concepticon.
https://concepticon.clld.org

Observations on the Indigenous Tribes of the N. W. Coast of America (1841) #996

Open chrzyki opened 4 years ago

chrzyki commented 4 years ago

https://www.jstor.org/stable/1797647?seq=1#metadata_info_tab_contents

Kristina-Pianykh commented 3 years ago

This source contains 4 lists with roughly similar basic concepts, but their numbers differ. One way to solve this, I thought, would be to lump them all together into one big list and add columns indicating which page and which list each item was taken from (i.e., English I, English II, English III and English IV). Would that be acceptable, or is there a better way of handling the data here?

LinguList commented 3 years ago

Yes, this sounds reasonable. Another possibility, which would allow us to access the concept list in lexibank, is to add the lexibank_gloss information: one extra column called "LEXIBANK_GLOSS" that contains all glosses in the source, separated by //. To store the numbers, you could then add "NUMBERS_IN_SOURCE" and also apply a separator; here, a space would be sufficient. Otherwise, we will end up with artificial gloss names.

So the structure would be:

| ENGLISH | LEXIBANK_GLOSS | NUMBERS_IN_SOURCE |
| --- | --- | --- |
| English | English // English // English // English | 1 2 3 4 |
| hand | hand // hand (not arm) // hand (body part) // hand | 5 6 8 10 |

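A minimal sketch of how rows in this structure could be assembled from the four lists, in Python. The function name `merge_lists` and the input layout (one list of `(english, gloss, number)` tuples per source list, with the shared English label assigned by the curator) are assumptions made for illustration, not part of any Concepticon or lexibank tooling.

```python
GLOSS_SEP = " // "   # separator for LEXIBANK_GLOSS, as proposed above
NUMBER_SEP = " "     # separator for NUMBERS_IN_SOURCE


def merge_lists(lists):
    """Merge several source lists into one row per concept.

    `lists` is a hypothetical input format: one inner list per source
    list, each holding (english, gloss, number) tuples, where `english`
    is the shared label chosen by the curator.
    """
    merged = {}
    for entries in lists:
        for english, gloss, number in entries:
            row = merged.setdefault(english, {"glosses": [], "numbers": []})
            row["glosses"].append(gloss)
            row["numbers"].append(str(number))
    return [
        {
            "ENGLISH": english,
            "LEXIBANK_GLOSS": GLOSS_SEP.join(row["glosses"]),
            "NUMBERS_IN_SOURCE": NUMBER_SEP.join(row["numbers"]),
        }
        for english, row in merged.items()
    ]


# Example input mirroring the "hand" row of the table.
example = [
    [("hand", "hand", 5)],
    [("hand", "hand (not arm)", 6)],
    [("hand", "hand (body part)", 8)],
    [("hand", "hand", 10)],
]
rows = merge_lists(example)
```

With this input, `rows[0]` reproduces the "hand" row above: all four glosses are kept in list order, and the numbers are joined with a single space.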
Kristina-Pianykh commented 3 years ago

Did I understand correctly that each row in the LEXIBANK_GLOSS column should contain exactly 4 glosses, even when they are completely identical (judging by your example with English in the table)? By "English I, English II, English III and English IV" in the comment above, I meant the way the tables are distinguished in the source (see the screenshot).

Another issue is that the concepts are not numbered in the source, so there is no way of tracking the order.

[Screenshot: 2021-05-16_18-57-39]

LinguList commented 3 years ago

If there's no number in the source, you can ignore identical glosses in LEXIBANK_GLOSS and add only unique glosses, and discard NUMBERS_IN_SOURCE. This is also consistent, and much easier. I didn't know that...
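The simplified variant could look roughly like this in Python. As before, `merge_unique_glosses` and the `(english, gloss)` input layout are hypothetical names chosen for this sketch; the only parts taken from the discussion are the LEXIBANK_GLOSS column and the // separator.

```python
GLOSS_SEP = " // "  # separator for LEXIBANK_GLOSS


def merge_unique_glosses(lists):
    """Merge source lists, keeping each distinct gloss once per concept.

    `lists` is a hypothetical input format: one inner list per source
    list, each holding (english, gloss) tuples. No numbers are stored,
    since the source does not number its concepts.
    """
    merged = {}
    for entries in lists:
        for english, gloss in entries:
            glosses = merged.setdefault(english, [])
            if gloss not in glosses:  # skip glosses already recorded
                glosses.append(gloss)
    return [
        {"ENGLISH": english, "LEXIBANK_GLOSS": GLOSS_SEP.join(glosses)}
        for english, glosses in merged.items()
    ]


# Example input: four lists, using the glosses from the earlier table.
example_lists = [
    [("English", "English"), ("hand", "hand")],
    [("English", "English"), ("hand", "hand (not arm)")],
    [("English", "English"), ("hand", "hand (body part)")],
    [("English", "English"), ("hand", "hand")],
]
unique_rows = merge_unique_glosses(example_lists)
```

Here the "English" row collapses to a single gloss, and the "hand" row keeps its three distinct glosses in order of first appearance.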