Open chrzyki opened 4 years ago
This source contains 4 lists with roughly similar basic concepts but their numbers differ. I thought that one of the ways to solve this issue would be just lumping them all together into one big list and adding columns to indicate what page and what list each item was taken from (i.e., English I, English II, English III and English IV). Would it be acceptable or is there any better way of handling the data here?
Yes, this sounds reasonable. Another possibility, which would allow us to access the concept list in lexibank, is to add the lexibank_gloss information: one extra column called "LEXIBANK_GLOSS", which contains all glosses in the source, separated by `//`. To store the numbers, you could then just add "NUMBERS_IN_SOURCE" and also apply a separator, but here a space would be sufficient. Otherwise, we will end up having artificial gloss names.
So the structure would be:
ENGLISH | LEXIBANK_GLOSS | NUMBERS_IN_SOURCE |
---|---|---|
English | English // English // English // English | 1 2 3 4 |
hand | hand // hand (not arm) // hand (body part) // hand | 5 6 8 10 |
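The merge into this structure could be sketched roughly as follows. This is only an illustration, not an existing lexibank routine: the `entries` tuples, the `merge_entries` helper, and the tab-separated output are assumptions, and the numbers are the placeholder values from the table above.

```python
import csv
import io

# Hypothetical input: one tuple per item in the source, in the form
# (list name, number in source, gloss as printed, normalized English label).
entries = [
    ("English I", 5, "hand", "hand"),
    ("English II", 6, "hand (not arm)", "hand"),
    ("English III", 8, "hand (body part)", "hand"),
    ("English IV", 10, "hand", "hand"),
]


def merge_entries(entries):
    """Group items by their English label and join glosses and numbers."""
    merged = {}
    for _list_name, number, gloss, english in entries:
        row = merged.setdefault(english, {"glosses": [], "numbers": []})
        row["glosses"].append(gloss)
        row["numbers"].append(str(number))
    return [
        {
            "ENGLISH": english,
            # Glosses are separated by " // ", numbers by a single space.
            "LEXIBANK_GLOSS": " // ".join(row["glosses"]),
            "NUMBERS_IN_SOURCE": " ".join(row["numbers"]),
        }
        for english, row in merged.items()
    ]


rows = merge_entries(entries)

# Write the merged rows as tab-separated text (assumed output format).
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["ENGLISH", "LEXIBANK_GLOSS", "NUMBERS_IN_SOURCE"],
    delimiter="\t",
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
tsv_text = buffer.getvalue()
```

Keeping the glosses and numbers in the same order means the n-th number still belongs to the n-th gloss after joining.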
Did I understand correctly that each row in the LEXIBANK_GLOSS column should contain exactly 4 glosses, even when they are completely identical (judging by your example with English in the table)? By English I, English II, English III and English IV in the comment above, I meant the way the tables are distinguished in the source (see the screenshot).
Another issue is that the concepts are not numbered in the source so there is no way of tracking the order.
If there are no numbers in the source, you can ignore identical glosses in LEXIBANK_GLOSS, add only the unique glosses, and discard NUMBERS_IN_SOURCE. This is also consistent, and much easier. I didn't know that...
https://www.jstor.org/stable/1797647?seq=1#metadata_info_tab_contents
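For the unnumbered case, keeping only the unique glosses could look roughly like this. It is a minimal sketch: `unique_glosses` is a hypothetical helper name, and it assumes glosses count as identical only when they match exactly after trimming whitespace.

```python
def unique_glosses(glosses):
    """Join glosses with " // ", dropping exact duplicates but keeping order."""
    # dict.fromkeys keeps the first occurrence of each key in insertion order.
    return " // ".join(dict.fromkeys(gloss.strip() for gloss in glosses))


hand = unique_glosses(["hand", "hand (not arm)", "hand (body part)", "hand"])
english = unique_glosses(["English", "English", "English", "English"])
```

With this, four identical glosses collapse to a single entry, so no NUMBERS_IN_SOURCE column is needed.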