D-PLACE / dplace-data

The data repository for the D-PLACE Project (Database of Places, Language, Culture and Environment)
https://d-place.org
Creative Commons Attribution 4.0 International

Undefined codes for some SCCS datapoints #231

Closed xrotwang closed 5 years ago

xrotwang commented 5 years ago

dplace check reports the following problems:

ERROR   undefined code for variable SCCS1926 and society SCCS140:54
ERROR   undefined code for variable SCCS811 and society SCCS103:0
ERROR   undefined code for variable SCCS811 and society SCCS110:0
ERROR   undefined code for variable SCCS811 and society SCCS118:0

See also #187
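
A check of this kind can be sketched roughly as follows. This is not the actual dplace check implementation. The column names (var_id, soc_id, code), the helper undefined_codes, and the sample rows are all assumptions made up for illustration, not the repository's real contents.

```python
import csv
import io

# Made-up sample tables, NOT the repository's real contents.
# The column names (var_id, soc_id, code) are assumptions.
CODES_CSV = """var_id,code
SCCS811,1
SCCS811,2
SCCS1926,0
SCCS1926,5
"""

DATA_CSV = """soc_id,var_id,code
SCCS1,SCCS811,2
SCCS55,SCCS811,NA
SCCS103,SCCS811,0
SCCS140,SCCS1926,54
"""


def undefined_codes(codes_csv, data_csv, missing=("NA",)):
    """Return (var_id, soc_id, code) for data points whose code is not defined."""
    # Collect the set of defined codes per variable.
    defined = {}
    for row in csv.DictReader(io.StringIO(codes_csv)):
        defined.setdefault(row["var_id"], set()).add(row["code"])

    problems = []
    for row in csv.DictReader(io.StringIO(data_csv)):
        var_id, code = row["var_id"], row["code"]
        # Skip variables without any code list and explicit missing values.
        if var_id not in defined or code in missing:
            continue
        if code not in defined[var_id]:
            problems.append((var_id, row["soc_id"], code))
    return problems


for var_id, soc_id, code in undefined_codes(CODES_CSV, DATA_CSV):
    print(f"ERROR   undefined code for variable {var_id} and society {soc_id}:{code}")
```

Treating NA as missing rather than as a code is a design choice of this sketch; it is what keeps a NA data point from being reported while a 0 without a code definition is.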

xrotwang commented 5 years ago

@kirbykat how should we fix these?

SCCS811:

SCCS811,"Data Quality, Childhood, Life cycle",Number of verification efforts,"Rohner, R. P., Berg, D. S., & Rohner, E. C. (1982). Data quality control in the standard cross-cultural sample: cross-cultural codes. Ethnology, 21(4), 359-369.",Ordinal,number (integer),rohner1982data,,Note: missing data entries could not be clearly distinguished from cases where the verification efforts identified by the codes were not used.

has defined codes

It would seem that we should add a code 0 for 0 verification efforts, but the note at the end of the variable description above suggests that 0 efforts and NA could not really be distinguished.

xrotwang commented 5 years ago

SCCS1926,"Life cycle, Mourning, Death, Ritual, Religion",Male Actual Self-injury: Frequency,"Rosenblatt, P. C., Walsh, R. P., & Jackson, D. A. (2011). Grief and mourning codes. World Cultures eJournal, 18(2).",Ordinal,,rosenblatt2011codes,,

has defined codes

So 54 could well be a typo. But is it 5 or a missing code 4?

kirbykat commented 5 years ago

For v811 - It looks like there should also be a code "0" - as in "0" verification efforts were made, which should be distinct from "NA".

I just skimmed the original paper and didn't immediately find in it the line that is currently part of the variable description (the one about the difficulty of distinguishing NA from 0 verification efforts). In fact, it seems clear that some societies are coded "0" and some "-" (i.e., missing data). I also found that one of the societies currently coded "NA" should in fact be coded "0" (SCCS55 - Abkhaz). This should bring the total number of societies coded "0" for variable 811 to 4 (SCCS55, SCCS103, SCCS110, SCCS118).
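
The recoding described above could be sketched like this. The row layout (soc_id, var_id, code) and the helper recode are assumptions for illustration; only the four society IDs and their codes come from this thread.

```python
def recode(rows, var_id, soc_id, new_code):
    """Return a copy of rows with the code of one (variable, society) pair replaced."""
    return [
        dict(row, code=new_code)
        if row["var_id"] == var_id and row["soc_id"] == soc_id
        else row
        for row in rows
    ]


# Hypothetical in-memory rows; the real data file has more columns and rows.
rows = [
    {"soc_id": "SCCS55", "var_id": "SCCS811", "code": "NA"},
    {"soc_id": "SCCS103", "var_id": "SCCS811", "code": "0"},
    {"soc_id": "SCCS110", "var_id": "SCCS811", "code": "0"},
    {"soc_id": "SCCS118", "var_id": "SCCS811", "code": "0"},
]

fixed = recode(rows, "SCCS811", "SCCS55", "0")

# Societies coded "0" for SCCS811 after the fix.
zero_coded = {r["soc_id"] for r in fixed if r["var_id"] == "SCCS811" and r["code"] == "0"}
print(len(zero_coded))
```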

Further impetus to do a full review of the codes we obtained (already digitized by others) vis a vis the codes in original papers...I will add this to my to-do-in-not-too-distant-future list.

Here is the original paper: rohner1982data.pdf

kirbykat commented 5 years ago

For v1926, it looks like code "54" should actually be "4", based on the frequency breakdown given in the Rosenblatt codebook (rosenblatt2011codes.pdf). The number of cases for every other code matches up.

I think the code definition can just follow the others ("4 (on a scale of 1-10)")

v1926. Male Actual Self-Injury: Frequency (code book variable 9)

N -- Code -- Meaning
157 -- NA -- Missing data
14 -- 0 -- Absent
1 -- 1
4 -- 2
1 -- 4
2 -- 5
2 -- 6
1 -- 7
2 -- 8
1 -- 9
1 -- 10 -- Occurs always
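
The comparison against the codebook breakdown above can be sketched as a frequency check. The expected counts are taken from the breakdown; the observed list is reconstructed for illustration (the codebook counts with the single "4" replaced by "54"), not read from the repository, and the helper frequency_mismatches is hypothetical.

```python
from collections import Counter

# Case counts per code, as given in the codebook breakdown for v1926.
CODEBOOK_N = {
    "NA": 157, "0": 14, "1": 1, "2": 4, "4": 1, "5": 2,
    "6": 2, "7": 1, "8": 2, "9": 1, "10": 1,
}


def frequency_mismatches(observed_codes, expected):
    """Map each code to (observed count, expected count) where the two differ."""
    observed = Counter(observed_codes)
    diffs = {}
    for code in sorted(set(observed) | set(expected)):
        if observed.get(code, 0) != expected.get(code, 0):
            diffs[code] = (observed.get(code, 0), expected.get(code, 0))
    return diffs


# Illustrative stand-in for the digitized data: one "4" mistyped as "54".
observed = [code for code, n in CODEBOOK_N.items() for _ in range(n)]
observed[observed.index("4")] = "54"

print(frequency_mismatches(observed, CODEBOOK_N))
# {'4': (0, 1), '54': (1, 0)}
```

Under these assumptions, one surplus case for "54" paired with one missing case for "4" points to "4" rather than "5" as the intended code.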

xrotwang commented 5 years ago

@kirbykat ok, thanx, I'll take care of making these changes.