fsolt / dcpo_gayrights

Dynamic Comparative Public Opinion
MIT License

Double checking existing survey items #4

Closed fsolt closed 1 year ago

fsolt commented 2 years ago

Okay, per our discussion this morning, our next step is to divide up data-raw/surveys_gm.csv and double-check each entry.

This means you will have to:

  1. confirm that the variable is correct (named as it appears in the dataset, not necessarily the codebook--sometimes they are different!--and in all lowercase),
  2. confirm that the values are correctly entered (from least tolerant response to most tolerant response),
  3. enter the question_text as it appears in the codebook,
  4. enter the response_categories as they appear in the codebook (highest and lowest is okay for Likert scales; see gender_roles for examples),
  5. set agreed to 1 if you saw no problems with variable and values, or to 0 if there was a problem,
  6. repeat 1-5 for any entries for the same survey in surveys_immigration.csv before moving on to the next survey (next week, I'll assign any leftover surveys--surveys that only include immigration items--after we do all the surveys that appear in both files)
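Parts of this checklist can be pre-screened mechanically before anyone opens the codebooks. The following is only a rough sketch: it assumes the CSV has columns named variable, question_text, response_categories, and agreed (taken from the steps above) plus a hypothetical survey column, and the sample rows are invented for illustration. It cannot confirm that a variable or its value ordering is actually right; that still needs a person reading the dataset and codebook.

```python
import csv
import io


def check_row(row):
    """Return a list of problems found in one row (an empty list means none)."""
    problems = []
    variable = (row.get("variable") or "").strip()
    if not variable:
        problems.append("variable is empty")
    elif variable != variable.lower():
        problems.append("variable is not all lowercase")
    for column in ("question_text", "response_categories"):
        if not (row.get(column) or "").strip():
            problems.append(f"{column} is empty")
    if (row.get("agreed") or "").strip() not in {"0", "1"}:
        problems.append("agreed must be 0 or 1")
    return problems


# Invented sample rows; the real file is data-raw/surveys_gm.csv and its
# exact column layout is an assumption here.
sample = io.StringIO(
    "survey,variable,question_text,response_categories,agreed\n"
    "example_a,q15,Example question text,1 Support; 2 Oppose,1\n"
    "example_b,Q10A,,,\n"
)
report = {row["survey"]: check_row(row) for row in csv.DictReader(sample)}
for survey, problems in report.items():
    print(survey, problems)
```

Rows that come back with an empty list have only passed the format checks, so they still need steps 1 and 2 done by hand.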

@sammo3182: rows 1-89
@Tyhcass: rows 90-179
@byngdeuk: rows 180-269
@hey-ikon: rows 270-359

If you're comfortable with RStudio and GitHub, please go ahead and commit your updates to the file--just remember to pull before you push! (But don't worry, I'll fix it if you forget.) If you haven't gotten comfortable yet, or you have any problem, save your changes to a new file (e.g., surveys_gm_hey-ikon.csv) and push that, then I'll combine them as I've done with your DCPOtools contributions.

Things to do in next meeting after this is finished:

  1. discuss all 0s in agree
  2. confirm item codings
  3. move on to entering items from new surveys (some already identified in #2, but will need to search all newly added DCPOtools surveys)

hey-ikon commented 2 years ago

I have some questions on working on double checking survey items:

  1. ress2014: what is the full survey/data name? I tried to use the data link, but it is broken.
  2. som_combo: what is the full survey/data name? I couldn't figure it out.
  3. How do I access the pew archive data? I can use the 'roper' archive, but I have difficulty checking data under the pew archive.
  4. For some ascii data, it is hard to check the variable names in the data (as opposed to the codebook), e.g., uspew1996_media.
hey-ikon commented 2 years ago

Some notes as revising:

fsolt commented 2 years ago


Some notes as revising:

  • [x] 1. wvs4_swe -> I don't find evidence that the questions were asked only in Sweden (https://www.worldvaluessurvey.org/WVSDocumentationWV4.jsp). The "WVS4 Results by country" document shows that v208 and v76 were asked in other countries as well. We may want to change the dataset name to wvs4?

wvs4_swe exists because this particular country-year survey was left out of the wvs_combo file, apparently by mistake since it is still included in the updated wave 4 file.

  • [x] 2. wvs_combo neigh2's variable is confusing... I am not sure how to check this variable.

The stem is "On this list are various groups of people. Could you please mention any that you would not like to have as neighbors?  Homosexuals [also asked: Immigrants/foreign workers]" so a "mentioned" response means that R wouldn't want members of the named group as neighbors (i.e., is expressing homophobia or xenophobia) and "not mentioned" means R is at least okay with that group in this context. Does that resolve the confusion?
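To make that ordering concrete, here is a minimal sketch of recoding the neighbor item so that values run from least to most tolerant. The label strings and the numeric codes 1 and 2 are assumptions for illustration; the actual labels and codes in wvs_combo should be taken from the dataset itself.

```python
# Assumed response labels, ordered from least tolerant to most tolerant.
# "mentioned" means R would NOT want the group as neighbors.
NEIGHBOR_ORDER = ["mentioned", "not mentioned"]


def recode_neighbor(response):
    """Map a response label to 1 (least tolerant) or 2 (most tolerant).

    Returns None for anything else (e.g. don't know, no answer).
    """
    label = response.strip().lower()
    if label in NEIGHBOR_ORDER:
        return NEIGHBOR_ORDER.index(label) + 1
    return None


print(recode_neighbor("Mentioned"))      # least tolerant
print(recode_neighbor("Not mentioned"))  # most tolerant
```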

The data are here: https://www.pewresearch.org/politics/dataset/september-1999-news-interest-index/ but tbh I think we should use Roper when possible because a. they seem to value backwards compatibility more and don't mess with their website as much as Pew, and b. they have an SPSS version while Pew only has ASCII. The link for Roper is https://ropercenter.cornell.edu/ipoll/study/31095707

It's the July News Interest Index, so it really should be called uspew2005_07nii for consistency. Here it is at Roper: https://ropercenter.cornell.edu/ipoll/study/31095851

  • [x] 3) For "uspew2006_nii", Pew has separate monthly data. Some months include gay rights related questions, while others do not (e.g., March and June have a gay marriage question). I edited and added the survey names in the 'surveys_gm' file. I will make the relevant changes in the 'surveys_data' file.

Perfect. I guess you see the existing 'uspew2006_nii' was the March one; Roper has it here: https://ropercenter.cornell.edu/ipoll/study/31095871 . The June one is here: https://ropercenter.cornell.edu/ipoll/study/31095873

Roper: https://ropercenter.cornell.edu/ipoll/study/31095930 -- please rename as uspew2008_08rel

Roper: https://ropercenter.cornell.edu/ipoll/study/31096004 -- please rename as uspew2010_07rel

It's the original survey, not the callback. Roper: https://ropercenter.cornell.edu/ipoll/study/31096056

Tyhcass commented 2 years ago
sammo3182 commented 1 year ago

Fred @fsolt, in commit 5f656a8f1e202bd9dd39dfa8b822112878becd88, here's the list of surveys whose data I cannot access. I'd appreciate it so much if you could share the data and codebook files for the missing ones! Thanks!

fsolt commented 1 year ago


Fred @fsolt, in commit 5f656a8, here's the list of surveys whose data I cannot access. I'd appreciate it so much if you could share the data and codebook files for the missing ones! Thanks!

I've moved all these into the dropbox. I've left duplicate listings unchecked in case they actually mean something that I haven't figured out. If you can't get into the dropbox or there are other issues, please let me know.

fsolt commented 1 year ago


  • [x] dkes2005, dkes2007, cannot find raw data from original website
  • [x] gallup_uk1981, gallup_uk1985, gallup_uk1986, original data is removed from the website

I've added these to the dropbox dcpo_datasets/dataset_requests

fsolt commented 1 year ago


  • [x] bsa2017, option includes (Depends/varies) 6
  • [x] cnn199301, cnn199406, cnn199810,cnn200309, option includes "not sure"
  • [x] eb691, eb712, eb774, eb834, option includes "indifferent"

We never include answers that respondents volunteered but weren't offered.
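A small sketch of that rule, for anyone entering response_categories: drop every option that respondents volunteered rather than were offered. The dictionary layout and the volunteered flags below are hypothetical; whether a given option was actually offered has to be confirmed on the survey questionnaire.

```python
def offered_only(categories):
    """Keep only the response options that were read or shown to respondents."""
    return [c for c in categories if not c["volunteered"]]


# Hypothetical item: the third option is marked as volunteered for illustration.
categories = [
    {"code": 1, "label": "Agree", "volunteered": False},
    {"code": 2, "label": "Disagree", "volunteered": False},
    {"code": 6, "label": "(Depends/varies)", "volunteered": True},
]

kept = offered_only(categories)
print([c["label"] for c in kept])
```

In this invented case the item ends up as a two-category question.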

fsolt commented 1 year ago


  • [ ] gallup_us198606, gallup_us198703: question numbers should be q10a and q05c. For gallup_us198206, gallup_us198606, gallup_us198703, we need to double-check Weight, since the mean Weight is not 1.

Moved to https://github.com/fsolt/DCPOtools/issues/59

That issue is now closed, but the weight variables should still be double-checked
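As a starting point for that double-check, the sketch below tests whether a weight variable averages to 1 within a tolerance. The tolerance and the rescaling helper are assumptions added for illustration; rescaling is one possible remedy and is not necessarily how DCPOtools handles weights.

```python
import math
from statistics import fmean


def weight_mean_is_one(weights, tolerance=0.01):
    """True if the mean weight is within `tolerance` of 1 (tolerance is assumed)."""
    return math.isclose(fmean(weights), 1.0, abs_tol=tolerance)


def rescale_to_mean_one(weights):
    """Divide each weight by the mean so the rescaled weights average to 1."""
    mean = fmean(weights)
    return [w / mean for w in weights]


raw = [2.0, 4.0, 3.0]  # invented weights with a mean of 3
print(weight_mean_is_one(raw))
print(weight_mean_is_one(rescale_to_mean_one(raw)))
```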

fsolt commented 1 year ago

Next step: check questions by item

  1. Look at each item alone to confirm that the question_text is similar across surveys (consider whether it could be the same question translated differently) and that the item should not be split
  2. Look across similar items to ensure that questions are grouped together appropriately (see accept2 and accept2a for example: accept2 question is "Some people feel that homosexuality is a lifestyle that should be accepted by society. Other feel that homosexuality is a lifestyle that should be discouraged by society. Which comes closer to your viewpoint, the first position or the second?" while accept2a is "Do you feel that homosexuality should be considered as an acceptable alternative life-style or not?"
    cbsnyt199302 has "Do you feel that homosexuality should be considered as an acceptable alternative lifestyle or not?" which was originally coded as accept2 but really should be accept2a because it is only missing the hyphen. OTOH, these two items really might be best grouped together...)
  3. Look across similar items to confirm that we want to keep them separated. This should be group discussion, so just take note of items you think maybe should be grouped.
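For the first two steps, a rough text normalizer can help spot wordings that differ only in case, hyphenation, or punctuation, as in the life-style/lifestyle case above. This is only a sketch with an assumed normalization rule (lowercase, drop hyphens, turn other punctuation into spaces); it will not catch translation differences, which still need to be judged by reading.

```python
import re


def normalize(text):
    """Lowercase, drop hyphens, replace other punctuation with spaces, squeeze whitespace."""
    text = text.lower().replace("-", "")
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())


accept2a = ("Do you feel that homosexuality should be considered as an "
            "acceptable alternative life-style or not?")
cbsnyt199302 = ("Do you feel that homosexuality should be considered as an "
                "acceptable alternative lifestyle or not?")

same_wording = normalize(accept2a) == normalize(cbsnyt199302)
print(same_wording)
```

Because hyphens are dropped rather than replaced by a space, "same-sex" and "same sex" would not match under this rule, so near-misses still deserve a look.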

*OTOH: on the other hand; wrt: with regard to; fwiw: for what it's worth

@byngdeuk: marry
@hey-ikon: accept and adopt
@Tyhcass: approve to free (alphabetically)
@sammo3182: hioff to strength5 (alphabetically, not including marry)

byngdeuk commented 1 year ago

Finishing First step :)

I'm done with the "rows 180-269" !

byngdeuk commented 1 year ago

Related to the "marry",

  1. All classifications are reasonable to me! I will double-check after all the questionnaires are updated. For instance, "bsa2007, marry5, c(5:1)" is not updated yet!

  2. Except abcwapo201402, marry41, c(4:1): "Do you or does anyone in your house own a gun, or not?" I think this question is not related to gay rights :)!

fsolt commented 1 year ago

Except abcwapo201402, marry41, c(4:1): "Do you or does anyone in your house own a gun, or not?" I think this question is not related to gay rights :)!

Ha! I don't know where that text comes from, but abcwapo201402 (that is, USABCWASH2014-1159 in Roper) has "Overall, do you support or oppose allowing gays and lesbians to marry legally" for q15.


I've entered the correct text in https://github.com/fsolt/dcpo_gayrights/commit/782430ae1475021c9dc3f55cccc7a1a2e26cea87.

fsolt commented 1 year ago

Re aes1987, aes2007, anes_combo, anpas1979 (@byngdeuk: "The initial response_categories was (3,1), being different from the codebook"), I confirmed on the survey questionnaires (that is, the documents the survey-takers used to give the questions) that the third responses were volunteered. We never use volunteered categories (for better or worse), so these are just two-category questions: https://github.com/fsolt/dcpo_gayrights/commit/acc7708817ba6d796d7968f9ebec9ccb8c5816dc. I suspect that this is the same issue that @hey-ikon ran into with that streak of cnn surveys, but I haven't gone through those yet.

hey-ikon commented 1 year ago

Check Questions by Item

General question

  1. What was the guiding rule for distinguishing ####, ####a, and ####b questions?

Notes on accept

  1. N/A

Notes on adopt

  1. I don't see differences between adopt4a and adopt4b questions... (also related to the general question)
    adopt4a: Allowing gay and lesbian couples to adopt children.
    adopt4a: Same-sex couples should have the same rights to adopt children as heterosexual couples.
    adopt4b: Do you strongly favor, favor, oppose, or strongly oppose allowing gays and lesbians to adopt children?
  2. Potentially change the current 'adopt4' to 'adopt4a' or something else because there is only one survey item for adopt4.

Other updates

  1. On the way, I updated a few missing items.
  2. uspew2008_rel download is still unavailable.
  3. dkes2005, dkes2007 survey links wouldn't load on my end.
  4. nores_combo2, nores2013 survey links wouldn't load on my end.

sammo3182 commented 1 year ago

Finish checking hioff to strength5 (alphabetically, not including marry), two issues:

Tyhcass commented 1 year ago

I didn't see any text or grouping problems in items approve through free on my end.

byngdeuk commented 1 year ago


done :)

I think we are now at the double-check stage. From my perspective, I couldn't find any errors.

fsolt commented 1 year ago

for @fsolt: 1. arbitrate zero in agree, 2. triple check surveys that @Tyhcass can't find