ekstroem / dataMaid

An R package for data screening

Add support for user defined NA data (from SPSS haven import) #52

Closed: lawrencehr closed this issue 3 years ago

lawrencehr commented 4 years ago

Hi, I'm wondering if it's possible to add support for data frames with user-defined NA tags. When you import SPSS data using haven::read_spss(..., user_na = TRUE), the user-defined NAs are preserved. However, when I run this data through dataMaid, I get the following error:

Error: Can't convert to .

A lot of survey data has multiple types of missing values (not answered, invalid response, skipped), so there is a strong use case for this.
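For readers unfamiliar with tagged missing values, here is a minimal sketch of what such a variable looks like without needing an SPSS file. It builds one by hand with haven::labelled_spss(); the item, its codes, and its labels are made up for illustration and are not taken from the survey discussed in this issue.

```r
library(haven)

# Hypothetical survey item: 1 and 2 are substantive answers,
# 8 and 9 are user-defined missing codes (all values are made up).
answer <- labelled_spss(
  c(1, 2, 9, 1, 8),
  labels = c(Yes = 1, No = 2, Skipped = 8, "Not answered" = 9),
  na_values = c(8, 9)
)

# The missing codes are still stored as ordinary numbers ...
raw_values <- as.vector(unclass(answer))
raw_values

# ... but is.na() reports them as missing, so the reason for
# missingness is kept alongside the fact that the value is missing.
is.na(answer)
```

This is the kind of column that read_spss(..., user_na = TRUE) is expected to produce for variables with declared missing values.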

annennenne commented 4 years ago

I would guess that your problem occurs before dataMaid even gets its hands on the data. I am pretty sure that error message (Error: Can't convert to .) is not ours, but if you could provide a minimal example that produces the error, I'd be happy to have a look at it.

lawrencehr commented 4 years ago

Thanks so much for your quick reply!

So, I've tried it all again and the error I actually seem to be getting is Error : Can't convert <character> to <double>. From what I can tell, the dataMaid report/codebook reaches the haven dbl+lbl variable and something is stopping it from converting properly. It doesn't seem to be specific to variables with user-defined NAs either. I tried this on two different surveys and the error occurs in both data sets.

Here's my code:


library(haven)
library(tidyverse)

aes19_unrestricted <- haven::read_spss("XXXX/01468_p1.sav", user_na = TRUE) # import survey data with tagged values

taggedvar <- aes19_unrestricted %>%
  select("STATE") # select first dbl+lbl variable

str(taggedvar)
#> tibble [4,000 x 1] (S3: tbl_df/tbl/data.frame)
#>  $ STATE: dbl+lbl [1:4000] 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
#>    ..@ label        : chr "State"
#>    ..@ format.spss  : chr "F2.0"
#>    ..@ display_width: int 10
#>    ..@ labels       : Named num [1:8] 1 2 3 4 5 6 7 8
#>    .. ..- attr(*, "names")= chr [1:8] "NSW" "VIC" "QLD" "SA" ...
dataMaid::makeDataReport(taggedvar)
#> Error : Can't convert <character> to <double>.

Created on 2020-08-04 by the reprex package (v0.3.0)

annennenne commented 4 years ago

I agree, it does look like it might be makeDataReport() that causes the error anyway.

Could you provide a dataset that I can use to try to reproduce the error? For example "01468_p1.sav" or a synthetic dataset that creates the error?
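A synthetic dataset of the kind requested here could be sketched roughly as follows, using haven::labelled() to mimic the STATE column from the str() output. This is an assumption-laden stand-in: only the four labels visible in the thread are used (the remaining ones are elided there), and it has not been confirmed that this object triggers the same error as the original file.

```r
library(haven)
library(tibble)

# Synthetic stand-in for the STATE variable shown in the str() output.
# Only the labels that are visible in the thread are included; the
# elided ones are deliberately left out rather than guessed.
state <- labelled(
  c(2, 2, 2, 2),
  labels = c(NSW = 1, VIC = 2, QLD = 3, SA = 4),
  label = "State"
)

synthetic <- tibble(STATE = state)

# Inspect the structure; it should show a labelled double column.
str(synthetic)
```

Passing synthetic to dataMaid::makeDataReport() would then show whether the error depends on the original file or only on the column class.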

lawrencehr commented 4 years ago

Of course: here's a reprex of me creating the data. The dummy data is uploaded here as an .R file.

dummydata.zip

library(tidyverse)
aes19_unrestricted <- haven::read_spss("XXXXX/01468_p1.sav") # import survey data with tagged values

subsettedf <- aes19_unrestricted %>%
  select("STATE") # select first dbl+lbl variable

dummydata <- subsettedf[1:4, ]

str(dummydata)
#> tibble [4 x 1] (S3: tbl_df/tbl/data.frame)
#>  $ STATE: dbl+lbl [1:4] 2, 2, 2, 2
#>    ..@ label        : chr "State"
#>    ..@ format.spss  : chr "F2.0"
#>    ..@ display_width: int 10
#>    ..@ labels       : Named num [1:8] 1 2 3 4 5 6 7 8
#>    .. ..- attr(*, "names")= chr [1:8] "NSW" "VIC" "QLD" "SA" ...
dataMaid::makeDataReport(dummydata)
#> Error : Can't convert <character> to <double>.
#> Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
#> Data report generation is finished. Please wait while your output file is being rendered.
#> 
#>  Is dataMaid_dummydata.docx open on your computer? Please close it as fast as possible to avoid problems!

Created on 2020-08-05 by the reprex package (v0.3.0)

annennenne commented 4 years ago

I think something went wrong with creating the data. .R files are for scripts, not data. The following code should do the job for you and place the dataset in a file called dummydata.rda in your working directory:

library(tidyverse)
aes19_unrestricted <- haven::read_spss("XXXXX/01468_p1.sav") # import survey data with tagged values
subsettedf <- aes19_unrestricted %>%
  select("STATE") # select first dbl+lbl variable

dummydata <- subsettedf[1:4, ]
save(list = "dummydata", file = "dummydata.rda")
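
To check that a file written this way really contains the object, a small round trip with save() and load() can help. The sketch below uses base R only; the data frame is a made-up stand-in for dummydata, and a temporary file is used instead of dummydata.rda in the working directory.

```r
# Made-up stand-in for the dummydata object from the thread.
dummydata <- data.frame(STATE = c(2, 2, 2, 2))

# Write to a temporary file here; the thread writes "dummydata.rda"
# to the working directory instead.
rda_path <- tempfile(fileext = ".rda")
save(list = "dummydata", file = rda_path)

# load() restores objects under their original names and returns
# those names; a separate environment avoids overwriting anything.
restored <- new.env()
loaded_names <- load(rda_path, envir = restored)

loaded_names
identical(restored$dummydata, dummydata)
```

Whoever downloads the file only needs load() with the file path to get the object back under the name dummydata.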

lawrencehr commented 4 years ago

Apologies - I'm pretty new to posting issues on GitHub! This is my first one :)

This should be the correct file extension.

dummydata.zip

annennenne commented 4 years ago

No problem! Thanks for posting the issue. I think I found the issue and I will have a look at fixing it sometime soon. I will let you know when there's a solution.
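While a fix is pending, one possible workaround (not confirmed anywhere in this thread) is to convert haven-labelled columns to factors before building the report, so that the report never sees the dbl+lbl class. The helper name unlabel_for_report below is hypothetical and not part of dataMaid or haven, and the example data are made up.

```r
library(haven)

# Hypothetical helper (not part of dataMaid or haven): turn every
# haven-labelled column into a factor and leave other columns as-is.
unlabel_for_report <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.labelled(col)) as_factor(col) else col
  })
  df
}

# Made-up example data with one plain and one labelled column.
example <- data.frame(id = 1:3)
example$STATE <- labelled(c(1, 2, 2), labels = c(NSW = 1, VIC = 2))

plain <- unlabel_for_report(example)
str(plain)
```

The trade-off is that the numeric codes are replaced by their labels, so this only suits reports where the labels matter more than the underlying codes.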

annennenne commented 4 years ago

Notes to self: