gdemin / expss

expss: Tables and Labels in R
https://cran.r-project.org/web/packages/expss/

Intermittent error: "duplicated values in labels" #104

Closed tgravelle closed 1 year ago

tgravelle commented 1 year ago

Hello!

In using expss to tabulate survey data, I am receiving the error message below intermittently. By "intermittent" I mean that when this error occurs, I can simply re-run the example code (also below) and often obtain the desired tabulation. For context, all of the columns in data.2 being tabulated are factors (I am not at liberty to share the data).

Error in set_val_lab.default(x, value, add = FALSE) : 
  'set_val_lab' - duplicated values in labels:

banner_tables <- data.2 %>%
  tab_weight(weight = weight) %>%
  tab_total_row_position("above") %>%
  tab_cols(total(), gender, age, education, race.ethn, region) %>%
  tab_cells(Q21.0 %to% Q21.11) %>% tab_stat_cpct() %>%
  tab_cells(Q22.0 %to% Q22.13) %>% tab_stat_cpct() %>%
  tab_cells(Q23.0 %to% Q23.6) %>% tab_stat_cpct() %>%
  tab_cells(Q24.0, Q24.1, Q24.2) %>% tab_stat_cpct() %>%
  tab_cells(Q25.0, Q25.1, Q25.2) %>% tab_stat_cpct() %>%
  tab_pivot() %>%
  as_tibble()

This is rather strange behavior that I haven't encountered with expss before. Why would unchanged code produce varying results -- sometimes returning an error message and sometimes returning the desired tabulation?

Thank you in advance for any insight on what might be causing this.

gdemin commented 1 year ago

Hello! Where do you get your data from? Which function do you use to load the data? Could you apply the function below to your dataset? It will detect factors with duplicated levels:

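# Return the levels of every column in df whose 'levels' attribute has duplicates
dupl_levels = function(df){
    res = lapply(df, attr, 'levels')
    # anyDuplicated() returns 0 when nothing is duplicated (also for NULL levels)
    has_dupl = vapply(res, anyDuplicated, integer(1))
    res[has_dupl > 0]
}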

And could you try to identify the column that causes the error? To do this, remove variables from your table code one by one until the code stops raising the error; the last variable you removed is the likely cause.
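The one-by-one elimination can also be done in a loop. The following is only a sketch in base R: `find_failing_columns` and `toy_tabulate` are hypothetical names, and `toy_tabulate` is a stand-in for the real tabulation step, not an expss function.

```r
# Hypothetical helper: run a tabulation function on each column separately
# and collect the error messages of the columns that fail.
find_failing_columns = function(df, tabulate_one, columns = names(df)){
    errors = lapply(columns, function(col){
        tryCatch({
            tabulate_one(df[[col]])
            NULL                      # no error for this column
        }, error = function(e) conditionMessage(e))
    })
    names(errors) = columns
    # keep only the columns that raised an error
    Filter(Negate(is.null), errors)
}

# Toy data and a stand-in for the real tabulation step (assumption, not expss)
toy = data.frame(a = factor(c("x", "y")), b = factor(c("u", "v")))
toy_tabulate = function(x){
    if (identical(levels(x), c("u", "v"))) stop("duplicated values in labels")
    table(x)
}
failing = find_failing_columns(toy, toy_tabulate)
print(names(failing))
```

If the error only appears on some runs, the loop may need to be repeated several times before a column shows up in the result.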

tgravelle commented 1 year ago

Thank you for your reply. I'm reading in an SPSS .sav dataset exported from IncQuery using haven::read_sav(). I've previously done the same with .sav files from other sources without issue. Your function does not identify any duplicated factor levels in my data (it returns a list of length 0).
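Since the error message refers to labels rather than levels, a similar check on the 'labels' attribute may also be worth running. This is a sketch under an assumption: it expects value labels to be stored in an attribute named 'labels', as haven's labelled vectors do. `dupl_val_labels` is a hypothetical name, and for columns that are plain factors it will simply return an empty list.

```r
# Hypothetical companion to dupl_levels(): inspect the 'labels' attribute
# instead of 'levels' and return the label sets with duplicated values.
dupl_val_labels = function(df){
    res = lapply(df, attr, 'labels')
    # anyDuplicated() returns 0 for NULL, so unlabelled columns are skipped
    has_dupl = vapply(res, anyDuplicated, integer(1))
    res[has_dupl > 0]
}

# Toy data: 'q1' has two labels attached to the same value 1
toy = data.frame(q1 = c(1, 2, 1), q2 = c(1, 2, 2))
attr(toy$q1, 'labels') = c(Yes = 1, Definitely = 1, No = 2)
attr(toy$q2, 'labels') = c(Yes = 1, No = 2)
print(names(dupl_val_labels(toy)))
```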

I also have no way of reliably determining which column is throwing an error because the error occurs intermittently: the same code will sometimes run and sometimes not run -- without making any changes to the code.

gdemin commented 1 year ago

Could you provide the output of str() for each variable used in your table code?

gdemin commented 1 year ago

The same as #107. Could you run the sessionInfo() on your system and place the results here?

gdemin commented 1 year ago

The same as #107. Closed as a duplicate.