gdemin / expss

expss: Tables and Labels in R
https://cran.r-project.org/web/packages/expss/

Random error 'set_val_lab' - duplicated values in labels: #107

Open bkerwick opened 1 year ago

bkerwick commented 1 year ago

We have started using expss to create a lot of tabs and workbooks, and we have found that we get a random error in the workflow. A random error is hard to replicate, but below is my attempt, with screenshots showing where in the loop the error occurred.

I would like to know if there is a way to avoid this and whether I am missing something obvious. Thank you for your time.


library(expss)

transport <- sample(c("Car", "Bike"), 10000, replace = TRUE)
age <- sample(c("Under 30", "Over 30"), 10000, replace = TRUE)
gender <- sample(c("M", "F"), 10000, replace = TRUE)
education <- sample(c("High School", "Bachelor's Degree", "Master's Degree", "PhD"), 10000, replace = TRUE)
occupation <- sample(c("Teacher", "Engineer", "Software Developer", "Lawyer", "Nurse", "Professor", "Salesperson", "Doctor", "Marketing Manager", "CEO"), 10000, replace = TRUE)
income <- sample(c("Under 60k", "Over 60k"), 10000, replace = TRUE)
products.Held.Banking..Transaction.Cheque.Current.account <- sample(c('Transaction / Cheque / Current account', ''),10000, replace=TRUE)
products.Held.Banking..Savings.Passbook.Call.account <- sample(c('Savings / Passbook / Call account',''), 10000, replace=TRUE)
products.Held.Banking..Bonus.Bonds <- sample(c( 'Bonus Bonds', ''), 10000, replace=TRUE)
products.Held.Banking..Term.Deposit.Term.Investment <- sample(c( 'Term Deposit / Term Investment', ''), 10000, replace=TRUE)
products.Held.Banking..Unit.Trust.or.Managed.Fund <- sample(c( 'Unit Trust or Managed Fund', ''), 10000, replace=TRUE)
products.Held.Banking..Personal.Retirement.Savings.Superannuation <- sample(c('Personal Retirement Savings / Superannuation', ''), 10000, replace=TRUE)
products.Held.Banking..KiwiSaver <- sample(c( 'KiwiSaver',  ''), 10000, replace=TRUE)
products.Held.Banking..Mortgage.or.Loan.on.the.home.you.live.in <- sample(c( 'Mortgage or Loan on the home you live in',  ''), 10000, replace=TRUE)
products.Held.Banking..Mortgage.or.Loan.on.other.properties.you.own <- sample(c( 'Mortgage or Loan on other properties you own',  ''), 10000, replace=TRUE)
products.Held.Banking..Personal.Loan <- sample(c( 'Personal Loan', ''), 10000, replace=TRUE)
products.Held.Banking..Credit.Card <- sample(c( 'Credit Card', ''), 10000, replace=TRUE)
products.Held.Banking..Debit.Card <- sample(c( 'Debit Card',""), 10000, replace=TRUE)
employment.new <- sample(c('Self-employed - own your own business', 'Self-employed - own your own farm', 'Work in full-time paid employment (i.e. 30 hours or more per week)', 'Work in part-time paid employment (i.e. less than 30 hours per week)', 'Full-time Home Executive', 'Student - Full time', 'Student - Part time', 'Not working at the moment', 'Retired and not working at all', 'Retired, but working occasionally', 'Other', "I'd prefer not to say"), 10000, replace=TRUE)
home.Ownership <- sample(c('The owner of your house', 'Renting or leasing your house', 'A boarder at your house', 'Living with your parents or other relatives', 'Other'), 10000, replace=TRUE)
household.Situation <- sample(c('Single person living alone', 'Single parent living with child / children', 'Single person - have children but they have all left home', "Couple - don't have any children", 'Couple - have child / children living at home', 'Couple - have children, but they have all left home', 'Share household (i.e. adults sharing a house / flatting together)', 'Live with parents', 'Extended family household (i.e. more than two generations living together)', 'Other household arrangement', 'Prefer not to say'), 10000, replace=TRUE)

month.Wave <- sample(c('2020-Jan','2020-Feb','2020-Mar','2020-Apr','2020-May', '2020-Jun', '2020-Jul', '2020-Aug', '2020-Sep', '2020-Oct', '2020-Nov', '2020-Dec','2021-Jan','2021-Feb','2021-Mar','2021-Apr','2021-May', '2021-Jun', '2021-Jul', '2021-Aug', '2021-Sep', '2021-Oct', '2021-Nov', '2021-Dec','2022-Jan','2022-Feb','2022-Mar','2022-Apr','2022-May','2022-Jun','2022-Jul','2022-Aug','2022-Sep','2022-Oct','2022-Nov','2022-Dec','2023-Jan','2023-Feb','2023-Mar','2023-Apr'), 10000, replace=TRUE)

weight2 <- runif(10000, min = 0.2, max = 2.0)
duration <- runif(10000, min = 5, max = 15)
nps <- runif(10000, min = -100, max = 100)
expense <- runif(10000, min = 50, max = 100000)
recommend  <- runif(10000, min = 1, max = 10)

df <- data.frame(
  Transport = transport,
  Age = age,
  Gender = gender,
  Education = education,
  Occupation = occupation,
  Income = income,
  Weightinput = weight2,
  Products.Held.Banking..Transaction.Cheque.Current.account = products.Held.Banking..Transaction.Cheque.Current.account,
  Products.Held.Banking..Savings.Passbook.Call.account = products.Held.Banking..Savings.Passbook.Call.account,
  Products.Held.Banking..Bonus.Bonds = products.Held.Banking..Bonus.Bonds,
  Products.Held.Banking..Term.Deposit.Term.Investment = products.Held.Banking..Term.Deposit.Term.Investment,
  Products.Held.Banking..Unit.Trust.or.Managed.Fund = products.Held.Banking..Unit.Trust.or.Managed.Fund,
  Products.Held.Banking..Personal.Retirement.Savings.Superannuation = products.Held.Banking..Personal.Retirement.Savings.Superannuation,
  Products.Held.Banking..KiwiSaver = products.Held.Banking..KiwiSaver,
  Products.Held.Banking..Mortgage.or.Loan.on.the.home.you.live.in = products.Held.Banking..Mortgage.or.Loan.on.the.home.you.live.in,
  Products.Held.Banking..Mortgage.or.Loan.on.other.properties.you.own = products.Held.Banking..Mortgage.or.Loan.on.other.properties.you.own,
  Products.Held.Banking..Personal.Loan = products.Held.Banking..Personal.Loan,
  Products.Held.Banking..Credit.Card = products.Held.Banking..Credit.Card,
  Products.Held.Banking..Debit.Card = products.Held.Banking..Debit.Card,
  Employment.new = employment.new,
  Home.Ownership = home.Ownership,
  Household.Situation = household.Situation,
  Month.Wave = month.Wave,
  Duration = duration,
  NPS = nps,
  Expense = expense,
  Recommend = recommend
)

df$Transport <- factor(df$Transport, levels=c('Car', 'Bike'), ordered=TRUE)
df$Age <- factor(df$Age, levels=c('Under 30', 'Over 30'), ordered=TRUE)
df$Gender <- factor(df$Gender, levels=c('M', 'F'), ordered=TRUE)
df$Education <- factor(df$Education, levels=c('High School', "Bachelor's Degree", "Master's Degree", "PhD"), ordered=TRUE)
df$Occupation <- factor(df$Occupation, levels=c('Teacher', 'Engineer', 'Software Developer', 'Lawyer', 'Nurse', 'Professor', 'Salesperson', 'Doctor', 'Marketing Manager', 'CEO'), ordered=TRUE)
df$Income <- factor(df$Income, levels=c('Under 60k', 'Over 60k'), ordered=TRUE)
df$Products.Held.Banking..Transaction.Cheque.Current.account <- factor(df$Products.Held.Banking..Transaction.Cheque.Current.account, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Savings.Passbook.Call.account <- factor(df$Products.Held.Banking..Savings.Passbook.Call.account, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Bonus.Bonds <- factor(df$Products.Held.Banking..Bonus.Bonds, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Term.Deposit.Term.Investment <- factor(df$Products.Held.Banking..Term.Deposit.Term.Investment, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Unit.Trust.or.Managed.Fund <- factor(df$Products.Held.Banking..Unit.Trust.or.Managed.Fund, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Personal.Retirement.Savings.Superannuation <- factor(df$Products.Held.Banking..Personal.Retirement.Savings.Superannuation, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..KiwiSaver <- factor(df$Products.Held.Banking..KiwiSaver, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Mortgage.or.Loan.on.the.home.you.live.in <- factor(df$Products.Held.Banking..Mortgage.or.Loan.on.the.home.you.live.in, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Mortgage.or.Loan.on.other.properties.you.own <- factor(df$Products.Held.Banking..Mortgage.or.Loan.on.other.properties.you.own, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Personal.Loan <- factor(df$Products.Held.Banking..Personal.Loan, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Credit.Card <- factor(df$Products.Held.Banking..Credit.Card, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Debit.Card <- factor(df$Products.Held.Banking..Debit.Card, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Employment.new <- factor(df$Employment.new, levels=c('Self-employed - own your own business', 'Self-employed - own your own farm', 'Work in full-time paid employment (i.e. 30 hours or more per week)', 'Work in part-time paid employment (i.e. less than 30 hours per week)', 'Full-time Home Executive', 'Student - Full time', 'Student - Part time', 'Not working at the moment', 'Retired and not working at all', 'Retired, but working occasionally', 'Other', "I'd prefer not to say"), ordered=TRUE)
df$Home.Ownership <- factor(df$Home.Ownership, levels=c('The owner of your house', 'Renting or leasing your house', 'A boarder at your house', 'Living with your parents or other relatives', 'Other'), ordered=TRUE)
df$Household.Situation <- factor(df$Household.Situation, levels=c('Single person living alone', 'Single parent living with child / children', 'Single person - have children but they have all left home', "Couple - don't have any children", 'Couple - have child / children living at home', 'Couple - have children, but they have all left home', 'Share household (i.e. adults sharing a house / flatting together)', 'Live with parents', 'Extended family household (i.e. more than two generations living together)', 'Other household arrangement', 'Prefer not to say'), ordered=TRUE)

df$Month.Wave <- factor(df$Month.Wave, levels=c('2020-Jan','2020-Feb','2020-Mar','2020-Apr','2020-May', '2020-Jun', '2020-Jul', '2020-Aug', '2020-Sep', '2020-Oct', '2020-Nov', '2020-Dec','2021-Jan','2021-Feb','2021-Mar','2021-Apr','2021-May', '2021-Jun', '2021-Jul', '2021-Aug', '2021-Sep', '2021-Oct', '2021-Nov', '2021-Dec','2022-Jan','2022-Feb','2022-Mar','2022-Apr','2022-May','2022-Jun','2022-Jul','2022-Aug','2022-Sep','2022-Oct','2022-Nov','2022-Dec','2023-Jan','2023-Feb','2023-Mar','2023-Apr'), ordered=TRUE)

df$total <- 'Base'

df$Weight <-   df$Weightinput
df$Weight <- as.numeric(df$Weight)
##default Labels:
for (var in names(df)) {
  if (is.null(var_lab(df[[var]]))) {
    var_lab(df[[var]]) = var
  }
}

tableCaption  <- "show set_val_lab error"

var_lab(df$total) = ""

for (i in 1:3000) {
  print(i)

first_table = df %>%
  tab_significance_options(compare_type = "adjusted_first_column",min_base = 30,subtable_marks = "both",sig_labels_first_column = c("Batman+", "Joker-"),mode = c("replace")) %>%
  tab_cols(
    eval( expression(list(
      total(),
      df$Month.Wave

    )))
  ) %>%
  tab_cells(list(df$total)) %>%
  tab_stat_cases(total_row_position = "none", label = "row %",total_statistic = "u_responses") %>%
  tab_stat_cases(total_row_position = "none", label = "Unweighted") %>%
  tab_weight(df$Weight) %>%
  tab_stat_cases(total_row_position = "none", label = "Weighted") %>%
  tab_stat_cases(total_row_position = "none", label = "row %",total_statistic = "u_responses") %>%
  tab_cells(
    #add custom variables and rtable.txt
    eval(expression(list(

      df$Transport,
      df$Age,
      df$Gender,
      df$Education,
      df$Occupation,
      df$Income,
      mrset(Products.Held.Banking..Transaction.Cheque.Current.account %to% Products.Held.Banking..Debit.Card, label = 'Products Held Banking'), #3 months rolling
      df$Employment.new,
      df$Home.Ownership,
      df$Household.Situation

    )))
  ) %>%
  #tab_stat_cases(total_row_position = "above") %>%
  tab_stat_cpct(total_row_position = "above",total_label = c("row %","Unweighted", "Weighted"),total_statistic = c("u_responses","u_cases", "w_cases")) %>%
  tab_last_sig_cpct(mode = "replace") %>%
  tab_row_label("#Mean Statistics") %>%
  tab_cells(
    #means go here
    eval(expression(list(
      df$Weight,
      df$Duration,
      df$NPS,
      df$Expense,
      df$Recommend

    )))
  ) %>%
  tab_stat_mean_sd_n(weighted_valid_n = TRUE) %>%
  tab_last_sig_means(mode = "replace") %>%
  tab_pivot(stat_position = "inside_rows")%>%
  set_caption(tableCaption)

}

[Screenshots: 2023-03-22 at 1:12:45 PM, 2:03:11 PM, 2:19:13 PM, 2:42:47 PM, 2:58:46 PM]

gdemin commented 1 year ago

Thank you for the detailed description. Could you provide the full result of the sessionInfo()? I need to see the list of attached packages. There is no such information in your screenshot.
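
Pasting the output as text rather than as a screenshot keeps the list of attached packages complete and searchable. A minimal sketch, assuming an arbitrary output file name `session_info.txt` (the name is not from this thread):

```r
# Capture the printed sessionInfo() output as a character vector, one line each
info <- capture.output(sessionInfo())

# Save it to a text file (hypothetical file name) so it can be pasted into the issue
writeLines(info, "session_info.txt")

# Also print it to the console
cat(info, sep = "\n")
```

The full text includes the R version, the platform, and both the attached and the loaded-via-namespace packages.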

bkerwick commented 1 year ago

Sorry about that, here you go:

[Screenshot: 2023-03-23 at 11:06:36 AM]

gdemin commented 1 year ago

I have run your code several times and didn't see any errors. I also tried with dplyr loaded, with the same result. As far as I can see in the "attached packages", you load other packages besides expss. Could you give me all the code, including the library() calls, that you execute before running the code above?

bkerwick commented 1 year ago

I use macOS, but today I tried a fresh install of R on Windows and installed expss and its dependencies. I ran the code without doing anything else, and the result below occurred. I hope this helps.

[Image: envir]

bkerwick commented 1 year ago

Was this what you were looking for? Were you not able to recreate the error?

gdemin commented 1 year ago

@bkerwick I have reproduced this issue. It seems this bug is Windows specific. Currently I don't know why this happens. Will investigate further.

quicly commented 1 year ago

I also encountered this issue today. I couldn't figure out what might be causing it, but it seems to be somehow related to unexpected behavior of the "/" character in value labels. Removing them has prevented this error, but I haven't tested it further yet, maybe it is just a coincidence, as this error was pretty random for me.

JB0207 commented 1 year ago

I also recently ran into the error when trying to create a crosstab with 208 variables in tab_cells and 16 in tab_cols. Removing "/" character did not solve the problem for me. I have observed that the error becomes more likely the more variables you include.

I wrote some code with tryCatch as a temporary solution, so that I don't have to keep re-running the code myself until it works. For the above scenario it took 7 tries until I got the record/table. Maybe it helps someone:

attempt <- 0 # number of failed attempts so far

repeat {

    failed <- FALSE
    print(attempt)

    # Build the table; on any error, set the flag instead of stopping the script
    tryCatch(my_table <- datasetSAV %>%
               tab_cells(X1,
                         X2
               ) %>%
               tab_cols(total(),
                        X3,
                        X4
               ) %>%
               tab_stat_cases(label = "N", total_row_position = "above") %>%
               tab_stat_cpct(label = "%", total_statistic = "w_cpct", total_label = "#Total cases") %>%
               tab_pivot(stat_position = "inside_columns") %>%
               drop_empty_rows() %>%
               drop_empty_columns(),
             error = function(e) { failed <<- TRUE })

    if (!failed) { break }        # success: leave the loop

    if (attempt == 10) { break }  # give up after the 11th failed attempt

    attempt <- attempt + 1
    print("Error")

}

@gdemin Many thanks for your efforts and the great package!
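
The retry pattern above can also be wrapped in a small reusable function. This is only a sketch of the same workaround, not a fix for the underlying bug: `retry()` and `flaky()` are hypothetical names introduced here and are not part of expss, and `flaky()` merely simulates a call that fails intermittently.

```r
# Hypothetical helper: evaluate `expr` repeatedly until it succeeds
# or `max_tries` attempts have failed.
retry <- function(expr, max_tries = 10) {
  expr <- substitute(expr)  # capture the unevaluated expression
  env <- parent.frame()     # evaluate it where retry() was called
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(
      list(value = eval(expr, env), ok = TRUE),
      error = function(e) list(value = e, ok = FALSE)
    )
    if (result$ok) return(result$value)
    message("Attempt ", attempt, " failed: ", conditionMessage(result$value))
  }
  stop("All ", max_tries, " attempts failed; last error: ",
       conditionMessage(result$value))
}

# Stand-in for a table-building call: fails on the first two calls only.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated intermittent failure")
  "table"
}

my_table <- retry(flaky())
print(my_table)
```

In real use, the argument would be the whole table pipeline, for example `retry(datasetSAV %>% tab_cells(X1) %>% ... %>% tab_pivot())`. Unlike the loop above, this version stops with an error if every attempt fails, so a missing table cannot go unnoticed.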

wck01 commented 1 year ago

Thank you so much, @JB0207, for your valuable comment. Your suggestion to use tryCatch inside a repeat loop worked perfectly for me. I appreciate your time and effort in helping me with this issue.

Waschoi commented 9 months ago

> I also encountered this issue today. I couldn't figure out what might be causing it, but it seems to be somehow related to unexpected behavior of the "/" character in value labels. Removing them has prevented this error, but I haven't tested it further yet, maybe it is just a coincidence, as this error was pretty random for me.

I tried this approach and had no success. The problem seems to be somewhere else.

Waschoi commented 9 months ago

I forked your package and changed these 2 lines: https://github.com/gdemin/expss/compare/master...Waschoi:expss:master

Maybe this could be done as an option, but I am not clever enough to figure this out.

This works well for me.

gdemin commented 8 months ago

@Waschoi Thank you for your investigation, but I can't use this workaround in the CRAN version.

You removed the check for duplicated label codes, and duplicated codes in value labels can produce unpredictable bugs in further processing, such as table creation.
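
To illustrate what a duplicated label code looks like, here is a minimal base R sketch. It treats value labels as a named numeric vector (label as name, code as value), which is the form expss's `val_lab()` returns. The helper name `find_duplicated_codes` and the example labels are hypothetical. This only detects genuine duplicates; it does not explain the intermittent error reported in this thread.

```r
# Hypothetical helper: return the labels whose code is shared
# with at least one other label.
find_duplicated_codes <- function(labels) {
  dup <- duplicated(labels) | duplicated(labels, fromLast = TRUE)
  labels[dup]
}

# Example labels: "Bike" and "Bicycle" both use code 2
labels <- c("Car" = 1, "Bike" = 2, "Bicycle" = 2, "Bus" = 3)

duplicated_codes <- find_duplicated_codes(labels)
print(duplicated_codes)
```

With expss loaded, the same helper could be applied to a labelled variable, for example `find_duplicated_codes(val_lab(df$Transport))`.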

Waschoi commented 8 months ago

This was not intended to be a permanent solution, but rather a good workaround for me. As soon as there is a real fix, I would use your version again. The bug really drove us crazy because it is not reproducible.

Waschoi commented 1 month ago

Since R 4.4.1 the problem is gone 😁
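
If this holds in general (it is a single user report and is not confirmed by the maintainer in this thread), a script could check the running R version before deciding whether a retry workaround is still needed. A minimal sketch; the variable name `needs_workaround` is hypothetical:

```r
# Assumption: the intermittent error no longer occurs from R 4.4.1 onwards,
# as reported in this thread (not confirmed by the package maintainer).
r_version <- getRversion()
needs_workaround <- r_version < "4.4.1"

if (needs_workaround) {
  message("R ", as.character(r_version), ": consider keeping the retry workaround.")
} else {
  message("R ", as.character(r_version), ": the retry workaround may no longer be needed.")
}
```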

bkerwick commented 1 month ago

That's great news. Thank you for your workaround as well.