SwissClinicalTrialOrganisation / secuTrialR

Handling of data from the clinical data management system secuTrial
https://swissclinicaltrialorganisation.github.io/secuTrialR/

assess_form_variable_completeness counts wrong missing, if same variable name exists in different forms #238

Closed nicoleb7 closed 2 years ago

nicoleb7 commented 2 years ago

Describe the bug: The function 'assess_form_variable_completeness' sometimes reports more 'missing' values than are actually missing. I have only inspected this for completeness = "allforms". In the wrongly calculated cases, the sum of 'timesentered' and 'timesmissing' is greater than the number of forms. I looked at the code, and I think the problem arises if:

Solution suggestion: the chunk in the function 'assess_form_variable_completeness'

 # validation overview as table
    validation_overview_table <- table(validation_overview$Column)

should be reduced to

The current problem: the chunk

      # missing count per variable
      table_missing_counts[names(validation_overview_table)] <-
        table_missing_counts[names(validation_overview_table)] +
        validation_overview_table[names(validation_overview_table)]

counts every entry with the same variable name, including entries that belong to a different form that happens to use that name. It also adds up multiple rule violations for the same variable, patient and form, and it includes rule violations that have nothing to do with the completeness of the variable (e.g. age is not in the expected range).
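To illustrate the overcounting described above, here is a minimal sketch with a made-up validation overview. Only the column name 'Column' is taken from the chunk in the function; 'pat_id', 'form', 'message' and all values are hypothetical stand-ins and need not match the real secuTrial export.

```r
# Toy validation overview. Only `Column` appears in the issue;
# `pat_id`, `form`, `message` and all values are hypothetical.
validation_overview <- data.frame(
  pat_id  = c("P1", "P1", "P2", "P1"),
  form    = c("form1", "form1", "form2", "form1"),
  Column  = c("test1", "test1", "test1", "age"),
  message = c("missing", "second rule violated", "missing", "out of range"),
  stringsAsFactors = FALSE
)

# Counting by variable name alone, as in the chunk above
validation_overview_table <- table(validation_overview$Column)
naive_missing_test1 <- as.integer(validation_overview_table["test1"])

# Rows that really mean "test1 is missing in form1"
true_missing_test1 <- sum(
  validation_overview$form == "form1" &
    validation_overview$Column == "test1" &
    validation_overview$message == "missing"
)

naive_missing_test1  # 3: second rule in form1 and the form2 row are included
true_missing_test1   # 1
```

In this toy data the name-only count for 'test1' is three times the number of genuinely missing entries in form1, which is the kind of inflation that can push 'timesentered' plus 'timesmissing' above the number of forms.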

To Reproduce Steps to reproduce the behavior:

  1. Export a secuTrial eCRF in which the same variable name (test1) is used in different forms (e.g. form1 and form2). In both forms some of these variables are missing, and multiple rules are applied to the variable 'test1' and/or 'missing values' are activated.

  2. Call assess_form_variable_completeness(form1, casenodes_table, validation_overview, completeness = "allforms", occ_in_vp = 1, omit_mnp = TRUE)

Expected behavior: A variable should be counted only once per patient_id and form.
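One possible way to get that behaviour, sketched on toy data, is to restrict the validation overview to the form being assessed and collapse it to one row per patient and variable before counting. This is not the package's actual fix; apart from 'Column', the column names ('pat_id', 'form') and the values are assumptions made for illustration.

```r
# Toy validation overview; `pat_id` and `form` are hypothetical column names.
validation_overview <- data.frame(
  pat_id = c("P1", "P1", "P2", "P2"),
  form   = c("form1", "form1", "form1", "form2"),
  Column = c("test1", "test1", "test1", "test1"),
  stringsAsFactors = FALSE
)

# Keep only entries that belong to the form under assessment
in_form <- validation_overview[validation_overview$form == "form1", ]

# One row per patient and variable, however many rules fired
once_per_patient <- unique(in_form[, c("pat_id", "Column")])

# Missing count per variable, now bounded by the number of forms
missing_counts <- table(once_per_patient$Column)
missing_counts
```

Here 'test1' is counted twice (once for P1, once for P2) instead of four times. Rule violations that do not concern completeness would still have to be excluded separately, for example by filtering on whatever column carries the rule message.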

aghaynes commented 2 years ago

Thanks for pointing this out. As you happen to have a dataset that causes the problem, would you be interested in trying to amend the code?

As a side note... I would say that it is bad practice to have variables in different forms with the same name... With secuTrialR, this will cause another issue, besides the one you mentioned - the labels of the duplicate variable names will be of length 2 rather than 1...