SwissClinicalTrialOrganisation / secuTrialR

Handling of data from the clinical data management system secuTrial
https://swissclinicaltrialorganisation.github.io/secuTrialR/

Factorization/Dates problem for repetition tables in exports with short names #66

Closed markomi closed 5 years ago

markomi commented 5 years ago

Describe the bug: Factor columns are not converted correctly in repetition tables of exports with short table names. They are fine for long names, and short names in normal (non-repetition) tables are also fine.

To Reproduce Steps to reproduce the behavior:

  1. Load an export with export_options$short_names == TRUE containing a repetition table (@PatrickRWright will provide an example soon) using read_secuTrial().
  2. Have a look at any emnp repetition table with factors. The .factor column will contain only NA.

Expected behavior: The .factor column should contain the factor values as coded in the cl table.
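To make the expected result concrete, here is a minimal sketch in base R. It uses the surgery_type code/label pairs as they appear in the cl output further down in this thread and the raw value 1 seen in the example rows. The object names are made up for illustration, and this is not necessarily how secuTrialR builds the column internally.

```r
# Code/label pairs for surgery_type, copied from the cl table shown in this issue
cl_surgery_type <- data.frame(
  code  = c(3, 2, 1),
  value = c("Re-operation", "Emergency", "Elective"),
  stringsAsFactors = FALSE
)

# Raw coded values as in rows 12 and 13 of d$esurgeries
surgery_type <- c(1, 1)

# Expected content of the .factor column: labels from cl instead of NA
surgery_type.factor <- factor(surgery_type,
                              levels = cl_surgery_type$code,
                              labels = cl_surgery_type$value)
print(surgery_type.factor)
```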

PatrickRWright commented 5 years ago

@aghaynes if you load the data below you will see that the factorization is not working in esurgeries. I have pushed the appropriate export to the data archive.

> d <- read_secuTrial("s_export_CSV-xls_CTU05_shortnames_sep_ref.zip")
> d$esurgeries[c(12, 13),]
          pat_id                   centre mnppid mnpdocid mnpsubdocid   fgid position surgery_type surgery_type.factor surgery_organ
12 RPACK-INS-012 Inselspital Bern (RPACK)   1210      234         235 120011        1            1                <NA>             1
13 RPACK-INS-012 Inselspital Bern (RPACK)   1210      234         233 120011        0            1                <NA>            11
   surgery_organ.factor
12                 <NA>
13                 <NA>

aghaynes commented 5 years ago

ah... those horrible forms... that would be because the metadata doesn't contain the "e" part. This is probably also the case for dates...?

PatrickRWright commented 5 years ago

I just checked for CTU05 and unfortunately there is no date variable in esurgeries. Do you have a local dataset that you can use to check?

markomi commented 5 years ago

I think @aghaynes is right about the dates. I just checked with a local dataset. They are not transformed from int to Date format.

PatrickRWright commented 5 years ago

Thank you for checking.

PatrickRWright commented 5 years ago

Basically a write up to myself.

Just to add to this since I looked a little closer into this issue:

> names(sT_export_short)
 [1] "export_options"  "fs"              "cn"              "ctr"             "is"              "qs"              "qac"            
 [8] "vp"              "vpfs"            "atcn"            "atcvp"           "cts"             "miv"             "cl"             
[15] "atmiv"           "baseline"        "atbaseline"      "outcome"         "atoutcome"       "treatment"       "attreatment"    
[22] "allmedi"         "atallmedi"       "studyterminat"   "atstudyterminat" "ae"              "atae"            "sae"            
[29] "atsae"           "esurgeries"      "atesurgeries"    "atae1"          
> names(sT_export_long)
 [1] "export_options"          "forms"                   "casenodes"               "centres"                 "items"                  
 [6] "questions"               "queries"                 "visitplan"               "visitplanforms"          "atcasenodes"            
[11] "atcasevisitplans"        "comments"                "miv"                     "cl"                      "atmiv"                  
[16] "ctu05baseline"           "atmnpctu05baseline"      "ctu05outcome"            "atmnpctu05outcome"       "ctu05treatment"         
[21] "atmnpctu05treatment"     "ctu05allmedi"            "atmnpctu05allmedi"       "ctu05studyterminat"      "atmnpctu05studyterminat"
[26] "ctu05ae"                 "atmnpctu05ae"            "ctu05sae"                "atmnpctu05sae"           "emnpctu05surgeries"     
[31] "atemnpctu05surgeries"    "atadverseevents"

The grepl() call below fails for the short names:

grepl(paste0(form, ".", name, "$"), cl$column)

Looking at the regex and the cl references makes clear why:

> paste0(form, ".", name, "$")
[1] "esurgeries.surgery_type$"

> sT_export_short$cl[grep("surger", sT_export_short$cl$column),]
                             column code                                  value
4  emnpctu05surgeries.surgery_organ   10 Intraabdominal / intrathoracic vessels
5  emnpctu05surgeries.surgery_organ    8                                  Heart
6  emnpctu05surgeries.surgery_organ    7                 Kidney / Urinary tract
7  emnpctu05surgeries.surgery_organ    6                  Liver / Biliary tract
8  emnpctu05surgeries.surgery_organ    1                                Stomach
9  emnpctu05surgeries.surgery_organ    9                                   Lung
10 emnpctu05surgeries.surgery_organ    4     Intestines (recurrent perforation)
11 emnpctu05surgeries.surgery_organ    3                             Intestines
12 emnpctu05surgeries.surgery_organ    2        Stomach (recurrent perforation)
13 emnpctu05surgeries.surgery_organ    5                               Pancreas
14 emnpctu05surgeries.surgery_organ   11                                  Other
15  emnpctu05surgeries.surgery_type    3                           Re-operation
16  emnpctu05surgeries.surgery_type    2                              Emergency
17  emnpctu05surgeries.surgery_type    1                               Elective
97         mnpctu05baseline.surgery    0                                     no
98         mnpctu05baseline.surgery    1                                    yes
99         mnpctu05baseline.surgery   98                                unknown

> sT_export_long$cl[grep("surger", sT_export_long$cl$column),]
                             column code                                  value
4  emnpctu05surgeries.surgery_organ   10 Intraabdominal / intrathoracic vessels
5  emnpctu05surgeries.surgery_organ    8                                  Heart
6  emnpctu05surgeries.surgery_organ    7                 Kidney / Urinary tract
7  emnpctu05surgeries.surgery_organ    6                  Liver / Biliary tract
8  emnpctu05surgeries.surgery_organ    1                                Stomach
9  emnpctu05surgeries.surgery_organ    9                                   Lung
10 emnpctu05surgeries.surgery_organ    4     Intestines (recurrent perforation)
11 emnpctu05surgeries.surgery_organ    3                             Intestines
12 emnpctu05surgeries.surgery_organ    2        Stomach (recurrent perforation)
13 emnpctu05surgeries.surgery_organ    5                               Pancreas
14 emnpctu05surgeries.surgery_organ   11                                  Other
15  emnpctu05surgeries.surgery_type    3                           Re-operation
16  emnpctu05surgeries.surgery_type    2                              Emergency
17  emnpctu05surgeries.surgery_type    1                               Elective
97         mnpctu05baseline.surgery    0                                     no
98         mnpctu05baseline.surgery    1                                    yes
99         mnpctu05baseline.surgery   98                                unknown

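Also note that the cl tables for the long and short exports are identical: both use the long form name (emnpctu05surgeries), so a pattern built from the short name (esurgeries) never matches. That is where the problem is coming from.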
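The mismatch can be reproduced without the export. The sketch below uses three cl$column values copied from the output above. The last pattern (pattern_guess) is purely a guess at a possible workaround and not the fix adopted in secuTrialR: it assumes that a short repetition name is "e" plus the form stem and that the long name is "emnp", a project tag, then the same stem. That convention is inferred only from this CTU05 example.

```r
# cl$column values copied from the output above
cl_column <- c("emnpctu05surgeries.surgery_organ",
               "emnpctu05surgeries.surgery_type",
               "mnpctu05baseline.surgery")

form <- "esurgeries"    # table name in the short-name export
name <- "surgery_type"

# Pattern as built in the failing call: no entry matches
pattern_short <- paste0(form, ".", name, "$")
print(any(grepl(pattern_short, cl_column)))

# With the long table name the same construction finds the entry
pattern_long <- paste0("emnpctu05surgeries", ".", name, "$")
print(which(grepl(pattern_long, cl_column)))

# Hypothetical workaround (assumption, see text above): drop the leading "e",
# allow any project tag after "emnp", and escape the literal dot
pattern_guess <- paste0("^emnp.*", sub("^e", "", form), "\\.", name, "$")
print(which(grepl(pattern_guess, cl_column)))
```

Note that the unescaped "." in the original pattern matches any character, so escaping it (or matching with fixed strings) would make the lookup stricter regardless of how the name mismatch is resolved.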