Closed markomi closed 5 years ago
@aghaynes if you load the data below you will see that the factorization is not working in esurgeries. I have pushed the appropriate export to the data archive.
> d <- read_secuTrial("s_export_CSV-xls_CTU05_shortnames_sep_ref.zip")
> d$esurgeries[c(12, 13),]
          pat_id                   centre mnppid mnpdocid mnpsubdocid   fgid position surgery_type surgery_type.factor surgery_organ
12 RPACK-INS-012 Inselspital Bern (RPACK)   1210      234         235 120011        1            1                <NA>             1
13 RPACK-INS-012 Inselspital Bern (RPACK)   1210      234         233 120011        0            1                <NA>            11
   surgery_organ.factor
12                 <NA>
13                 <NA>
ah... those horrible forms... that would be because the metadata doesn't contain the "e" part. This is probably also the case for dates...?
I just checked for CTU05 and unfortunately there is no date variable in esurgeries. Do you have a local dataset that you can use to check?
I think @aghaynes is right about the dates. I just checked with a local dataset. They are not transformed from int to Date format.
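For reference, a minimal sketch of the int-to-Date conversion that is apparently being skipped for the repetition tables. It assumes the dates arrive as integers in yyyymmdd form; the helper name int_to_date and the sample values are made up for illustration and are not taken from the package.

```r
# Hypothetical helper: convert integer dates to Date.
# Assumption: the integers are in yyyymmdd form, e.g. 20190401.
int_to_date <- function(x) {
  as.Date(as.character(x), format = "%Y%m%d")
}

# Made-up sample values, including a missing entry
raw <- c(20190401L, NA, 20181231L)
converted <- int_to_date(raw)

print(class(converted))
print(converted)
```

Missing values stay NA after the conversion, so empty date fields would not need special handling in this sketch.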
Thank you for checking.
Basically a write up to myself.
Just to add to this since I looked a little closer into this issue:
> names(sT_export_short)
[1] "export_options" "fs" "cn" "ctr" "is" "qs" "qac"
[8] "vp" "vpfs" "atcn" "atcvp" "cts" "miv" "cl"
[15] "atmiv" "baseline" "atbaseline" "outcome" "atoutcome" "treatment" "attreatment"
[22] "allmedi" "atallmedi" "studyterminat" "atstudyterminat" "ae" "atae" "sae"
[29] "atsae" "esurgeries" "atesurgeries" "atae1"
> names(sT_export_long)
[1] "export_options" "forms" "casenodes" "centres" "items"
[6] "questions" "queries" "visitplan" "visitplanforms" "atcasenodes"
[11] "atcasevisitplans" "comments" "miv" "cl" "atmiv"
[16] "ctu05baseline" "atmnpctu05baseline" "ctu05outcome" "atmnpctu05outcome" "ctu05treatment"
[21] "atmnpctu05treatment" "ctu05allmedi" "atmnpctu05allmedi" "ctu05studyterminat" "atmnpctu05studyterminat"
[26] "ctu05ae" "atmnpctu05ae" "ctu05sae" "atmnpctu05sae" "emnpctu05surgeries"
[31] "atemnpctu05surgeries" "atadverseevents"
The grep statement below fails for the short names:
grepl(paste0(form, ".", name, "$"), cl$column)
Looking at the regex and the cl references makes clear why:
> paste0(form, ".", name, "$")
[1] "esurgeries.surgery_type$"
> sT_export_short$cl[grep("surger", sT_export_short$cl$column),]
column code value
4 emnpctu05surgeries.surgery_organ 10 Intraabdominal / intrathoracic vessels
5 emnpctu05surgeries.surgery_organ 8 Heart
6 emnpctu05surgeries.surgery_organ 7 Kidney / Urinary tract
7 emnpctu05surgeries.surgery_organ 6 Liver / Biliary tract
8 emnpctu05surgeries.surgery_organ 1 Stomach
9 emnpctu05surgeries.surgery_organ 9 Lung
10 emnpctu05surgeries.surgery_organ 4 Intestines (recurrent perforation)
11 emnpctu05surgeries.surgery_organ 3 Intestines
12 emnpctu05surgeries.surgery_organ 2 Stomach (recurrent perforation)
13 emnpctu05surgeries.surgery_organ 5 Pancreas
14 emnpctu05surgeries.surgery_organ 11 Other
15 emnpctu05surgeries.surgery_type 3 Re-operation
16 emnpctu05surgeries.surgery_type 2 Emergency
17 emnpctu05surgeries.surgery_type 1 Elective
97 mnpctu05baseline.surgery 0 no
98 mnpctu05baseline.surgery 1 yes
99 mnpctu05baseline.surgery 98 unknown
> sT_export_long$cl[grep("surger", sT_export_long$cl$column),]
column code value
4 emnpctu05surgeries.surgery_organ 10 Intraabdominal / intrathoracic vessels
5 emnpctu05surgeries.surgery_organ 8 Heart
6 emnpctu05surgeries.surgery_organ 7 Kidney / Urinary tract
7 emnpctu05surgeries.surgery_organ 6 Liver / Biliary tract
8 emnpctu05surgeries.surgery_organ 1 Stomach
9 emnpctu05surgeries.surgery_organ 9 Lung
10 emnpctu05surgeries.surgery_organ 4 Intestines (recurrent perforation)
11 emnpctu05surgeries.surgery_organ 3 Intestines
12 emnpctu05surgeries.surgery_organ 2 Stomach (recurrent perforation)
13 emnpctu05surgeries.surgery_organ 5 Pancreas
14 emnpctu05surgeries.surgery_organ 11 Other
15 emnpctu05surgeries.surgery_type 3 Re-operation
16 emnpctu05surgeries.surgery_type 2 Emergency
17 emnpctu05surgeries.surgery_type 1 Elective
97 mnpctu05baseline.surgery 0 no
98 mnpctu05baseline.surgery 1 yes
99 mnpctu05baseline.surgery 98 unknown
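Also note that the cl tables for the long and short exports are identical: both use the long table names (e.g. emnpctu05surgeries), so a pattern built from the short name esurgeries never matches. This is where the problem is coming from.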
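The mismatch can be reproduced on a toy vector holding three of the cl column entries shown above. The sketch also shows why normal tables still work with short names (the short name is a suffix of the long one), and one possible workaround for repetition tables. The function build_pattern_rep is hypothetical and assumes that repetition tables always carry a leading "e" in both the short and the long name; it is not what the package currently does.

```r
# Three entries copied from the cl output above
cl_columns <- c("emnpctu05surgeries.surgery_type",
                "emnpctu05surgeries.surgery_organ",
                "mnpctu05baseline.surgery")

# Pattern as built by the failing statement
build_pattern <- function(form, name) paste0(form, ".", name, "$")

# Normal table: "baseline" is a suffix of "mnpctu05baseline", so it matches
baseline_hit <- grepl(build_pattern("baseline", "surgery"), cl_columns)

# Repetition table: "esurgeries" is not contained in "emnpctu05surgeries"
rep_hit <- grepl(build_pattern("esurgeries", "surgery_type"), cl_columns)

# Hypothetical workaround (assumption: repetition tables start with "e"):
# drop the leading "e" from the short name and require it at the start instead
build_pattern_rep <- function(form, name) {
  paste0("^e.*", sub("^e", "", form), "\\.", name, "$")
}
rep_fixed <- grepl(build_pattern_rep("esurgeries", "surgery_type"), cl_columns)

print(baseline_hit)
print(rep_hit)
print(rep_fixed)
```

Note that the original pattern leaves the "." unescaped, so it matches any character; the workaround escapes it. Form names containing regex metacharacters would need extra care in either version.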
Describe the bug
Factor columns are not converted correctly in repetition tables of exports with short table names. They are fine for long names. Short names in normal (non-repetition) tables are also fine.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The .factor column should contain the factor values as coded in the cl table.
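As an illustration of the expected result, here is a small sketch that builds surgery_type.factor from the cl codes listed earlier in this thread. The data frame is a hand-made stand-in for the real cl table, and the surgery_type values are invented for the example.

```r
# Hand-made stand-in for the relevant cl rows (codes and labels from the thread)
cl <- data.frame(
  column = rep("emnpctu05surgeries.surgery_type", 3),
  code   = c(3, 2, 1),
  value  = c("Re-operation", "Emergency", "Elective"),
  stringsAsFactors = FALSE
)

# Invented raw values for the surgery_type column
surgery_type <- c(1, 1, 2, NA)

# Look up the codes for this variable and use them as factor levels/labels
lookup <- cl[cl$column == "emnpctu05surgeries.surgery_type", ]
surgery_type.factor <- factor(surgery_type,
                              levels = lookup$code,
                              labels = lookup$value)

print(surgery_type.factor)
```

With a working lookup, rows 12 and 13 above (surgery_type 1) would show Elective instead of <NA>.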