ccb-hms / nhanes-database

Translated tables convert numeric to character if codebook is missing #171

Closed deepayan closed 6 months ago

deepayan commented 6 months ago

Example:

```r
> phonto::nhanesQuery("select SEQN, BPXSAR, BPXDAR from Raw.BPX_C order by SEQN") |> str()
'data.frame':   9643 obs. of  3 variables:
 $ SEQN  : int  21005 21006 21007 21008 21009 21010 21011 21012 21013 21014 ...
 $ BPXSAR: num  NA 98 96 104 118 136 NA 121 108 NA ...
 $ BPXDAR: num  NA 50 62 74 85 83 NA 65 67 NA ...
> phonto::nhanesQuery("select SEQN, BPXSAR, BPXDAR from Translated.BPX_C order by SEQN") |> str()
'data.frame':   9643 obs. of  3 variables:
 $ SEQN  : int  21005 21006 21007 21008 21009 21010 21011 21012 21013 21014 ...
 $ BPXSAR: chr  NA "98" "96" "104" ...
 $ BPXDAR: chr  NA "50" "62" "74" ...
```

These two variables are missing from the codebook. We normally have no way of knowing whether such a variable is numeric (in this case, we can check other cycles), but it's probably better to keep such variables numeric by default.
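
To make the suggested default concrete, here is a minimal sketch in base R. It is not the actual translation code used to build the database: `translate_column()` is a hypothetical helper, and the `code` / `label` columns of the codebook are assumed for illustration only. The idea is simply that a column with no codebook entry is returned unchanged, so a numeric column stays numeric.

```r
# Hypothetical helper (not part of phonto or the database build scripts).
# `codebook` is assumed to be a data.frame with columns `code` and `label`,
# or NULL when the variable has no codebook entry.
translate_column <- function(x, codebook = NULL) {
    # No codebook: there is nothing to translate, so return the column
    # as-is instead of coercing it to character.
    if (is.null(codebook) || nrow(codebook) == 0) return(x)
    # Codebook present: map each code to its label; codes that do not
    # appear in the codebook (including NA) become NA.
    codebook$label[match(x, codebook$code)]
}

# Variable without a codebook: the type is preserved.
raw_numeric <- c(NA, 98, 96, 104)
untranslated <- translate_column(raw_numeric)

# Variable with a (made-up) codebook: codes are replaced by labels.
cb <- data.frame(code = c(1, 2), label = c("Yes", "No"),
                 stringsAsFactors = FALSE)
translated <- translate_column(c(1, 2, NA, 1), cb)
```

With this fallback, a variable that really is categorical but lacks a codebook would keep its raw codes, which seems less harmful than turning measurements into strings.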

Other examples (not exhaustive) are HPVSWR_F, OHXPRL_B, and OHXPRU_B.

This came up as one source of mismatch in the R vs DB translations.

nathan-palmer commented 6 months ago

Changing the translation process to leverage nhanesA.