bcgov / wqbc

An R package for water quality thresholds and index calculation for British Columbia
http://bcgov.github.io/wqbc/
Apache License 2.0
19 stars 9 forks source link

wqbc incorrectly assumes that MDL_UNIT is the same as UNIT in rems data #158

Closed joethorley closed 3 years ago

joethorley commented 3 years ago

wqbc currently assumes that MDL_UNIT is UNIT in rems data but this is not always the case. Consider selenium concentrations on the elk river.

There are two possible fixes.

The first is to alter the rems database so that the two are always the same.

The second is to update wqbc to require both and adjust independently. This is quite a lot of work to ensure the unit types are consistent and convertible and to perform the conversion and to provide informative error messages when this is not the case.

ateucher commented 3 years ago

So we have confirmation that this can't be altered in the database at this time, so the fix will have to happen here, or in rems. Here's a table of all the non-matching units in the two columns - we'll have to manually populate a third column with a conversion factor:

library(rems)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

con <- connect_historic_db()
#> Please remember to use 'disconnect_historic_db()' when you are finished querying the historic database.
hist_tbl <- attach_historic_data(con)

unit_tbl_hist <- hist_tbl %>% 
  select(UNIT, MDL_UNIT) %>% 
  filter(UNIT != MDL_UNIT) %>% 
  distinct() %>% 
  collect()

unit_tbl_2yr <- get_ems_data("2yr") %>% 
  select(UNIT, MDL_UNIT) %>% 
  filter(UNIT != MDL_UNIT) %>% 
  distinct()
#> Fetching data from cache...

unit_tbl <- bind_rows(unit_tbl_hist, unit_tbl_2yr) %>% 
  distinct()

knitr::kable(unit_tbl, "simple")
UNIT MDL_UNIT
ug/m3 mg/m3
m mm
g/m2 ug/cm2
mg/L ug/L
ug/g wet ug/g
mg/dm2/d ug/cm2/d
ppm A mg/m3
uS/cm mho/cm
ug/g mg/L
t/d kg/d
mg/L ppm W
m3/d E6IG/d
m3/d USG/d
m3/d E3m3/d
ug/g kg/t
m3/d L/d
m3/s W m3/d
adt/d t/d
m3/d IG/d
uS/cm umho/cm
%(W/W) %
mg/dm2/d ug/dm2/m
ug/g mg/kg
mg/dm2/d ug/dm2/d
mg/L None
mg/L Vehicles
m3/s W USG/min
m3/d E6L/d
CFU/100mL L/min
m3/s W m3/min
mg/L %
ueq/L mg/L
ug/g %
ug/g wet mg/kg
m3/d m3/min W
m3/d m3/s W
ug/g ug/L
ug/g wet mg/kg wet
mg/L ng/L
mg/L mg/dm2/d
m3/s W L/min
mg/dm2/d mg/dm2/m
m3/d m3/w
C F
m3/min A m3/s A
m3/s W IG/d
mg/L ug/cm2
ug/g ug/g wet
AU/cm mg/L
mg/L mg
mg/L g/L
ug/m3 ug
ug/g mg/kg wet
% (Mortality) %
kPa mm Hg
MPN/g MPN/100gm
mg/L mg/kg
m3/s W L/s
None N/A
mg/L % Saturation
NTU JTU
m3/min A m3/s W
MPN/100gm F
% % (Mortality)
CFU/100mL mg/L
m3/d m3/m
ppt ug/kg
C h
% (Mortality) %(V/V)
mg/L ug
mg/L mg/m2
m3/d mV
% %(W/W)
mg/L m3/d
mg/L ug/g
g/m2 mg/m2
pH units Col. Unit
m3/d m3/y
ug/g wet mg/L
m3/d uS/cm
kg/adt mg/L
mg/dm2/d mg/L
m3/s W m3/h
ug/m3 t/d
MPN/100mL F
%(V/V) %
None uS/cm
m ft
t/d mg/L
mg/L ml/L
kg/adt % (Recovery)
%(V/V) % (Mortality)
CFU/100mL MPN/100mL
MPN/100mL mg/L
mg/L N/A
m3/d mg/L
%(W/W) mg/L
MPN/100mL ug/g
pH units mg/g
C N/A
g/m2 ug/L
ug/g kg/adut
t/d adt/d
uS/cm mg/L
Col. Unit pH units
m m3/s W
ug/g None
ug/m3 mg/L
pH units None
Col. Unit SWU
kg/adt kg/d
mg/L uS/cm
ug/m3 mg/dm2/d
mg/L MPN/100mL
%(W/W) % (Mortality)
mg/L C
C mg/L
adt/d Col. Unit
C pH units
pH units mg/L
t/d Ton/d
mg/L kg/d
MPN/100mL CFU/100mL
ml/L ug/L
m N/A
pH units uS/cm
% ppm (S)

Created on 2021-02-09 by the reprex package (v1.0.0)

joethorley commented 3 years ago

wqbc has a convert_values() function which recognizes quite a few units and may deal with most of cases - it would be useful to add a count of the frequency of those various unit combinations (for which both values are provided) to the table above for scoping the magnitude of the problem - a lot are not-transferable and must be errors ie C into pH Units.

https://rdrr.io/github/bcgov/wqbc/man/convert_values.html

ateucher commented 3 years ago

Good point. There are definitely some that are nonsensical (mg/L <=> Vehicles??). Here they are with counts:

library(rems)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

con <- connect_historic_db()
#> Please remember to use 'disconnect_historic_db()' when you are finished querying the historic database.
hist_tbl <- attach_historic_data(con)

unit_tbl_hist <- hist_tbl %>% 
  filter(UNIT != MDL_UNIT) %>% 
  count(UNIT, MDL_UNIT) %>% 
  collect()

unit_tbl_2yr <- get_ems_data("2yr", dont_update = TRUE) %>% 
  filter(UNIT != MDL_UNIT) %>% 
  count(UNIT, MDL_UNIT)
#> Fetching data from cache...

unit_tbl <- bind_rows(unit_tbl_hist, unit_tbl_2yr) %>% 
  group_by(UNIT, MDL_UNIT) %>% 
  summarise(n = sum(n)) %>% 
  arrange(-n)
#> `summarise()` has grouped output by 'UNIT'. You can override using the `.groups` argument.

knitr::kable(unit_tbl, "simple")
UNIT MDL_UNIT n
mg/L ug/L 1751998
ug/m3 mg/m3 118938
t/d kg/d 78182
m3/s W USG/min 68540
m3/d E3m3/d 35648
ug/g mg/kg 23386
m3/d E6L/d 11060
ug/g wet mg/kg 9743
m mm 6166
m3/d IG/d 5529
g/m2 ug/cm2 4661
m3/s W L/min 3817
ug/g ug/L 3187
m3/d USG/d 2830
ueq/L mg/L 2745
CFU/100mL L/min 1819
ug/g wet mg/kg wet 1731
m3/d L/d 1585
ppm A mg/m3 1484
mg/L mg/dm2/d 1443
ug/g wet ug/g 1070
m3/s W m3/min 1038
uS/cm umho/cm 977
m3/s W m3/d 768
%(W/W) % 690
m3/d m3/min W 684
ug/g mg/L 582
uS/cm mho/cm 499
mg/L ppm W 423
mg/dm2/d ug/dm2/d 345
mg/dm2/d ug/dm2/m 343
ug/g % 330
m3/d m3/s W 315
mg/L ug/cm2 230
ug/g kg/t 230
ug/m3 ug 223
m3/s W IG/d 219
mg/dm2/d ug/cm2/d 203
adt/d t/d 190
mg/L % 186
m3/d E6IG/d 176
mg/L None 166
mg/dm2/d mg/dm2/m 155
ug/g ug/g wet 151
m3/s W L/s 130
mg/L ng/L 106
m3/min A m3/s A 99
kPa mm Hg 84
mg/L mg 74
mg/L g/L 70
m3/d m3/w 60
None N/A 60
AU/cm mg/L 57
g/m2 mg/m2 57
% %(W/W) 40
C F 31
C h 31
mg/L ug/g 31
m ft 29
ppt ug/kg 28
CFU/100mL mg/L 27
t/d mg/L 25
MPN/g MPN/100gm 24
ug/g mg/kg wet 24
mg/L mg/m2 22
mg/L mg/kg 21
CFU/100mL MPN/100mL 20
m3/min A m3/s W 18
mg/L % Saturation 18
% (Mortality) %(V/V) 16
mg/L Vehicles 15
NTU JTU 15
pH units Col. Unit 14
m3/d uS/cm 13
m3/d m3/y 12
m3/s W m3/h 11
mg/L ug 11
mg/L m3/d 9
% % (Mortality) 8
g/m2 ug/L 8
MPN/100mL mg/L 8
ug/g kg/adut 8
ug/g wet mg/L 8
%(V/V) % (Mortality) 7
ug/g None 7
%(V/V) % 6
m3/d m3/m 6
kg/adt mg/L 5
mg/dm2/d mg/L 5
pH units mg/L 5
ug/m3 mg/dm2/d 5
ug/m3 t/d 5
uS/cm mg/L 5
C N/A 4
m m3/s W 4
m3/d mV 4
mg/L N/A 4
None uS/cm 4
% ppm (S) 3
MPN/100mL ug/g 3
% (Mortality) % 2
%(W/W) mg/L 2
C mg/L 2
Col. Unit SWU 2
mg/L MPN/100mL 2
mg/L uS/cm 2
MPN/100mL CFU/100mL 2
pH units mg/g 2
pH units None 2
%(W/W) % (Mortality) 1
adt/d Col. Unit 1
C pH units 1
Col. Unit pH units 1
kg/adt % (Recovery) 1
kg/adt kg/d 1
m N/A 1
m3/d mg/L 1
mg/L C 1
mg/L kg/d 1
mg/L ml/L 1
ml/L ug/L 1
MPN/100gm F 1
MPN/100mL F 1
pH units uS/cm 1
t/d adt/d 1
t/d Ton/d 1
ug/m3 mg/L 1

Created on 2021-02-10 by the reprex package (v1.0.0)

joethorley commented 3 years ago

Units that are currently recognised by wqbc and can be used for permitted conversions:

c("ng/L", "ug/L", "mg/L", "g/L", "kg/L", "pH", "degC", "C", "CFU/dL", "MPN/dL", "CFU/100mL", "MPN/100mL", "CFU/g", "MPN/g", "CFU/mL", "MPN/mL", "Col.unit", "Rel", "NTU", "m", "uS/cm")

Most cases are dealt with by mg/L and ug/L.

ateucher commented 3 years ago

In terms of internal package structure and future extensibility (i.e., adding more conversions as needed), would it be more efficient to use a lookup table (i.e., above, with a column of conversion factors)?

joethorley commented 3 years ago

I think it would be more efficient to expand the units recognized by wqbc - the code to do the conversions is relatively elegant.

All we have to do is to add them to lookup_units(), get_unit_multiplier () and get_unit_type().

convert_values <- function(x, from, to, messages) {
  from <- substitute_units(from, messages = messages)
  to <- substitute_units(to, messages = messages)

  x <- x * get_unit_multiplier(from) / get_unit_multiplier(to)

  bol <- from != to & get_unit_type(from) != get_unit_type(to)
  bol <- is.na(bol) | bol

  if (any(bol)) {
    warning(sum(bol), " values have inconvertible units")
    is.na(x[bol]) <- TRUE
  }
  x
}
lookup_units <- function() {
  c(
    "ng/L", "ug/L", "mg/L", "g/L", "kg/L", "pH", "degC", "C",
    "CFU/dL", "MPN/dL", "CFU/100mL", "MPN/100mL", "CFU/g", "MPN/g", "CFU/mL", "MPN/mL",
    "Col.unit", "Rel", "NTU", "m", "uS/cm"
  )
}
get_unit_multiplier <- function(x) {
  units <- c(
    "ng/L" = 10^-9, "ug/L" = 10^-6, "mg/L" = 10^-3,
    "g/L" = 1, "kg/L" = 10^3,
    "pH" = 1,
    "degC" = 1, "C" = 1, "NTU" = 1,
    "Col.unit" = 1, "Rel" = 1,
    "CFU/dL" = 1, "CFU/100mL" = 1, "CFU/mL" = 0.01, "MPN/dL" = 1,
    "MPN/100mL" = 1, "MPN/mL" = 0.01, "CFU/g" = 0.01, "MPN/g" = 0.01,
    "m" = 1,
    "uS/cm" = 1
  )
  x <- units[x]
  names(x) <- NULL
  x
}
get_unit_type <- function(x) {
  type <- list(
    "concentration" = c("ng/L", "ug/L", "mg/L", "g/L", "kg/L"),
    "pH" = "pH",
    "Colour" = c("Col.unit", "Rel"),
    "Temperature" = c("degC", "C"),
    "Turbidity" = "NTU",
    "Coli" = c("CFU/dL", "CFU/100mL", "CFU/mL", "MPN/dL", "MPN/100mL", "MPN/mL", "CFU/g", "MPN/g"),
    "Distance" = "m",
    "Conductivity" = "uS/cm"
  )

  type <- unlist(type)
  names <- sub("\\d$", "", names(type))
  values <- type
  type <- names
  names(type) <- values

  x <- type[x]
  names(x) <- NULL
  x
}
ateucher commented 3 years ago

I was looking at it, and I agree it's very elegant! One thing I wonder is if there are units within a unit type that are incompatible, and how we would treat those

joethorley commented 3 years ago

Can you provide an example?

ateucher commented 3 years ago

I can't provide an example because I'm just not sure - I don't know what a lot of those units even are or what type of quantity they represent... @JessicaPenno @KarHarker can you provide input on what unit combinations in the above table are important and/or don't make sense?

One other question (which I haven't thought through, so this may be a silly question) - Using the multiplier approach currently implemented could get complicated when both the numerator and denominator are different (e.g., ug/g <=> mg/kg).

ateucher commented 3 years ago

I wonder if we can leverage the {units} package?

joethorley commented 3 years ago

{units} may be the way to go but I suspect it will not have all the units used by EMS ie (MPN) plus the multiplier approach should work with different numerators and denominators if the units are simple multiples of each other for example 1 ug/g = 1 mg/kg so in this case the relative multiplier is 1 (we would just need to hard code in all combinations of the numerators and denominators).

ateucher commented 3 years ago

It doesn't have all the units, but it's easy to define new units and set conversion factors. Here's some experimenting I did, I think it looks pretty good. It takes care of the vast majority of them, and the rest are either really specific (e.g. relying on the molar mass of the chemical etc) or errors/nonsensical:

library(rems)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(units)
#> udunits system database from /usr/local/share/udunits

install_symbolic_unit("MPN")
install_conversion_constant("MPN", "CFU", 1)
install_symbolic_unit("NTU")
install_conversion_constant("NTU", "JTU", 1)
install_conversion_constant("US_liquid_gallon", "USG", 1)
install_conversion_constant("UK_liquid_gallon", "IG", 1)
# These petroleum measures are actually only ever used
# as volumes (1m3 = 0.001 E3m3 = 1e-6), but can't use E3m3 
# to install the unit so have to do it as m, units takes 
# care of the rest when it is m3 <=> E3m3 etc.
install_conversion_constant("m", "E3m", .1)
install_conversion_constant("m", "E6m", .01)
install_conversion_constant("L", "E6L", 1e-6)
#> set_units(set_units(1, "m3"), "E3m3")
# 0.001 [E3m3]
#> set_units(set_units(1, "m3"), "E6m3")
# 1e-06 [E6m3]

convert_values2 <- function(x, from, to) {
  to <- clean_unit(to)
  from <- clean_unit(from)
  x <- units::set_units(x, from, mode = "standard")
  as.numeric(units::set_units(x, to, mode = "standard"))
}

clean_unit <- function(x) {
  # Remove trailing A, W, wet etc. as well as percent type (V/V, W/W, Moratlity)
  # Assuming they are not imporant in the unit conversion??
  gsub("\\s*(W|wet|A|\\(W/W\\)|\\(V/V\\)|\\(Mortality\\))\\s*$", "", x)
}

con <- connect_historic_db()
#> Please remember to use 'disconnect_historic_db()' when you are finished querying the historic database.
hist_tbl <- attach_historic_data(con)

unit_tbl_hist <- hist_tbl %>% 
  filter(UNIT != MDL_UNIT) %>% 
  count(UNIT, MDL_UNIT) %>% 
  collect()

unit_tbl_2yr <- get_ems_data("2yr", dont_update = TRUE) %>% 
  filter(UNIT != MDL_UNIT) %>% 
  count(UNIT, MDL_UNIT)
#> Fetching data from cache...

unit_tbl <- bind_rows(unit_tbl_hist, unit_tbl_2yr) %>% 
  group_by(UNIT, MDL_UNIT) %>% 
  summarise(n = sum(n)) %>% 
  arrange(-n)
#> `summarise()` has grouped output by 'UNIT'. You can override using the `.groups` argument.

unit_tbl$converted_from_1 <- NA_real_

for (i in seq_len(nrow(unit_tbl))) {
  unit_tbl$converted_from_1[i] <- tryCatch(
    convert_values2(1, unit_tbl$UNIT[i], unit_tbl$MDL_UNIT[i]), 
    error = function(e) NA_real_
  )
}
#> Warning: Could not parse expression: '`N`/'. Returning as a single symbolic
#> unit()

#> Warning: Could not parse expression: '`N`/'. Returning as a single symbolic
#> unit()

#> Warning: Could not parse expression: '`N`/'. Returning as a single symbolic
#> unit()

knitr::kable(arrange(unit_tbl, is.na(converted_from_1), desc(n)))
UNIT MDL_UNIT n converted_from_1
mg/L ug/L 1751998 1.000000e+03
ug/m3 mg/m3 118938 1.000000e-03
t/d kg/d 78182 1.000000e+03
m3/s W USG/min 68540 1.585032e+04
m3/d E3m3/d 35648 1.000000e-03
ug/g mg/kg 23386 1.000000e+00
m3/d E6L/d 11060 1.000000e-03
ug/g wet mg/kg 9743 1.000000e+00
m mm 6166 1.000000e+03
m3/d IG/d 5529 2.199692e+02
g/m2 ug/cm2 4661 1.000000e+02
m3/s W L/min 3817 6.000000e+04
m3/d USG/d 2830 2.641720e+02
ug/g wet mg/kg wet 1731 1.000000e+00
m3/d L/d 1585 1.000000e+03
ug/g wet ug/g 1070 1.000000e+00
m3/s W m3/min 1038 6.000000e+01
m3/s W m3/d 768 8.640000e+04
%(W/W) % 690 1.000000e+00
m3/d m3/min W 684 6.944000e-04
mg/dm2/d ug/dm2/d 345 1.000000e+03
ug/g % 330 1.000000e-04
m3/d m3/s W 315 1.160000e-05
ug/g kg/t 230 1.000000e-03
m3/s W IG/d 219 1.900534e+07
mg/dm2/d ug/cm2/d 203 1.000000e+01
ug/g ug/g wet 151 1.000000e+00
m3/s W L/s 130 1.000000e+03
mg/L ng/L 106 1.000000e+06
m3/min A m3/s A 99 1.666670e-02
kPa mm Hg 84 7.500616e+00
mg/L g/L 70 1.000000e-03
g/m2 mg/m2 57 1.000000e+03
% %(W/W) 40 1.000000e+00
m ft 29 3.280840e+00
ppt ug/kg 28 1.000000e-03
ug/g mg/kg wet 24 1.000000e+00
CFU/100mL MPN/100mL 20 1.000000e+00
m3/min A m3/s W 18 1.666670e-02
% (Mortality) %(V/V) 16 1.000000e+00
NTU JTU 15 1.000000e+00
m3/s W m3/h 11 3.600000e+03
% % (Mortality) 8 1.000000e+00
%(V/V) % (Mortality) 7 1.000000e+00
%(V/V) % 6 1.000000e+00
% (Mortality) % 2 1.000000e+00
MPN/100mL CFU/100mL 2 1.000000e+00
%(W/W) % (Mortality) 1 1.000000e+00
t/d Ton/d 1 1.102311e+00
ug/m3 mg/L 1 1.000000e-06
ug/g ug/L 3187 NA
ueq/L mg/L 2745 NA
CFU/100mL L/min 1819 NA
ppm A mg/m3 1484 NA
mg/L mg/dm2/d 1443 NA
uS/cm umho/cm 977 NA
ug/g mg/L 582 NA
uS/cm mho/cm 499 NA
mg/L ppm W 423 NA
mg/dm2/d ug/dm2/m 343 NA
mg/L ug/cm2 230 NA
ug/m3 ug 223 NA
adt/d t/d 190 NA
mg/L % 186 NA
m3/d E6IG/d 176 NA
mg/L None 166 NA
mg/dm2/d mg/dm2/m 155 NA
mg/L mg 74 NA
m3/d m3/w 60 NA
None N/A 60 NA
AU/cm mg/L 57 NA
C F 31 NA
C h 31 NA
mg/L ug/g 31 NA
CFU/100mL mg/L 27 NA
t/d mg/L 25 NA
MPN/g MPN/100gm 24 NA
mg/L mg/m2 22 NA
mg/L mg/kg 21 NA
mg/L % Saturation 18 NA
mg/L Vehicles 15 NA
pH units Col. Unit 14 NA
m3/d uS/cm 13 NA
m3/d m3/y 12 NA
mg/L ug 11 NA
mg/L m3/d 9 NA
g/m2 ug/L 8 NA
MPN/100mL mg/L 8 NA
ug/g kg/adut 8 NA
ug/g wet mg/L 8 NA
ug/g None 7 NA
m3/d m3/m 6 NA
kg/adt mg/L 5 NA
mg/dm2/d mg/L 5 NA
pH units mg/L 5 NA
ug/m3 mg/dm2/d 5 NA
ug/m3 t/d 5 NA
uS/cm mg/L 5 NA
C N/A 4 NA
m m3/s W 4 NA
m3/d mV 4 NA
mg/L N/A 4 NA
None uS/cm 4 NA
% ppm (S) 3 NA
MPN/100mL ug/g 3 NA
%(W/W) mg/L 2 NA
C mg/L 2 NA
Col. Unit SWU 2 NA
mg/L MPN/100mL 2 NA
mg/L uS/cm 2 NA
pH units mg/g 2 NA
pH units None 2 NA
adt/d Col. Unit 1 NA
C pH units 1 NA
Col. Unit pH units 1 NA
kg/adt % (Recovery) 1 NA
kg/adt kg/d 1 NA
m N/A 1 NA
m3/d mg/L 1 NA
mg/L C 1 NA
mg/L kg/d 1 NA
mg/L ml/L 1 NA
ml/L ug/L 1 NA
MPN/100gm F 1 NA
MPN/100mL F 1 NA
pH units uS/cm 1 NA
t/d adt/d 1 NA

Created on 2021-02-11 by the reprex package (v1.0.0)

joethorley commented 3 years ago

Wow I didn't know you could set units in {units} - then that is definitely the way forward - in fact it should be used to do all the unit handling - which begs the question should the unit handling reside in rems?

ateucher commented 3 years ago

It certainly can live in rems. I think it would have to be a utility function that the user calls rather than happening automatically, as there are ways that a user can pull the data into R where we wouldn't be able to intervene with the unit conversion (e.g., calling dbGetQuery on the sqlite database). This would result in different behaviour for different ways of accessing the data

joethorley commented 3 years ago

Good point - maybe then its conceptually more consistent with the various package philosophies if it resides in wqbc?

ateucher commented 3 years ago

I've implemented the basics in rems. Unfortunately it's not that fast - about 11 seconds on 5000 rows...

joethorley commented 3 years ago

That's too bad - could it be run as part of the database creation process which is lengthy already anyway?

ateucher commented 3 years ago

It could, since for the SQLite backend the data does traverse through R - though ~2M rows with inconsistent units will need to be fixed, and if that timing scales linearly it will take over an hour! Also, when DuckDB (which is looking more and more promising) matures I do intend to use it as a backend - and it will create the db directly from the csv, so any R processing is then not an option.

joethorley commented 3 years ago

We can see how long it takes within wqbc/shinyrems workflows.

ateucher commented 3 years ago

There is an unexported function in units - ud_convert() - which is way faster than the way I am doing it now:

library(microbenchmark)
library(units)
#> udunits system database from /usr/local/share/udunits

microbenchmark(
  units:::ud_convert(1, "mg/L", "ug/L"), 
  set_units(set_units(1, "mg/L"), "ug/L")
)
#> Unit: microseconds
#>                                     expr      min        lq       mean
#>    units:::ud_convert(1, "mg/L", "ug/L")   25.729   31.9175   37.05395
#>  set_units(set_units(1, "mg/L"), "ug/L") 1995.143 2111.7080 2388.55648
#>     median       uq      max neval
#>    35.0855   41.086  145.008   100
#>  2166.6645 2521.354 6220.956   100

Created on 2021-02-16 by the reprex package (v1.0.0)

I'll open an issue in units to see if Edzer would consider exporting it; otherwise in the meantime I'll use :::.

ateucher commented 3 years ago

https://github.com/r-quantities/units/issues/266

ateucher commented 3 years ago

Ok I think I have a pretty good solution at https://github.com/bcgov/rems/pull/57

joethorley commented 3 years ago

Is it fast enough to apply universally to all the data?

ateucher commented 3 years ago

It does 1M rows in < 1s, but I won't do it at the db creation stage due to the likely future duckdb migration I mentioned above.

joethorley commented 3 years ago

Confirm closed by https://github.com/bcgov/rems/releases/tag/v0.7.0