Closed graemegowans closed 4 years ago
Hi @graemegowans,
Thanks for the suggestions!
Apologies if I'm being ignorant, but does chilir::pad_chi()
do the same thing as stringr::str_pad()
?
stringr::str_pad(c(123456789, 0123456789), width = 10, pad = "0")
#> [1] "0123456789" "0123456789"
A function that does CHI validation is interesting, and I believe both @alan-y and @Moohan have written similar functions in the past. Could I suggest speaking with them first? I have no idea what the best way of doing it would be and I don't want to be biased towards anyone's particular method, so it would be great if you guys could try to reach some consensus prior to anything being contributed to phsmethods.
I've written a CHI check function but only in SPSS, I looked at @graemegowans chi_check() function (https://github.com/graemegowans/chilir/blob/master/R/chi_check.R) and it seems to do everything mine did i.e. test the check digit and also try to validate the date. I think the date check could be developed by adding something like is.na(lubridate::dmy(date_part) which would be shorter and also catch dates with months with fewer than 31 days.
Thanks guys! @jackhannah95 I'm sure I looked at the str_pad()
function when I was working on it but the problem was that it was padding anything shorter than 10 rather than just those that have lost the leading zero, which I thought might not be what people would want.
stringr::str_pad("123", 10, pad = "0")
[1] "0000000123"
Although this would be invalid anyway, so maybe not a problem. Not sure what the most appropriate behaviour would be?
Thanks @Moohan, that's a good idea, I can change that.
Ah I see. In that case I'd be quite happy to have a more specific function in phsmethods then that pads only 9 digit characters, as long as the documentation makes that restriction clear. I guess the two functions would have to work in tandem most of the time, i.e.
df %>%
mutate(chi = chi_pad(chi),
chi = chi_check(chi))
I would have "chi" be the first word in both function names and group them together in the same script with the same documentation, much like the qtr()
functions have been.
I think consideration would need to be given to how they fail as well. For chi_pad()
it would seem pretty obvious to leave any non-9 digit numbers as they are, but chi_check()
is trickier. I guess if I were using this on a large input vector I'd probably want it to flag the invalid ones in some way (maybe by converting them to NA
?) and leave the valid ones alone.
In the meantime while you're doing any mulling over chi_check()
, feel free to make a branch and add in chi_pad()
!
I think consideration would need to be given to how they fail as well. For
chi_pad()
it would seem pretty obvious to leave any non-9 digit numbers as they are, butchi_check()
is trickier. I guess if I were using this on a large input vector I'd probably want it to flag the invalid ones in some way (maybe by converting them toNA
?) and leave the valid ones alone.
I had a second function called is_valid_chi()
to flag these ones.
I think the date check could be developed by adding something like is.na(lubridate::dmy(date_part) which would be shorter and also catch dates with months with fewer than 31 days.
This is how I think I done it.
Thanks for all the feedback guys. My idea for the chi_check()
function was for it not to operate on the numbers themselves, but to just flag invalid ones (similar to what @alan-y said about is_valid_chi()
) - so something like:
df <- tibble(chi_no = c("0101209991", "3201209991", "0113209991", "3213209991",
"123", "12345678901", "?123456789", "A123456789", "0101209992"))
df %>% mutate(checked = chi_check(chi_no))
#> # A tibble: 9 x 2
#> chi_no checked
#> <chr> <chr>
#> 1 0101209991 <NA>
#> 2 3201209991 invalid day
#> 3 0113209991 invalid month
#> 4 3213209991 day/month invalid
#> 5 123 too short
#> 6 12345678901 too long
#> 7 ?123456789 invalid character
#> 8 A123456789 invalid character
#> 9 0101209992 invalid checksum
@graemegowans, it looks good. Remember to implement a check for leap years in the dates (otherwise we'll upset the Feb 29 babies - I know one!)
Looks good @graemegowans 👍 open a PR when you're ready and we'll do a review for you.
Hi @graemegowans. How are you getting on? Hoping to do another release of the package within the next 2 weeks. It'd be great to get these functions in if possible, but it's no problem if they have to wait until the next one.
Sorry @jackhannah95, was on leave! Everything is ready to go for review I think so I will make a request hopefully today.
Added in #31.
Hey guys - I have some simple functions that I wrote for working with CHI numbers. One takes a 9 character CHI and pads it to 10 with a leading zero and the other does some basic validity checks (checks for length, valid characters, valid checksum) and returns error messages. I wanted to share these with some others in the team (and I was interested in learning how to make a package) so I had wrapped these into a package already: https://github.com/graemegowans/chilir. But if these would be more broadly useful then happy to add here? Thanks!