eddelbuettel / digest

R package to create compact hash digests of R objects
https://eddelbuettel.github.io/digest
111 stars 47 forks source link

Add support for crc32c (closes #183) #184

Closed eddelbuettel closed 1 year ago

eddelbuettel commented 1 year ago

This PR adds support for crc32c by bringing in a cpu-only, non-hardware accelerated version from Google's repo.

@Dyfanjones: I cannot formally add you for a code review but maybe you can "informally" bless this give that you already ran the branch (from one commit ago)....

(The one 'fail' flagged under CI came from codecov where we have defaults 'for acceptable change' that are too stringent. Codecov should really be quiet on PRs so that is on me for keeping the repo setup in 'sane' order.)

DyfanJones commented 1 year ago

Hi @eddelbuettel, this looks all good to me.

Here are some extra checks with the python package awscrt.

checksum <- reticulate::import("awscrt.checksums", convert = F)
builtins <- reticulate::import("builtins")

crc32c <- function(string) {
  bs <- checksum$crc32c(string)
  #   convert hex is R string
  hex_bs <- builtins$hex(bs) |> as.character()
  out <- substr(hex_bs, 3, nchar(hex_bs))
  # ensure 8 character long
  return(paste0(paste(rep("0", 8-nchar(out)), collapse = ""), out, collapse = ""))
}

strings <- lapply(1:1000, function(x) {
  paste(sample(c(letters,LETTERS, "//", "[", "(", "{", "?", "^", "$", " ", "="), size = 20, replace = T), collapse="")
})

result <- vapply(strings, function(string){
  identical(crc32c(string), digest::digest(string, "crc32c", serialize = F))
}, FUN.VALUE = logical(1))

all(result)
#> [1] TRUE

Created on 2023-04-30 with reprex v2.0.2

eddelbuettel commented 1 year ago

I just added support for vectorized access to crc32c (and blake3) too. This should be ready.

And I think I will keep a mental note of maybe packaging all of the crc32c package for CRAN and then call that from the digest package (as a Suggests: as I would want to keep the 'no dependencies' rule). That way we would get the already-established platform tests and switches to hardware acceleration where available.