Closed pearsonca closed 2 months ago
profvis::profvis(replicate(1000, digest("8675309", algo = "md5", serialize = FALSE, raw = TRUE)))
indicated 4 hot spots: (1) converting algo, (2) converting errormode, (3) converting length, and (4) running the algorithm. This addresses (1)
(also, deferring the ChangeLog update to (3)?)
Let's please get back down to the convention that every PR has a ChangeLog entry. Thank you.
Alright, having sorted out some what's installed where, what session are we in issues:
randomwords <- sample(c(LETTERS, letters, 0:9), 1e6, replace = TRUE)
randomword <- paste0(randomwords, collapse = "")
profvis::profvis(lapply(randomwords, \(w) digest(w, algo = "md5", serialize = FALSE, raw = TRUE)))
profvis::profvis(replicate(1e3, digest(randomword, algo = "md5", serialize = FALSE, raw = TRUE)))
The second one (a lot of really little hashes) is interesting (and this is with this revision; prior to this, algo-related lines totalled up to ~what's seen here for errormode)
The first one (a few big hashes), not really. the line translating errormode does actually show up, but corresponds to <1% of the sampled run time.
Sorry, been tied up in other things. Now, I may need to mull this over a little bit more but I have the feeling that we are making 'mostly light and no heat here'. I am not convinced yet we need a change for performance: I can live with time spent on match.arg()
in a perfectly valid and common code idiom. Also, as noticed, at least one of the suggested changes reduces code clarity in my eyes.
Sorry, been tied up in other things. Now, I may need to mull this over a little bit more but I have the feeling that we are making 'mostly light and no hear here'. I am not convinced yet we need a change for performance: I can live with time spent on
match.arg()
in a perfectly valid and common code idiom. Also, as noticed, at least one of the suggested changes reduces code clarity in my eyes.
I'll standby for further discussion then. From my end, for {digest}
to work for my case it needs to be much faster on the small-hashes (and that seems to require mostly work on the R overhead), but there are other approaches I can use (e.g. just wrapping xxh in that package).
BREAK
I do think the errormode version of solving the performance problem could just be pushing the match.arg into the error checking function, without loss of code clarity. I can do that as a PR, if you like.
and that seems to require mostly work on the R overhead
I said that earlier in passing: maybe it is worth trying an alternate, no-frills accessor.
How the vectorised access was added was quite nice, maybe we could think of something similar here without requiring baby and bathwater approaches.
I do think the errormode version of solving the performance problem could just be pushing the match.arg into the error checking function, without loss of code clarity. I can do that as a PR, if you like.
That has merit too. We would then not spend those microseconds checking an argument we do not use.
In short, let's try to refocus on a few, clear, simple, unambiguous improvements.
Oh and another approach we could consider (and something I now know better than when this started) is to export C-level accessors directly to other packages. Even less overhead.
Alright, initiated an trim to the errormode checking, since that's just-better.
Re no frills: how about an xxh(...) function along the lines of the sha()
functionality? The potential problem there is that all the sha1(...)
functions ultimately pass to digest, and this would be a different approach.
Extracted from #215