eddelbuettel / digest

R package to create compact hash digests of R objects
https://eddelbuettel.github.io/digest
110 stars 47 forks source link

digest refactoring for performance #212

Open pearsonca opened 3 weeks ago

pearsonca commented 3 weeks ago

While working #210, I noticed opportunities for opportunites for consolidating / refactoring code in digest.c.

In some sense, digest.c is providing a Facade around interacting with the various quirks of the imported hashing libraries. It does some setup / reinterpretation of the input from R, uses a series of cases to handle each of the different algorithms, and then returns a consistent output.

It seems over time those cases have been internally implemented in slightly different ways (and I imagine, the hasher interfaces have changed over time after initial implementation). There is now a fair bit of repetitious code and/or inconsistent naming/etc that could be consolidated. This would make the assorted "Consider X" issues smaller lifts to accomplish, present opportunities for a bit of unit testing on the C side, benchmarking, etc - all the normal benefits of DRY.

I propose the following refactoring objectives:

Thoughts? Additional objectives? I think some of these (and possibly additional targets added via discussion) are entangled, but there's some potential to isolate at least some of these as their own PRs - preferences one way or another?

eddelbuettel commented 3 weeks ago

I am in favour of 1).

I am a little scared / see more downside than upside for 2) and 3) --- things work as they are, we manage to add new hasher so why rock the boat

I have so far refrained from larger-scale "style" only overhauls. I like when git blame refers to original contributions. [ I left your two-spaces indented lines alone too. ] I guess on the margin in favour but let's not get overboard.

Too little to keep you excited?

pearsonca commented 3 weeks ago

Can do a PR for just converting macros => functions, sure.

Then maybe draft something for one of the others, and see how it looks w/o (initially) super polishing it? If it seems promising, great, proceed to polish; if not, can always toss it.

pearsonca commented 2 weeks ago

Porting discussion from closed PRs #215 / #218 here:

What could potentially work for that second item:

aside: the longer term approach would be advantageous for enabling people to implement their class-specific hashing extensions ala #23 and for potentially having a digestext or similar package, that offers custom class-specific extensions without bloating the core library.