bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells
Other
166 stars 17 forks source link

Add IterableMatrix MD5 checksum. #83

Closed brgew closed 7 months ago

brgew commented 7 months ago

Hi Ben,

I got around finally to adding an MD5 checksum calculation for an iterable matrix.

I want to thank you for your generous guidance of last November -- it made this addition fairly simple.

Some aspects of this addition include

I imagine that you will have reservations about some of this. I am keen to hear your feedback!

Ever grateful, Brent

bnprks commented 7 months ago

Hi Brent, apologies for the delayed response, I'm just getting back from some travel. Overall this looks good but I do have some notes as anticipated :)

Notes:

brgew commented 7 months ago

Hi Ben,

Thank you for the feedback! I made the changes that you requested to the best of my ability. I added some fiddly casting code to try to keep the number of copies and function calls to a near minimum -- I hope that this does not create problems in the future. I added also technical details and an example to the checksum() documentation. Please feel free to bounce it back to me if it doesn't meet your expectations.

Oh, I am not confident in my github skills: I made the changes in my account without making a new pull request or syncing. I see the changes when I look at the code on your account. I hope that you see them too.

Thank you for considering this function.

Ever grateful, Brent

bnprks commented 7 months ago

Hi Brent,

Thanks for making those adjustments. I've gone through a detailed review and made a couple edits to just tidy things up on my end, which were:

Thanks for submitting these changes, and I hope they're helpful for your downstream work!

One last pair of details I thought of that may or may not be important to you from the checksum perspective:

Given that initially you'll probably be the main/only user of this function I'm happy to include as much or as little as you like in the checksum value. From my perspective the code is ready to merge as-is now, but I'll wait to hear back from you on these two questions in case it's something you want to adjust before we publish this in the main branch.

I'd be happy to make the edits to include row/col names in the checksum if desired, otherwise the relevant C++ functios are rowNames , colNames, rows, and cols.

P.S. In terms of github workflow, all new commits you make on that same github branch show up to me, and by default I also have commit permissions on your branch to make edits. So a git pull on your branch should get you a copy of my edits and a git push publishes your edits to me.

brgew commented 7 months ago

Hi Ben,

Thank you for cleaning up after me!

My gut tells me it would be wise to heed your advice so I'll look at adding to the checksum calculation as you suggest.

I appreciate your patience and guidance, and I hope all goes well with you!

Ever grateful, Brent

brgew commented 7 months ago

Hi Ben,

Again, I appreciate your patience with me.

I added and pushed your suggested additions. Also, I looked at the checksum run time on a 32k x 2.7m sparse matrix, which was about 30 seconds and much better than I feared. I admire your code and coding judgement!

Thank you.

Ever grateful, Brent

bnprks commented 7 months ago

Hi Brent, thanks for making those additional changes -- looks great! I had also been concerned about speed initially, but similarly found fast performance (about ~2x slower than a simple colsum in my large matrix test).

I'll merge this in so it'll be available in the main branch

brgew commented 7 months ago

Hi Ben,

Thank you, thank you, thank you!

I appreciately greatly your patience, guidance, and skills.

Ever grateful, Brent

On Fri, Apr 19, 2024 at 1:27 PM Ben Parks @.***> wrote:

Hi Brent, thanks for making those additional changes -- looks great! I had also been concerned about speed initially, but similarly found fast performance (about ~2x slower than a simple colsum in my large matrix test).

I'll merge this in so it'll be available in the main branch

— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/bnprks/BPCells/pull/83*issuecomment-2067240114__;Iw!!K-Hz7m0Vt54!h4OliP6h8YsQ9bAHxoGVhmg-TrzXvBVI0JoDwh4kVp4KBkYO9r8cfAYhBlvq9K4hk5lNgGTWP9JRrFkm5sI6$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ACXSPQ646OZGO6HL4ZEUGZTY6F4YPAVCNFSM6AAAAABFYEJPAWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANRXGI2DAMJRGQ__;!!K-Hz7m0Vt54!h4OliP6h8YsQ9bAHxoGVhmg-TrzXvBVI0JoDwh4kVp4KBkYO9r8cfAYhBlvq9K4hk5lNgGTWP9JRrIVohnum$ . You are receiving this because you authored the thread.Message ID: @.***>