GlenKPeterson / Paguro

Generic, Null-safe, Immutable Collections and Functional Transformations for the JVM
Other
312 stars 24 forks source link

More Benchmarks vs. pCollections (was: incorrect comparison with pCollections) #23

Open strangepleasures opened 6 years ago

strangepleasures commented 6 years ago

In the wiki you explain the difference between the two implementation comparing O(log2 n) and O(log32 n). I'm afraid, you don't understand the idea behind the Big O notation. By definition, it ignores any constant multiplier, so that O(kf(n)) = O(f(n)) for any k <> 0. See "Multiplication by a constant" in the Wikipedia article. From elementary math, log(2, n) = 5 log(32, n) :

log(32, n)= log(2^5, n) = log(2, n) / log(2, 32) = log(2, n) / 5

It means that O(log(2, n)) = O(log(32, n)) the same way as O(3 * n) = O(n)

This means that both UncleJim and PCollections belong to the same big O class, which mathematicians denote simply as O(log(n)).

Of course the constant factor is important for real-world use but it depends on all the implementation details, not just one array length constant. Also, the big O notation describes the asymptotic behaviour of an algorithm when n tends towards infinity and might not reflect its behaviour with smaller values.

GlenKPeterson commented 6 years ago

Thank you for noticing and taking the time to report this. I think I have fixed the mistakes. If you agree, I'll close this ticket. Here's the page I think you reported on: https://github.com/GlenKPeterson/Paguro/wiki/UncleJim-vs.-PCollections

strangepleasures commented 6 years ago

The fact that Clojure uses trees with 32 elements of course doesn't mean that it's five times faster. If that were the case it would make sense to use even more elements, let's say 64 :) The number of children per node actually determines the trade-off between read and write speed. When you use more child elements per node then you need less layers. For example, to store 1024 elements you need 10 layers with 2 elements per node and only 2 layers with 32 elements per node. Going from one layer to another might cause switching of a different memory page which is expensive. When you read you go through the layers, so the smaller the number of layers is, the faster you get the value. On the other hand, having too many children per node slows down writing , because you need to make a copy of every children's array you modify. Also, trees with too many children per node consume too much memory for small data sets. Probably, 32 is a reasonable compromise between read speed on one side and write speed and memory consumption on another side, but depending on the task it also could be 16 or 128. You can even make it a parameter and let the user to decide if she wants fast read access or fast modification and small memory footprint. The actual speed of a software library of course depends not only on algorithm parameters but also on all kinds of implementation details. I'm really curious to see a real benchmark, comparing read and write speed and memory footprint of UncleJim vs Clojure vs PCollections for datasets of different sizes.

GlenKPeterson commented 6 years ago

Everything in your comment is correct and insightful. You didn't mention the processor cache, yet, but I'm guessing you're saving that for the next round.

I guess I was shooting for something simple and high-level that amateurs could understand, without getting too bogged down in details. I wasn't so worried about accuracy and providing a general idea of why one is likely to be faster than the other. On the other hand, I really like what you wrote here and think it would be a good addition to the Wiki for this project. Not sure it belongs on this specific page, but a link to it probably does!

Do you have write access to these wiki pages? What's your level of interest in helping out? Are you comfortable with Kotlin and/or Java? I see one Kotlin project and 2 Java projects of yours on GitHub.

Since I wrote that page, I added some benchmarks where I test things against the Scala collections: https://github.com/GlenKPeterson/Paguro/tree/master/paguro-bench

That could easily be expanded to test against PCollections. If you want to try that, and you make as few changes as possible, as neatly as possible, I'll probably merge your changes.

strangepleasures commented 6 years ago

The benchmarks look very good and stand for themselves. Thanks for inviting to contribute to the project, I'll try to find time to do that.

GlenKPeterson commented 6 years ago

The only thing I want to add is that 32 element arrays seem to fit into the read-ahead cache on modern desktops and servers. So there is one memory access for each level in the tree. One to find the level, and one to get the correct element from that array. The selected element is dereferenced and the array from the next level down is read into memory. So, there's that.

I'm going to leave this ticket open and rename it, "More Benchmarks vs. pCollections" because that's something that should be done. I think it's behind the Kotlin project and possibly integrating Paguro into Kategory.