Sum up a function over primes

Bodigrim / arithmoi

Number theory: primes, arithmetic functions, modular computations, special sequences

http://hackage.haskell.org/package/arithmoi

MIT License

147 stars 40 forks source link

Sum up a function over primes #97

Open Bodigrim opened 6 years ago

Bodigrim commented 6 years ago

Math.NumberTheory.Primes.Counting.Impl.primeCount implements an algorithm to compute π(n), which is number of primes below n, in O(n^0.7) time and O(n^1/2) space. One can view π(n) as a sum of f p = 1 over prime p below n.

As it was pointed by @CarlEdman in #60 (primeLucy), this approach allows a generalization for any function f with the same time and space requirements. Moreover, there is a (difficult) way to improve its time complexity to O(n^2/3).

That said, it seems desirable to productionalize primeLucy, provide a nice API, write decent tests and benchmarks, and then express primeCount as a special case.

Additional resources: https://en.wikipedia.org/wiki/Meissel–Lehmer_algorithm http://sweet.ua.pt/tos/bib/5.4.pdf https://projecteuler.net/thread=10;page=5#111677

jhenahan commented 6 years ago

I've been messing around with this on a branch, and there are some really nice implementations that fall out of this approach.

primeCount falls out immediately as primeLucy (const 1)
sumOfPrimesUpTo is simply primeLucy id

It's still quite messy, but it's extremely fast once it has warmed up.

primeCount 100000 is slowish on my machine, every time I run it. primeCount' 100000 on my machine is slow the first time, but the result is cached and instantaneous for every n <= 100000 thereafter. And that shared allocation of primes carries over to every other function that uses primeLucy under the hood, so I can instantly get the sum of primes up to 100000.

I'll probably meditate on this and see if I can reimplement in terms of Vector (since that's another open task). I'll likely also rework the API so there's an option to provide your own summatory function for the case where summatory f isn't what you actually want, as in the original formulation.

CarlEdman commented 6 years ago

@jhenahan If we are talking about the primeLucy I originally submitted, I don't entirely understand.

My primeLucy not only allows you to supply, but requires, a summatory function. While calculating the summatory function from the principal function is possible, that would take O(n) and hence defeat the purpose.
As for switching from the Array interface to the Vector interface, far be it from me to defend the former's elegance. That said, the code I submitted was highly optimized using explicit offsets and the unsafe functions. That ought to compile down to close to optimal x86-64 assembly, so I doubt that using the nicer Vector interface will buy you much. But if you want to do it for elegance's sake and achieve the same (or better!) performance, please do!

jhenahan commented 6 years ago

@CarlEdman Thanks very much for the fairly critical point about the summatory function. :) That would absolutely defeat the purpose. I'll have to think on it more, in that case.

CarlEdman commented 6 years ago

@jhenahan My pleasure. Let me know if I can help.

Bodigrim commented 6 years ago

With regards to array vs. vector: there are unboxed mutable vectors with a low-level interface, providing decent performance: unsafeRead, unsafeWrite, etc. I feel like nowadays Haskell developers (or at least myself :) are more acquainted with Vector and it is a default choice, but in course of this issue we should try to benchmark both data structures.

CarlEdman commented 6 years ago

@bodigrim I think that unboxing could be a big win. For starters, it ought to reduce GC time to just about 0. The reason I didn't use unboxed arrays was that for my problem I needed more than 64-bit int precision, so I had to go with Integers which can't really be unboxed.

Given how large some of the summatory function results can be, I think keeping Integers available for primeLucy would be useful. But an alternative version that uses unboxed ints (perhaps with modular arithmetic support at every stage) could be a big win for certain users.

jhenahan commented 6 years ago

Would you have any suggested literature on “useful” summatory functions? While I’m thinking on it, it couldn’t hurt to build in more special cases like primeCount.

CarlEdman commented 6 years ago

@jhenahan

I don't really have a literature, but a few pointers:

The Faulhaber polynomials (for which I submitted code at Issue #70) give the summatory functions for all series of the form $x^n$. Hence, by trivial extension, they also give the summatory functions for any polynomial.
I have not tried this, but I believe primeLucy can be used to generate summatory functions which can be fed back to itself. This should allow you to efficiently calculate, e.g., the sum of all composite numbers up to n which are the product of exactly two primes.

Bodigrim commented 6 years ago

@jhenahan any news? Feel free to ping me, if you need to discuss anything.

jhenahan commented 6 years ago

Nothing yet, but I’ll be setting aside some time to look into it further next week. Work’s been eating my OSS time.