JuliaMath / Primes.jl

Prime numbers in Julia
Other
99 stars 32 forks source link

make factor fast for small integers by memoizing #112

Closed jmichel7 closed 2 years ago

jmichel7 commented 2 years ago

I have several packages (CyclotomicNumbers, Combinat, Gapjm) where heavy use of factor is made -- mostly for other number-theoretic functions, like totient, divisors, prime_residues, primitive roots. The main application is various algorithms in finite group theory. I need to call many times factor for small integers (for my applications small is <300 but this can be discussed of course). I thus implemented caching:

const dict_factor=Dict(2=>Primes.factor(2))
factor(n::Integer)=if n<300 get!(()->Primes.factor(n),dict_factor,n) else Primes.factor(n) end

This way factor.(1:300) goes from 40μs to 3μs which is already better for my applications. But this redefinition causes problems when reexporting factor (conflict with Primes) so I wish this caching was in Primes.

The advantage of caching with a Dict is that only factorizations I need are stored (usually numbers with many factors which appears as orders of groups like 24, 120, etc...).

If just caching up to 300, it seems to me caching with a precomputed array is possible: this would just add 40μs to the precompilation time of Primes, and would bring the timing of factor.(1:300) to 500ns.

Anyway, I feel that Primes would be really more useful with such a caching, so this is my feature request.

oscardssmith commented 2 years ago

I don't think we should cache by default since different users will want different behavior (if you want caching, you can define cached_factor which won't conflict with Primes). That said, we could do a couple things to make factoring small numbers faster.

jmichel7 commented 2 years ago

The points are:

oscardssmith commented 2 years ago

the problem with caching in general is is someone calls factor on a billion numbers in a loop, they now have a few gigs of ram used that will never go away. there are also threading concerns (if you call factor from multiple threads, do they share a cache? if so, you have to deal with locks).

caching a limited set might be reasonable, but first I think it's worth trying to make factor have less overhead so the caching isn't necessary. I think we can probably get a free factor of 4 or so. also, for multiplicative functions, we could add fast paths for small numbers that dont call factor in the first place.

jmichel7 commented 2 years ago

I agree unlimited caching is a bad idea. I did not propose it. Now I am ignorant about multithreading issues, but I notice you have several large constin Primes.jl like const bases. If this is not a problem my proposal would neither be one since it would just add a similar const.

Anyway a speedup by a factor of 4 in general is welcome, even if insufficient for my needs. I am curious on how you plan to achieve that...

oscardssmith commented 2 years ago

the thing that makes multi threading hard is if you need to update from multiple threads. all of the caches primes uses iirc are not modified.

the speedup will be in a few places. the first is by making is prime check the list of small primes, (PR already made). most of the rest of the time in factor comes from modifying the dictionary, and allocating memory. I'm pretty sure that can be avoided since small numbers have few factors

jmichel7 commented 2 years ago

Your comment about update from multiple threads means that a Dict cache might have problems. But a precomputed array does not seem to pose a problem. It would be really nice to have factor blazing fast for small numbers...

oscardssmith commented 2 years ago

For the record, https://github.com/JuliaMath/Primes.jl/pull/114 and https://github.com/JuliaMath/Primes.jl/pull/115 already get a 50% speedup (40 μs to 28 on my computer for factor.(1:300);). of which 7μs is literally just allocating empty factorization objects (which is another reason you can't cache them that well, you need to return copies so the caches don't overwrite each other).

oscardssmith commented 2 years ago

with the eachfactor api in 0.5.3, we are now at 6us on my computer (4x faster than 0.5.2).