Open fredrik-johansson opened 8 months ago
I am working on the Conway polynomials. I think I recognized some patterns.
The second and third points will dominate the compression, so I think we will be able to compress the table by 8x, while also speeding up the search for polynomials.
One can use https://github.com/harshvsingh8/so-size-analyzer, slightly modified to avoid listing duplicate symbols twice, to get a listing of large objects in libflint.so:
Here is a top 100:
Loop unrolling clearly adds a lot of bloat (the list for #1698 would look quite different).
For example, `acb_mat_mul_reorder` has no business being one of the largest functions in the library, and it is indeed only 10% of the size with rolled loops. We can also reduce the size of this function by a factor of 2 by declaring the static helper functions in that file as `__attribute__((noinline))` (they certainly don't need to be inlined). The `butterfly_rshB` and `butterfly_lshB` functions in the `fft` module also probably suffer from the combination of inlining and loop unrolling; `mpn_sumdiff_n` in `fft.h` quite possibly doesn't need to be inlined.

With rolled loops, the largest entries are lookup tables. Some of these could be optimized, notably the largest table of all: `flint_conway_polynomials`. Just splitting this table into 8-bit, 16-bit and 32-bit tables should save more than a factor of 2; with a smarter encoding, I guess it could be compressed 4x or 8x.
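To make the table-splitting idea concrete, here is a minimal sketch in C. It is not FLINT's actual layout or API: the record format (`p`, `d`, then the `d` low-order coefficients of a monic polynomial, with a `0` terminator), the names `tab8`/`tab16`/`tab32` and `conway_lookup`, and the handful of entries are all assumptions chosen for illustration. Each record goes into the narrowest table whose element type can hold `max(p, d)`; since coefficients lie in `[0, p)`, they fit automatically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout (an assumption, not FLINT's format):
   p, d, c_0, ..., c_{d-1}; the leading coefficient 1 is implicit.
   A record is stored in the narrowest table that can hold max(p, d).
   A 0 in the p position terminates each table. */
static const uint8_t tab8[] = {
    2, 1, 1,         /* x + 1 over F_2 */
    2, 2, 1, 1,      /* x^2 + x + 1 over F_2 */
    3, 1, 1,         /* x + 1 over F_3 */
    0
};
static const uint16_t tab16[] = {
    257, 1, 254,     /* x + 254 over F_257 */
    0
};
static const uint32_t tab32[] = {
    65537, 1, 65534, /* x + 65534 over F_65537 */
    0
};

/* Generate one linear-scan search routine per element width. */
#define DEFINE_FIND(NAME, T)                                        \
static const T *NAME(const T *t, uint32_t p, uint32_t d)            \
{                                                                   \
    while (t[0] != 0) {                                             \
        if (t[0] == p && t[1] == d)                                 \
            return t + 2;   /* points at c_0 */                     \
        t += 2 + t[1];      /* skip p, d and the d coefficients */  \
    }                                                               \
    return NULL;                                                    \
}

DEFINE_FIND(find8, uint8_t)
DEFINE_FIND(find16, uint16_t)
DEFINE_FIND(find32, uint32_t)

/* Writes c_0..c_{d-1} to out (which must have room for d entries).
   Returns 1 if (p, d) is tabulated, 0 otherwise. */
int conway_lookup(uint32_t *out, uint32_t p, uint32_t d)
{
    uint32_t i, m = (p > d) ? p : d;

    assert(out != NULL);

    if (m <= UINT8_MAX) {
        const uint8_t *c = find8(tab8, p, d);
        if (c == NULL) return 0;
        for (i = 0; i < d; i++) out[i] = c[i];
    } else if (m <= UINT16_MAX) {
        const uint16_t *c = find16(tab16, p, d);
        if (c == NULL) return 0;
        for (i = 0; i < d; i++) out[i] = c[i];
    } else {
        const uint32_t *c = find32(tab32, p, d);
        if (c == NULL) return 0;
        for (i = 0; i < d; i++) out[i] = c[i];
    }
    return 1;
}
```

With this layout, a record in `tab8` takes a quarter of the space it would need as 32-bit integers, and a lookup only scans the one table selected by `max(p, d)`, so the search also touches less data. How much this saves overall depends on how the real entries are distributed across the three width classes.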