Large symbols in the binary

One can use https://github.com/harshvsingh8/so-size-analyzer, slightly modified to avoid listing duplicate symbols twice, to get a listing of large objects in libflint.so:

Here is a top 100:

Symbol sizes:
=============
 1.2MiB      flint_conway_polynomials
 125.8KiB    arb_hypgeom_gamma_tab_limbs
 49.0KiB     fmpz_lll_is_reduced_d_with_removal
 48.4KiB     fmpz_lll_is_reduced_d
 38.9KiB     acb_mat_mul_reorder
 36.0KiB     arb_sin_cos_tab22
 36.0KiB     fmpz_lll_is_reduced_mpfr_with_removal
 33.9KiB     fmpz_lll_is_reduced_mpfr
 32.5KiB     butterfly_rshB
 30.0KiB     butterfly_lshB
 29.5KiB     _fmpz_mpoly_divrem_array_chunked
 29.2KiB     arb_sin_cos_tab21
 29.1KiB     fmpz_lll_d_with_removal_knapsack
 27.7KiB     fmpz_lll_d_heuristic_with_removal
 27.7KiB     fmpz_lll_d_with_removal
 26.6KiB     _arf_add_mpn
 25.9KiB     fmpz_lll_check_babai_heuristic_d
 25.7KiB     fmpz_lll_d_heuristic
 25.7KiB     fmpz_lll_d
 25.5KiB     fmpz_lll_advance_check_babai_heuristic_d
 25.4KiB     arb_sin_cos_tab1
 25.4KiB     _fmpz_mpoly_divides_array_chunked
 24.9KiB     fmpz_lll_check_babai
 24.6KiB     fmpz_lll_advance_check_babai
 23.5KiB     _mpoly_compress_exps
 23.4KiB     fmpz_mpoly_factor_irred_zippel
 22.4KiB     arb_mat_mul_block
 22.3KiB     _nmod_mpoly_sqrt_heap
 22.2KiB     fmpz_lll_mpf2_with_removal
 21.9KiB     _fmpz_mod_mpoly_sqrt_heap
 21.9KiB     fq_nmod_mpoly_hlift_zippel
 21.8KiB     _fmpz_mpoly_sqrt_heap
 21.3KiB     n_bpoly_mod_lift_continue
 21.0KiB     _fmpz_mpoly_divrem_monagan_pearce
 20.8KiB     fmpz_mpolyl_gcd_zippel2
 20.7KiB     _nmod_mpoly_gcd_algo_small
 20.6KiB     _fmpz_mpoly_quasidivrem_heap
 20.4KiB     _fq_nmod_mpoly_sqrt_heap
 20.4KiB     fmpz_mat_hnf_pernet_stein
 20.3KiB     fmpz_mod_poly_factor_distinct_deg_threaded_with_frob
 20.0KiB     fmpz_lll_mpf2
 19.5KiB     _fmpz_mpoly_quasidiv_heap
 19.5KiB     nmod_poly_factor_distinct_deg_threaded
 19.5KiB     n_bpoly_mod_factor_lgprime
 18.9KiB     acb_theta_ql_a0_steps
 18.7KiB     n_fq_bpoly_factor_lgprime
 18.5KiB     _fmpz_mpoly_divrem_array_tight
 18.4KiB     fq_nmod_mpolyl_gcd_zippel_lgprime
 18.4KiB     fmpz_lll_check_babai_heuristic
 18.2KiB     nmod_mpolyl_gcd_zippel_lgprime
 18.2KiB     _fmpz_mpoly_divrem_ideal_monagan_pearce
 18.0KiB     arb_log_tab21
 18.0KiB     arb_log_tab22
 18.0KiB     arb_atan_tab21
 18.0KiB     arb_atan_tab22
 18.0KiB     arb_exp_tab22
 17.7KiB     _fmpz_mpoly_divides_array_tight
 17.5KiB     n_fq_bpoly_factor_smprime
 17.4KiB     _qadic_sqrt
 17.1KiB     _ca_poly_mullow
 16.9KiB     _fq_nmod_mpoly_divides_monagan_pearce
 16.6KiB     _fmpz_mpoly_div_monagan_pearce
 16.4KiB     _fmpz_mpoly_quasidivrem_ideal_heap
 16.4KiB     _nmod_mpolyn_mulsub_stripe.constprop.0
 16.3KiB     _acb_dirichlet_platt_multieval
 16.3KiB     _fmpz_mpoly_divides_monagan_pearce
 16.1KiB     fmpz_factor_pp1
 16.0KiB     arb_atan_tab1
 16.0KiB     _fq_nmod_mpoly_mulsub
 15.9KiB     acb_mat_eig_enclosure_rump
 15.8KiB     worker_loop
 15.7KiB     _fq_nmod_mpoly_divrem_ideal_monagan_pearce
 15.4KiB     _fmpz_mod_mpoly_divrem_ideal_monagan_pearce
 15.4KiB     _arb_mat_addmul_rad_mag_fast
 15.4KiB     fmpz_mod_mpolyl_gcd_zippel2_smprime
 14.8KiB     fexpr_builtin_table
 14.5KiB     _nmod_mpoly_divrem_ideal_monagan_pearce
 14.5KiB     fq_zech_mpoly_hlift_zippel
 14.4KiB     nmod_mpolyl_gcd_hensel_smprime
 14.4KiB     _fmpz_mod_mpoly_gcd_algo_small
 14.2KiB     fq_nmod_mpolyl_gcd_zippel_smprime
 14.0KiB     arith_stirling_number_1u
 14.0KiB     acb_modular_theta_sum
 13.9KiB     block_lanczos
 13.5KiB     _mpoly_test_irreducible
 13.2KiB     _fq_nmod_mpoly_from_univar
 13.2KiB     _fq_nmod_mpoly_quadratic_root_heap
 13.1KiB     _nmod_mpoly_divides_monagan_pearce
 13.0KiB     nmod_mpolyl_gcd_zippel_smprime
 13.0KiB     n_bpoly_mod_factor_smprime
 12.9KiB     arb_exp_tab21
 12.9KiB     _fq_zech_mpoly_divides_monagan_pearce
 12.8KiB     nmod_mpoly_hlift_zippel
 12.7KiB     _fmpz_mod_mpoly_divrem_monagan_pearce
 12.6KiB     arb_hypgeom_sum_fmpq_imag_arb_rs
 12.6KiB     acb_theta_ql_all
 12.6KiB     acb_theta_eld_set_rec
 12.5KiB     fq_zech_bpoly_factor_smprime
 12.4KiB     fmpq_mpoly_from_univar_bits

=================================
Total size of symbols: 15.1MiB
Total size of strings: 280.9KiB
Total size of constants: 544.1KiB
==================================
Filesize: 15.9MiB
==================================

Loop unrolling clearly adds a lot of bloat (the list for #1698 would look quite different).

For example, acb_mat_mul_reorder has no business being one of the largest functions in the library, and it's indeed only 10% the size with rolled loops. We can also reduce the size of this function by a factor 2 by declaring the static helper functions in that file as __attribute__((noinline)) (they certainly don't need to be inlined).

The butterfly_rshB and butterfly_lshB functions in the fft module also probably suffer from the combination of inlining and loop unrolling; mpn_sumdiff_n in fft.h quite possible doesn't need to be inlined.

With rolled loops, the largest entries are lookup tables. Some of these could be optimized, notably the largest table of all: flint_conway_polynomials. Just splitting this table into 8-bit, 16-bit and 32-bit tables should save more than a factor 2; with a smarter encoding, I guess it could be compressed 4x or 8x.

flintlib / flint

Large symbols in the binary #1704