flintlib / flint

FLINT (Fast Library for Number Theory)
http://www.flintlib.org
GNU Lesser General Public License v3.0
422 stars 239 forks source link

Decrease building time #1221

Open albinahlback opened 1 year ago

albinahlback commented 1 year ago

It's time for me to try to discuss decreasing building time once again.

With MegaFLINT incoming, building time is going to be increased, as Fredrik mentioned in #1218. Not everyone has an AMD Epyc with three million cores that is able to heat five households during a cold winter day. For me, building FLINT single threaded takes 10 minutes on my laptop. As I have mentioned here, I have been able to reduce FLINT's compile time by over 90%, and I don't think such a goal to reduce compile time is unfeasible.

For me, a major obstacle for helping to develop FLINT is actually the building time. Now, I know that I can compile with --disable-static or whatever, and that once FLINT has been built it often times does not have to built everything again. But what if I accidentally save flint.h or simply switch branch? Should such things really take minutes of my time?

As my previous attempts have been unsuccessful, I propose one very conservative way to reduce the building time: merge source files. The reason for this is that the compilation time is actually almost linear with respect to the amount of files being compiled, at least when it comes to Clang and GCC.

I propose dividing the source files into categories like

fmpz/basic_operations.c /* fmpz_add, fmpz_mul_ui, fmpz_and, fmpz_pow, fmpz_div, etc. */
fmpz/combinatorial_functions.c
fmpz/number_theoretic_functions.c

or something similar. Preferably, the categories would be few (in order to reduce the compile time as much as possible) and translatable into almost every type (like number_theoretic_functions.c could be too specific as it is not as useful for fmpz_mat, for example).

@fredrik-johansson, you have previously mentioned (I don't recall when or where) that it could be nice to merge functions like fmpz_add_ui and fmpz_add to one source file. What do you think about this, and do you have any suggestions?

fredrik-johansson commented 1 year ago

The build time is indeed a problem. I would have found working on Flint unbearable without the 8-core laptop I got last year.

Some source files could be merged, but I would proceed very conservatively.

For a start, I'd certainly be comfortable with merging files for trivial type versions of the same function. For example, fmpq/add.c, fmpq/add_ui.c, fmpq/add_si.c, fmpq/add_fmpz.c, fmpq/add_small.c could all be merged into fmpq/add.c. This already happens in some modules; e.g. https://github.com/flintlib/flint2/blob/trunk/qqbar/add.c.

The templated functions in the fq directories arguably don't need to be in separate source files. For example, in fq_poly, instead of add.c, add_series.c, ... there could be a single templated.c or generic.c that includes all the templated algorithms:

#define T fq
#define CAP_T FQ
#include "fq_poly_templates/add.c"
#include "fq_poly_templates/add_series.c"
...

However, a potential downside is that the module can no longer be built in parallel, so it's possible that parallel builds slow down.

My intention is that gr will allow replacing much more code in Flint with trivial wrappers around generic implementations. See for example https://github.com/flintlib/flint2/blob/trunk/ca_mat/solve_tril.c. Then those trivial wrappers could be merged into a single compilation unit. Rolling this out through Flint will necessary be a gradual process though, to make sure there are no regressions.

BTW, here is the current CPU time to build each module:

47.6 fmpz_mpoly
46.5 nmod_mpoly
37.5 fq_nmod_mpoly
32.7 arb
28.9 fmpz_poly
27.4 ca
26.0 acb_poly
24.5 acb_hypgeom
22.9 fmpz_mod_mpoly
22.5 arb_poly
21.9 qqbar
21.8 fmpz_mat
21.3 fmpz_lll
20.9 nmod_poly
20.7 ca_mat
19.2 acb_dirichlet
18.8 fmpz_mod_poly
18.5 fq_poly
18.4 fq_nmod_poly
18.0 fmpz_mod_mpoly_factor
17.1 fq_zech_poly
16.5 acb_mat
16.4 gr
16.3 arb_hypgeom
14.9 arb_mat
14.8 gr_mat
14.7 acb
14.6 fmpq_poly
14.5 nmod_mpoly_factor
14.1 ca_poly
13.9 fmpz_mpoly_factor
13.7 fmpz
13.3 fmpq_mpoly
13.2 gr_poly
11.8 mpoly
11.3 fq_nmod_mpoly_factor
11.2 n_poly
 9.9 fq_zech_mpoly
 9.6 fexpr
 8.1 arf
 7.4 fq_zech_mpoly_factor
 7.2 fq_mat
 6.8 fq_nmod_mat
 6.8 fmpq_mat
 6.8 nmod_mat
 6.8 fq_zech_mat
 6.5 aprcl
 6.3 nf_elem
 6.2 fft
 5.7 acb_modular
 5.6 fmpz_mpoly_q
 5.5 gr_mpoly
 5.3 fmpq
 5.3 ulong_extras
 5.2 fmpz_poly_mat
 5.2 qadic
 5.1 fmpz_mod_mat
 5.1 arith
 4.9 fq
 4.7 fmpr
 4.2 mag
 4.2 padic_poly
 4.0 fmpz_mod_poly_factor
 3.9 fmpz_vec
 3.7 fmpz_factor
 3.7 nmod_poly_mat
 3.4 fq_zech_poly_factor
 3.4 acb_elliptic
 3.2 fq_poly_factor
 3.2 fq_nmod
 3.1 fq_nmod_poly_factor
 2.8 padic
 2.8 utils_flint
 2.7 nmod_poly_factor
 2.7 fq_zech
 2.6 fmpz_poly_q
 2.6 bool_mat
 2.5 ca_field
 2.4 padic_mat
 2.3 fmpz_poly_factor
 2.0 dlog
 1.8 qsieve
 1.8 acb_dft
 1.7 mpn_extras
 1.6 fq_vec
 1.6 dirichlet
 1.5 fmpq_mpoly_factor
 1.5 gr_special
 1.5 ca_ext
 1.5 fmpz_mod
 1.5 qfb
 1.3 arb_fmpz_poly
 1.2 fmpzi
 1.1 fq_nmod_vec
 1.1 fq_zech_vec
 1.0 bernoulli
 1.0 nmod_vec
 0.8 fq_default
 0.8 d_mat
 0.7 gr_vec
 0.7 mpf_mat
 0.7 acb_calc
 0.6 mpf_vec
 0.6 d_vec
 0.6 arb_calc
 0.5 acf
 0.5 arb_fpwrap
 0.5 fq_default_poly
 0.4 thread_pool
 0.4 fmpz_mod_vec
 0.3 double_interval
 0.3 hypgeom
 0.3 partitions
 0.3 fq_default_mat
 0.3 fmpq_vec
 0.3 mpfr_vec
 0.3 fq_embed
 0.3 ca_vec
 0.3 calcium
 0.3 fq_nmod_embed
 0.2 fexpr_builtin
 0.2 fq_zech_embed
 0.2 nf
 0.2 mpfr_mat
 0.2 double_extras
 0.1 long_extras
 0.1 fq_default_poly_factor
 0.1 perm
 0.1 thread_support
 0.1 fmpz_extras
 0.0 nmod

Probably this correlates quite well with the amount of code in each module, but it would be worth investigating if there are some modules that take more time to build than they ought to.

albinahlback commented 1 year ago

I will start with this after my PR has been merged. Will make sure to start with those that are most build-time heavy.

albinahlback commented 1 year ago

Just like #1232, one can do the merge template tests. However, then one needs to alter the templates so that they can print which function failed so one is not left to guess which of the 50 functions failed.

albinahlback commented 1 year ago

@fredrik-johansson Do you think that we can be more aggressive when it comes to merging test files, compared to source files?

fredrik-johansson commented 1 year ago

I want to leave the test code alone for now.

albinahlback commented 1 year ago

I checked individual building times. X_mpoly and X_mpoly_factor has the worst build time per file. Even the simplest one have longer build times than basically every file in, say, ulong_extras or nmod_poly.

But I think what could gain most time at the moment is merging files.

fredrik-johansson commented 1 year ago

I wonder if this has to do with the number of inlined functions in mpoly.h and X_mpoly.h. There's also an unnecessary inclusion of string.h. But there's maybe something else going on too.

albinahlback commented 1 year ago

I bet that's the case.

albinahlback commented 1 year ago

So these are the average build time per file:

0.0513 : acb
0.0552 : acb_calc
0.0509 : acb_dft
0.0598 : acb_dirichlet
0.0563 : acb_elliptic
0.0587 : acb_hypgeom
0.0520 : acb_mat
0.0447 : acb_modular
0.0528 : acb_poly
0.0540 : acf
0.0378 : aprcl
0.0480 : arb
0.0413 : arb_calc
0.0467 : arb_fmpz_poly
0.2290 : arb_fpwrap
0.0565 : arb_hypgeom
0.0440 : arb_mat
0.0463 : arb_poly
0.0427 : arf
0.0343 : arith
0.0467 : bernoulli
0.0234 : bool_mat
0.0772 : ca
0.0754 : ca_ext
0.0815 : ca_field
0.0783 : ca_mat
0.0774 : ca_poly
0.0835 : ca_vec
0.0318 : calcium
0.0218 : d_mat
0.0207 : d_vec
0.0242 : dirichlet
0.0234 : dlog
0.0273 : double_extras
0.0364 : double_interval
0.0395 : fexpr
0.0303 : fexpr_builtin
0.0278 : fft
0.0334 : fmpq
0.0343 : fmpq_mat
0.0466 : fmpq_mpoly
0.0522 : fmpq_mpoly_factor
0.0381 : fmpq_poly
0.0297 : fmpq_vec
0.0305 : fmpz
0.0280 : fmpz_extras
0.0326 : fmpz_factor
0.0508 : fmpz_lll
0.0382 : fmpz_mat
0.0298 : fmpz_mod
0.0366 : fmpz_mod_mat
0.0457 : fmpz_mod_mpoly
0.0760 : fmpz_mod_mpoly_factor
0.0434 : fmpz_mod_poly
0.0443 : fmpz_mod_poly_factor
0.0271 : fmpz_mod_vec
0.0528 : fmpz_mpoly
0.0953 : fmpz_mpoly_factor
0.0469 : fmpz_mpoly_q
0.0363 : fmpz_poly
0.0363 : fmpz_poly_factor
0.0335 : fmpz_poly_mat
0.0334 : fmpz_poly_q
0.0271 : fmpz_vec
0.0343 : fmpzi
0.0567 : fq
0.0646 : fq_default
0.1180 : fq_default_mat
0.1360 : fq_default_poly
0.1160 : fq_default_poly_factor
0.0573 : fq_embed
0.0535 : fq_mat
0.0511 : fq_nmod
0.0380 : fq_nmod_embed
0.0565 : fq_nmod_mat
0.0845 : fq_nmod_mpoly
0.0902 : fq_nmod_mpoly_factor
0.0598 : fq_nmod_poly
0.0520 : fq_nmod_poly_factor
0.0320 : fq_nmod_vec
0.0620 : fq_poly
0.0490 : fq_poly_factor
0.0400 : fq_vec
0.0515 : fq_zech
0.0360 : fq_zech_embed
0.0540 : fq_zech_mat
0.0894 : fq_zech_mpoly
0.0990 : fq_zech_mpoly_factor
0.0590 : fq_zech_poly
0.0520 : fq_zech_poly_factor
0.0300 : fq_zech_vec
0.0256 : generic_files
0.1125 : gr
0.0331 : gr_mat
0.0486 : gr_mpoly
0.0375 : gr_poly
0.0684 : gr_special
0.0348 : gr_vec
0.0458 : hypgeom
0.0220 : long_extras
0.0356 : mag
0.0215 : mpf_mat
0.0198 : mpf_vec
0.0215 : mpfr_mat
0.0210 : mpfr_vec
0.0268 : mpn_extras
0.0407 : mpoly
0.0759 : n_poly
0.0348 : nf
0.0416 : nf_elem
0.0300 : nmod
0.0290 : nmod_mat
0.0841 : nmod_mpoly
0.0879 : nmod_mpoly_factor
0.0349 : nmod_poly
0.0337 : nmod_poly_factor
0.0298 : nmod_poly_mat
0.0259 : nmod_vec
0.0324 : padic
0.0374 : padic_mat
0.0351 : padic_poly
0.0485 : partitions
0.0223 : perm
0.0634 : qadic
0.0340 : qfb
0.0526 : qqbar
0.0396 : qsieve
0.0232 : thread_pool
0.0305 : thread_support
0.0238 : ulong_extras
0.0435 : utils_flint
albinahlback commented 1 year ago

fmpz_mpoly takes the most time to compile, so I tried merging all of these files and that dropped the compilation time by 80%. Not sure yet if we want to do that, but it's an option.

Edit: I meant merging like #include "myfile.c", not actually concatenating all of the files.

edgarcosta commented 11 months ago

For the record:

( ./bootstrap.sh && ./configure && make clean && make -j1; )  1129.23s user 306.90s system 95% cpu 24:57.31 total

on Intel i7-9750H (12) @ 2.60GHz

edgarcosta commented 9 months ago

Given #1641 , I would like to raise the question: For whom are we trying to reduce the compilation time? everyone? developers? one-time users?

albinahlback commented 9 months ago

Given #1641 , I would like to raise the question: For whom are we trying to reduce the compilation time? everyone? developers? one-time users?

Developers, mainly, and the amount of time spent in the CI.

edgarcosta commented 9 months ago

If it is for developers, it is unclear to me that --enable-merged-compilation should be the default, as we only want to rebuild the object files that we have touched instead of merging and rebuilding the whole flint. Second, the existence of the feature adds one more thing that I need to worry about when contributing, as now I need to worry about whether my edits will clash with some of the other files in the same module.

On the other hand, I support emerging locally and constantly, e.g.,

For a start, I'd certainly be comfortable with merging files for trivial type versions of the same function. For example, fmpq/add.c, fmpq/add_ui.c, fmpq/add_si.c, fmpq/add_fmpz.c, fmpq/add_small.c could all be merged into fmpq/add.c.

fredrik-johansson commented 9 months ago

I agree that worrying about nonlocal collisions is going to be too annoying, and the speedup is a bit disappointing (only 25% on my machine).

albinahlback commented 9 months ago

Yeah, I agree...

I will probably just merge some files.

fredrik-johansson commented 9 months ago

Nevertheless, thanks for having done the experiment!