dimforge / nalgebra

Linear algebra library for Rust.
https://nalgebra.org
Apache License 2.0
4.02k stars 481 forks source link

nalgebra 0.13: pure-rust matrix factorizations, LAPACK integration, and performance improvements #274

Closed sebcrozet closed 7 years ago

sebcrozet commented 7 years ago

Please, I would prefer that nobody make announcements on blogs about this (in reddit, hackernews, etc.) before the release on crates.io which will occur during the second half of august.

All the following are currently implemented on the linalg branch of nalgebra.

Since the last large release #218 of nalgebra, I've been working slowly but thoroughly on a few features that makes a step toward making nalgebra more useful for general-purpose linear algebra. This includes three major features:

  1. Pure-rust implementations of some the most common matrix factorization.
  2. Partial integration of Lapack. This is based on previous works on the nalgebra-lapack crate (made by @astraw).
  3. Insertion and removal of rows and columns to matrices.

Of course, all those work on statically-sized matrices as well as dynamically-sized matrices! I alsow want to note that none of this would have been possible without the amazing typenum crate which allows addition/subtraction/minimum of type-level integers.

Matrix factorizations

This new version of nalgebra will include pure-rust implementations of the following factorization for real matrices ("general matrices" designates real-valued matrices that may be rectangular):

All those are implemented in the linalg folder. Note that as far as I know, the only other crate that implements those factorization in pure-Rust is rulinalg written by @AtheMathmo and @Andlon. Refer to the next post for benchmarks showing the performances of nalgebra compared to rulinalg.

Lapack integration

I'd like to move the nalgebra-lapack crate to the main nalgebra github repository to simplify the task of keeping them in sync. The next nalgebra-lapack version has been updated and I intend to make it contain at least all the decompositions implemented in Rust and expose similar interfaces to the user. It is not finished. Currently, it contains: Cholesky, Hessenberg, LU, QR, Eigendecomposition of real matrices, SVD.

Matrix edition

This adds the ability to resize a matrix or insert/remove rows and columns. For dynamically-sized matrices (allocated on the heap), the pre-existing buffer is reallocated (i.e. the underlying Vec is shrink or expanded.)

The ability to fill different parts of the matrices are added as well:

Patreon

As a side note, I discovered Patreon only recently so I set up a page for my projects: https://www.patreon.com/sebcrozet Any financial aid is greatly appreciated.

Release date

The release is planed for the second half of august. Indeed, I still have to write documentations (including a new page on the nagebra users guide) and refine the API. Any feedback before the release would be very useful to improve the quality of the actual release.

sebcrozet commented 7 years ago

Benchmarks

I'm pinging all the concerned library authors, in case they disagree with the way I used their libraries: @termoshtt, @Andlon, @brendanzab, @masonium, @AtheMathmo .

I've made some rudimentary benchmarks that you may execute yourself by cloning the rust_linalg_bench repository and running cargo bench. This benchmarks compares nalgebra, rulinalg, nalgebra-lapack, linxal, ndarray_linalg.

Note that:

Here are the benchmark results with an Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz (for each test, the times are sorted from the fastest to the slowest). If I used the other libraries correctly, note that nalgebra often outperforms rulinalg. nalgebra-lapack also outperforms linxal and ndarray_linarg (even if they all use the same Lapack backend).

test linalg::cholesky::cholesky_100x100_na                    ... bench:      79,099 ns/iter (+/- 495)
test linalg::cholesky::cholesky_100x100_na_lapack             ... bench:      83,832 ns/iter (+/- 118,635)
test linalg::cholesky::cholesky_100x100_rulinalg              ... bench:      91,186 ns/iter (+/- 1,326)

test linalg::cholesky::cholesky_200x200_na_lapack             ... bench:     204,466 ns/iter (+/- 80,880)
test linalg::cholesky::cholesky_200x200_na                    ... bench:     560,173 ns/iter (+/- 2,410)
test linalg::cholesky::cholesky_200x200_rulinalg              ... bench:     601,727 ns/iter (+/- 2,247)

test linalg::cholesky::cholesky_500x500_na_lapack             ... bench:   1,196,005 ns/iter (+/- 493,191)
test linalg::cholesky::cholesky_500x500_rulinalg              ... bench:   8,437,363 ns/iter (+/- 245,283)
test linalg::cholesky::cholesky_500x500_na                    ... bench:   8,462,242 ns/iter (+/- 136,445)

test linalg::cholesky::cholesky_unpack_100x100_na_lapack      ... bench:      76,778 ns/iter (+/- 22,059)
test linalg::cholesky::cholesky_unpack_100x100_na             ... bench:      80,384 ns/iter (+/- 265)
test linalg::cholesky::cholesky_unpack_100x100_rulinalg       ... bench:      96,154 ns/iter (+/- 244)
test linalg::cholesky::cholesky_unpack_100x100_ndarray_linalg ... bench:     114,380 ns/iter (+/- 16,933)
test linalg::cholesky::cholesky_unpack_100x100_linxal         ... bench:     120,257 ns/iter (+/- 39,094)

test linalg::cholesky::cholesky_unpack_200x200_na_lapack      ... bench:     219,788 ns/iter (+/- 37,601)
test linalg::cholesky::cholesky_unpack_200x200_ndarray_linalg ... bench:     348,092 ns/iter (+/- 60,603)
test linalg::cholesky::cholesky_unpack_200x200_linxal         ... bench:     377,128 ns/iter (+/- 243,553)
test linalg::cholesky::cholesky_unpack_200x200_na             ... bench:     568,680 ns/iter (+/- 2,028)
test linalg::cholesky::cholesky_unpack_200x200_rulinalg       ... bench:     619,849 ns/iter (+/- 1,597)

test linalg::cholesky::cholesky_unpack_500x500_na_lapack      ... bench:   1,046,094 ns/iter (+/- 319,187)
test linalg::cholesky::cholesky_unpack_500x500_ndarray_linalg ... bench:   2,741,188 ns/iter (+/- 708,088)
test linalg::cholesky::cholesky_unpack_500x500_linxal         ... bench:   2,844,677 ns/iter (+/- 1,445,580)
test linalg::cholesky::cholesky_unpack_500x500_rulinalg       ... bench:   8,517,809 ns/iter (+/- 105,862)
test linalg::cholesky::cholesky_unpack_500x500_na             ... bench:   8,543,781 ns/iter (+/- 176,950)

test linalg::eigen::eigenvalues_100x100_na                    ... bench:   2,293,167 ns/iter (+/- 32,866)
test linalg::eigen::eigenvalues_100x100_linxal                ... bench:  13,858,785 ns/iter (+/- 2,560,663)
test linalg::eigen::eigenvalues_100x100_na_lapack             ... bench:  13,909,676 ns/iter (+/- 612,533)
test linalg::eigen::eigenvalues_100x100_rulinalg              ... bench:  14,989,440 ns/iter (+/- 225,927)

test linalg::eigen::eigenvalues_200x200_na                    ... bench:  16,576,855 ns/iter (+/- 752,285)
test linalg::eigen::eigenvalues_200x200_linxal                ... bench:  66,726,148 ns/iter (+/- 6,423,217)
test linalg::eigen::eigenvalues_200x200_na_lapack             ... bench:  72,630,321 ns/iter (+/- 42,434,494)
test linalg::eigen::eigenvalues_200x200_rulinalg              ... bench: 101,847,822 ns/iter (+/- 13,394,967)

test linalg::full_piv_lu::full_piv_lu_100x100_na              ... bench:     404,728 ns/iter (+/- 1,245)
test linalg::full_piv_lu::full_piv_lu_100x100_rulinalg        ... bench:   1,297,164 ns/iter (+/- 14,197)

test linalg::full_piv_lu::full_piv_lu_200x200_na              ... bench:   3,091,332 ns/iter (+/- 20,949)
test linalg::full_piv_lu::full_piv_lu_200x200_rulinalg        ... bench:  10,114,018 ns/iter (+/- 42,487)

test linalg::full_piv_lu::full_piv_lu_500x500_na              ... bench:  48,830,407 ns/iter (+/- 1,022,364)
test linalg::full_piv_lu::full_piv_lu_500x500_rulinalg        ... bench: 155,995,215 ns/iter (+/- 985,370)

test linalg::full_piv_lu::full_piv_lu_unpack_100x100_na       ... bench:     444,768 ns/iter (+/- 1,140)
test linalg::full_piv_lu::full_piv_lu_unpack_100x100_rulinalg ... bench:   1,317,153 ns/iter (+/- 1,960)

test linalg::full_piv_lu::full_piv_lu_unpack_200x200_na       ... bench:   3,264,664 ns/iter (+/- 47,316)
test linalg::full_piv_lu::full_piv_lu_unpack_200x200_rulinalg ... bench:  10,185,565 ns/iter (+/- 10,582)

test linalg::full_piv_lu::full_piv_lu_unpack_500x500_na       ... bench:  49,871,308 ns/iter (+/- 103,454)
test linalg::full_piv_lu::full_piv_lu_unpack_500x500_rulinalg ... bench: 156,537,253 ns/iter (+/- 85,403)

test linalg::hessenberg::hessenberg_100x100_na                ... bench:     507,007 ns/iter (+/- 4,062)
test linalg::hessenberg::hessenberg_100x100_rulinalg          ... bench:     603,296 ns/iter (+/- 4,007)
test linalg::hessenberg::hessenberg_100x100_na_lapack         ... bench:     797,663 ns/iter (+/- 70,759)

test linalg::hessenberg::hessenberg_200x200_na_lapack         ... bench:   2,882,478 ns/iter (+/- 561,176)
test linalg::hessenberg::hessenberg_200x200_na                ... bench:   4,054,084 ns/iter (+/- 18,782)
test linalg::hessenberg::hessenberg_200x200_rulinalg          ... bench:   4,798,409 ns/iter (+/- 14,204)

test linalg::hessenberg::hessenberg_500x500_na_lapack         ... bench:  21,193,349 ns/iter (+/- 3,847,976)
test linalg::hessenberg::hessenberg_500x500_na                ... bench:  70,689,970 ns/iter (+/- 175,440)
test linalg::hessenberg::hessenberg_500x500_rulinalg          ... bench:  79,794,882 ns/iter (+/- 1,288,317)

test linalg::hessenberg::hessenberg_unpack_100x100_na         ... bench:     774,599 ns/iter (+/- 25,331)
test linalg::hessenberg::hessenberg_unpack_100x100_rulinalg   ... bench:     853,903 ns/iter (+/- 46,551)
test linalg::hessenberg::hessenberg_unpack_100x100_na_lapack  ... bench:   1,265,239 ns/iter (+/- 5,312,497)

test linalg::hessenberg::hessenberg_unpack_200x200_na_lapack  ... bench:   4,494,103 ns/iter (+/- 892,203)
test linalg::hessenberg::hessenberg_unpack_200x200_na         ... bench:   5,872,365 ns/iter (+/- 446,610)
test linalg::hessenberg::hessenberg_unpack_200x200_rulinalg   ... bench:   6,769,781 ns/iter (+/- 459,629)

test linalg::hessenberg::hessenberg_unpack_500x500_na_lapack  ... bench:  33,775,048 ns/iter (+/- 3,920,029)
test linalg::hessenberg::hessenberg_unpack_500x500_na         ... bench:  99,048,931 ns/iter (+/- 4,753,687)
test linalg::hessenberg::hessenberg_unpack_500x500_rulinalg   ... bench: 111,293,250 ns/iter (+/- 230,018)

test linalg::inverse::inverse_100x100_na_lapack               ... bench:     311,685 ns/iter (+/- 8,199)
test linalg::inverse::inverse_100x100_linxal                  ... bench:     361,804 ns/iter (+/- 30,907)
test linalg::inverse::inverse_100x100_ndarray_linalg          ... bench:     362,779 ns/iter (+/- 12,394)
test linalg::inverse::inverse_100x100_na                      ... bench:     564,350 ns/iter (+/- 5,234)
test linalg::inverse::inverse_100x100_rulinalg                ... bench:   1,324,270 ns/iter (+/- 8,525)

test linalg::inverse::inverse_200x200_na_lapack               ... bench:     974,133 ns/iter (+/- 89,083)
test linalg::inverse::inverse_200x200_ndarray_linalg          ... bench:   1,173,222 ns/iter (+/- 23,697)
test linalg::inverse::inverse_200x200_linxal                  ... bench:   1,174,290 ns/iter (+/- 46,108)
test linalg::inverse::inverse_200x200_na                      ... bench:   4,022,274 ns/iter (+/- 62,137)
test linalg::inverse::inverse_200x200_rulinalg                ... bench:   9,704,370 ns/iter (+/- 15,516)

test linalg::inverse::inverse_500x500_na_lapack               ... bench:   5,934,691 ns/iter (+/- 586,549)
test linalg::inverse::inverse_500x500_ndarray_linalg          ... bench:   7,577,193 ns/iter (+/- 250,631)
test linalg::inverse::inverse_500x500_linxal                  ... bench:   7,624,185 ns/iter (+/- 1,589,114)
test linalg::inverse::inverse_500x500_na                      ... bench:  68,263,530 ns/iter (+/- 145,827)
test linalg::inverse::inverse_500x500_rulinalg                ... bench: 140,537,347 ns/iter (+/- 674,506

test linalg::lu::lu_100x100_na_lapack                         ... bench:     125,162 ns/iter (+/- 17,159)
test linalg::lu::lu_100x100_na                                ... bench:     129,797 ns/iter (+/- 420)
test linalg::lu::lu_100x100_linxal                            ... bench:     149,414 ns/iter (+/- 22,454)
test linalg::lu::lu_100x100_ndarray_linalg                    ... bench:     152,160 ns/iter (+/- 32,222)
test linalg::lu::lu_100x100_rulinalg                          ... bench:     521,223 ns/iter (+/- 2,757)

test linalg::lu::lu_200x200_na_lapack                         ... bench:     391,742 ns/iter (+/- 311,770)
test linalg::lu::lu_200x200_linxal                            ... bench:     488,943 ns/iter (+/- 147,941)
test linalg::lu::lu_200x200_ndarray_linalg                    ... bench:     489,760 ns/iter (+/- 78,566)
test linalg::lu::lu_200x200_na                                ... bench:   1,008,323 ns/iter (+/- 3,353)
test linalg::lu::lu_200x200_rulinalg                          ... bench:   3,958,843 ns/iter (+/- 6,393)

test linalg::lu::lu_500x500_na_lapack                         ... bench:   2,097,076 ns/iter (+/- 807,585)
test linalg::lu::lu_500x500_linxal                            ... bench:   2,974,908 ns/iter (+/- 919,297)
test linalg::lu::lu_500x500_ndarray_linalg                    ... bench:   2,982,535 ns/iter (+/- 881,760)
test linalg::lu::lu_500x500_na                                ... bench:  18,008,954 ns/iter (+/- 86,650)
test linalg::lu::lu_500x500_rulinalg                          ... bench:  60,193,484 ns/iter (+/- 70,956)

test linalg::lu::lu_unpack_100x100_na                         ... bench:     152,349 ns/iter (+/- 436)
test linalg::lu::lu_unpack_100x100_rulinalg                   ... bench:     543,412 ns/iter (+/- 4,713)

test linalg::lu::lu_unpack_200x200_na                         ... bench:   1,138,849 ns/iter (+/- 2,019)
test linalg::lu::lu_unpack_200x200_rulinalg                   ... bench:   4,045,335 ns/iter (+/- 7,137)

test linalg::lu::lu_unpack_500x500_na                         ... bench:  18,700,299 ns/iter (+/- 43,242)
test linalg::lu::lu_unpack_500x500_rulinalg                   ... bench:  60,805,150 ns/iter (+/- 87,084)

test linalg::lu_solve::lu_solve_100x100_na                    ... bench:       3,924 ns/iter (+/- 67)
test linalg::lu_solve::lu_solve_100x100_na_lapack             ... bench:       3,928 ns/iter (+/- 46)
test linalg::lu_solve::lu_solve_100x100_rulinalg              ... bench:       7,949 ns/iter (+/- 56)
test linalg::lu_solve::lu_solve_100x100_ndarray_linalg        ... bench:      15,142 ns/iter (+/- 231)

test linalg::lu_solve::lu_solve_200x200_na                    ... bench:      15,681 ns/iter (+/- 132)
test linalg::lu_solve::lu_solve_200x200_na_lapack             ... bench:      15,774 ns/iter (+/- 208)
test linalg::lu_solve::lu_solve_200x200_rulinalg              ... bench:      28,211 ns/iter (+/- 309)
test linalg::lu_solve::lu_solve_200x200_ndarray_linalg        ... bench:      60,412 ns/iter (+/- 662)

test linalg::lu_solve::lu_solve_500x500_na                    ... bench:      95,498 ns/iter (+/- 1,169)
test linalg::lu_solve::lu_solve_500x500_na_lapack             ... bench:      95,515 ns/iter (+/- 1,155)
test linalg::lu_solve::lu_solve_500x500_rulinalg              ... bench:     158,737 ns/iter (+/- 1,790)
test linalg::lu_solve::lu_solve_500x500_ndarray_linalg        ... bench:     426,721 ns/iter (+/- 6,238)

test linalg::qr::qr_100x100_na                                ... bench:     242,954 ns/iter (+/- 1,369)
test linalg::qr::qr_100x100_rulinalg                          ... bench:     244,251 ns/iter (+/- 2,808)
test linalg::qr::qr_100x100_na_lapack                         ... bench:     361,008 ns/iter (+/- 172,924)
test linalg::qr::qr_100x100_linxal                            ... bench:     388,214 ns/iter (+/- 164,201)

test linalg::qr::qr_100x500_na_lapack                         ... bench:     797,860 ns/iter (+/- 48,551)
test linalg::qr::qr_100x500_linxal                            ... bench:   1,087,715 ns/iter (+/- 95,843)
test linalg::qr::qr_100x500_rulinalg                          ... bench:   1,559,654 ns/iter (+/- 6,280)
test linalg::qr::qr_100x500_na                                ... bench:   1,817,884 ns/iter (+/- 11,012)

test linalg::qr::qr_200x200_na_lapack                         ... bench:   1,303,436 ns/iter (+/- 421,335)
test linalg::qr::qr_200x200_linxal                            ... bench:   1,440,121 ns/iter (+/- 202,860)
test linalg::qr::qr_200x200_na                                ... bench:   1,768,203 ns/iter (+/- 3,978)
test linalg::qr::qr_200x200_rulinalg                          ... bench:   1,904,316 ns/iter (+/- 818,390)

test linalg::qr::qr_500x100_na_lapack                         ... bench:     698,218 ns/iter (+/- 37,420)
test linalg::qr::qr_500x100_linxal                            ... bench:     789,112 ns/iter (+/- 363,697)
test linalg::qr::qr_500x100_na                                ... bench:   1,543,158 ns/iter (+/- 79,002)
test linalg::qr::qr_500x100_rulinalg                          ... bench:   2,216,709 ns/iter (+/- 8,704)

test linalg::qr::qr_500x500_na_lapack                         ... bench:  10,010,074 ns/iter (+/- 1,100,425)
test linalg::qr::qr_500x500_linxal                            ... bench:  11,834,180 ns/iter (+/- 527,032)
test linalg::qr::qr_500x500_na                                ... bench:  27,679,999 ns/iter (+/- 7,582,011)
test linalg::qr::qr_500x500_rulinalg                          ... bench:  31,469,016 ns/iter (+/- 8,874,295)

test linalg::qr::qr_unpack_100x100_rulinalg                   ... bench:     490,514 ns/iter (+/- 9,268)
test linalg::qr::qr_unpack_100x100_na                         ... bench:     497,107 ns/iter (+/- 17,194)
test linalg::qr::qr_unpack_100x100_na_lapack                  ... bench:     774,740 ns/iter (+/- 14,018)
test linalg::qr::qr_unpack_100x100_ndarray_linalg             ... bench:     794,938 ns/iter (+/- 9,701)

test linalg::qr::qr_unpack_100x500_na_lapack                  ... bench:   1,299,302 ns/iter (+/- 27,806)
test linalg::qr::qr_unpack_100x500_rulinalg                   ... bench:   1,883,853 ns/iter (+/- 10,275)
test linalg::qr::qr_unpack_100x500_na                         ... bench:   2,049,818 ns/iter (+/- 19,433)

test linalg::qr::qr_unpack_200x200_na_lapack                  ... bench:   2,785,583 ns/iter (+/- 27,946)
test linalg::qr::qr_unpack_200x200_ndarray_linalg             ... bench:   2,908,825 ns/iter (+/- 58,565)
test linalg::qr::qr_unpack_200x200_na                         ... bench:   3,599,707 ns/iter (+/- 7,692)
test linalg::qr::qr_unpack_200x200_rulinalg                   ... bench:   3,735,653 ns/iter (+/- 6,559)

test linalg::qr::qr_unpack_500x100_na_lapack                  ... bench:   1,386,366 ns/iter (+/- 20,943)
test linalg::qr::qr_unpack_500x100_na                         ... bench:   3,030,160 ns/iter (+/- 6,467)
test linalg::qr::qr_unpack_500x100_rulinalg                   ... bench:  18,009,191 ns/iter (+/- 1,054,516)

test linalg::qr::qr_unpack_500x500_na_lapack                  ... bench:  21,821,651 ns/iter (+/- 2,790,178)
test linalg::qr::qr_unpack_500x500_ndarray_linalg             ... bench:  23,223,414 ns/iter (+/- 2,969,045)
test linalg::qr::qr_unpack_500x500_na                         ... bench:  55,789,495 ns/iter (+/- 1,218,810)
test linalg::qr::qr_unpack_500x500_rulinalg                   ... bench:  62,808,155 ns/iter (+/- 11,393,312)

test linalg::svd::svd_100x100_na                              ... bench:   3,522,536 ns/iter (+/- 6,127)
test linalg::svd::svd_100x100_na_lapack                       ... bench:   6,166,798 ns/iter (+/- 32,125)
test linalg::svd::svd_100x100_linxal                          ... bench:   6,190,673 ns/iter (+/- 65,828)
test linalg::svd::svd_100x100_ndarray_linalg                  ... bench:  29,552,476 ns/iter (+/- 1,950,301)
test linalg::svd::svd_100x100_rulinalg                        ... bench:  42,030,760 ns/iter (+/- 213,608)

test linalg::svd::svd_200x200_na_lapack                       ... bench:  16,652,698 ns/iter (+/- 1,821,621)
test linalg::svd::svd_200x200_linxal                          ... bench:  17,208,785 ns/iter (+/- 894,328)
test linalg::svd::svd_200x200_na                              ... bench:  26,564,185 ns/iter (+/- 40,326)
test linalg::svd::svd_200x200_ndarray_linalg                  ... bench: 162,832,632 ns/iter (+/- 10,913,980)
test linalg::svd::svd_200x200_rulinalg                        ... bench: 464,785,636 ns/iter (+/- 4,583,386)

Here are some simple benchmarks to convince myself that nalgebra is as fast in release mode as the SIMD cgmath:

test lowdim::inverse::mat2_inverse_na                         ... bench:          10 ns/iter (+/- 0)
test lowdim::inverse::mat2_inverse_cgmath                     ... bench:          10 ns/iter (+/- 0)

test lowdim::inverse::mat3_inverse_na                         ... bench:          26 ns/iter (+/- 0)
test lowdim::inverse::mat3_inverse_cgmath                     ... bench:          26 ns/iter (+/- 0)

test lowdim::inverse::mat4_inverse_na                         ... bench:          48 ns/iter (+/- 0)
test lowdim::inverse::mat4_inverse_cgmath                     ... bench:          57 ns/iter (+/- 0)

test lowdim::product::mat2_mul_mat2_na                        ... bench:           2 ns/iter (+/- 0)
test lowdim::product::mat2_mul_mat2_cgmath                    ... bench:           2 ns/iter (+/- 0)

test lowdim::product::mat2_mul_vec2_na                        ... bench:           0 ns/iter (+/- 0)
test lowdim::product::mat2_mul_vec2_cgmath                    ... bench:           0 ns/iter (+/- 0)

test lowdim::product::mat3_mul_mat3_na                        ... bench:           9 ns/iter (+/- 0)
test lowdim::product::mat3_mul_mat3_cgmath                    ... bench:           9 ns/iter (+/- 0)

test lowdim::product::mat3_mul_vec3_na                        ... bench:           3 ns/iter (+/- 0)
test lowdim::product::mat3_mul_vec3_cgmath                    ... bench:           3 ns/iter (+/- 0)

test lowdim::product::mat4_mul_mat4_na                        ... bench:          14 ns/iter (+/- 0)
test lowdim::product::mat4_mul_mat4_cgmath                    ... bench:          14 ns/iter (+/- 0)

test lowdim::product::mat4_mul_vec4_na                        ... bench:           4 ns/iter (+/- 0)
test lowdim::product::mat4_mul_vec4_cgmath                    ... bench:           4 ns/iter (+/- 0)

test lowdim::product::vec2_dot_vec2_na                        ... bench:           0 ns/iter (+/- 0)
test lowdim::product::vec2_dot_vec2_cgmath                    ... bench:           0 ns/iter (+/- 0)

test lowdim::product::vec3_dot_vec3_na                        ... bench:           0 ns/iter (+/- 0)
test lowdim::product::vec3_dot_vec3_cgmath                    ... bench:           0 ns/iter (+/- 0)

test lowdim::product::vec4_dot_vec4_na                        ... bench:           0 ns/iter (+/- 0)
test lowdim::product::vec4_dot_vec4_cgmath                    ... bench:           0 ns/iter (+/- 0)
astraw commented 7 years ago

Exciting.

I confirm the tests in the nalgebra-lapack crate in the linalg branch passes tests on my Mac with:

cargo test --no-default-features --features accelerate

I’m perfectly happy to transfer the name nalgebra-lapack to you on crates.io — I’ll just have to figure out how to do that. I would appreciate some credit in the Cargo.toml and README.md of the nalgebra-lapack crate.

milibopp commented 7 years ago

Great work!

Looking forward to the new release.

sebcrozet commented 7 years ago

@astraw Thanks for checking that it works on your Mac!

I’m perfectly happy to transfer the name nalgebra-lapack to you on crates.io — I’ll just have to figure out how to do that.

Thanks, I guess you can follow the instructions given on the cargo documentation to add me as named owner.

I would appreciate some credit in the Cargo.toml and README.md of the nalgebra-lapack crate.

Of course! You are now added to the nalgebra-lapack authors list + I mentioned you on the README there.

astraw commented 7 years ago

OK, thanks.

I just added you as an owner on crates.io. I am unable to remove myself as an owner, so I would ask you to do so.

I will try to test this further in the coming days.

masonium commented 7 years ago

Thanks for all the work on the benchmarking!

linxal actually has a .symmetric_eigenvalues method that can be used when the matrix is know to be symmetric / Hermitian. When I use that method, it becomes the fastest for the eigen benchmark on my machine.

I do think the eigenvalue benchmark is problematic, partially for that reason. The other libraries either don't handle complex eigenvalues at all (nalgebra), don't guarantee correct results for non-symmetric matrices (rulinalg), or don't specialize for symmetric matrices (nalgebra-lapack). linxal is the only library you test that correctly compute all (real and complex) eigenvalues for generic matrices and has a specialization for the symmetric case.

I would recommend having separate symmetric and non-symmetric benchmarks.

linxal suffers across the board because ndarray creates c-order matrices by default. linxal handles c-order or f-order matrices just fine, but LAPACK internally just transposes any c-order input to f-order, does the calculation, and transposes it back. If you change the ndarray matrices to f-order, you'll see a performance boost across the board, bringing it roughly inline to nalgebra-lapack numbers. (nalgebra-lapack remains slightly faster, since the c-interface still has some overhead that nalgebra-lapack bypasses by using the fortran interface directly.) I don't think it's a necessarily a flaw in your methodology; if a random user picks up linxal and doesn't know to use (or can't use) f-order matrices, it's not their fault. It might be useful, though, to add a linxal_f category (that does all the same tests with f-order matrices) to highlight the discrepancy. (I'd be happy to submit a pull request to that effect).

sebcrozet commented 7 years ago

@masonium Thanks for your feedback. I understand your remarks. I did not check if ndarray could create column-major matrices. So I think adding linxal_column_major versions (instead of linxal_f for clarity) will be useful indeed.

I do think the eigenvalue benchmark is problematic, partially for that reason. The other libraries either don't handle complex eigenvalues at all (nalgebra), don't guarantee correct results for non-symmetric matrices (rulinalg), or don't specialize for symmetric matrices (nalgebra-lapack). linxal is the only library you test that correctly compute all (real and complex) eigenvalues for generic matrices and has a specialization for the symmetric case. I would recommend having separate symmetric and non-symmetric benchmarks.

I agree the eigenvalue benchmark is problematic. We should have spearate symmetric and non-symmetric benchmarks, as your propose.

(I'd be happy to submit a pull request to that effect).

PR addressing any of those issues (sebcrozet/rust_linalg_bench#1, sebcrozet/rust_linalg_bench#2) are very welcome!

Andlon commented 7 years ago

Great work, @sebcrozet! I'm very happy to see progress in pure rust implementations of linear algebra algorithms. I haven't taken a close look at the benchmark code, but those numbers look reasonable so I trust you've made a fair comparison with rulinalg :-)

brendanzab commented 7 years ago

Also trusting you with the benchmark comparisons with cgmath, but I probably would ask that you mention that debug builds are considerably slower when compared to cgmath's explicit simd. Not surprised to see our mat4_inverse doing poorly - we haven't really done much work optimising it!

Have you thought of comparing it to C++ libs like glm? Also wondering if we should have some git repo for these benchmarks, like we have for entity component systems: https://github.com/lschmierer/ecs_bench

sebcrozet commented 7 years ago

I probably would ask that you mention that debug builds are considerably slower when compared to cgmath's explicit simd.

That's true. Perhaps we should specify on the README that comparative performances may vary in debug mode (given SIMD cgmath vs. nalgebra as an example).

Not surprised to see our mat4_inverse doing poorly - we haven't really done much work optimising it!

To be honest, the 4d matrix inverse of nalgebra is a copy of the source code of Mesa (also published under the MIT license). This is actually the only piece of code on nalgebra that is copied from another library.

Have you thought of comparing it to C++ libs like glm?

No, I did not. I'd be curious to compare with Eigen as well.

Also wondering if we should have some git repo for these benchmarks, like we have for entity component systems: https://github.com/lschmierer/ecs_bench

Here is the repository I currently use: https://github.com/sebcrozet/rust_linalg_bench

bluss commented 7 years ago

Have you looked at the effect of using -C target-cpu=native or something equivalent? Anything using matrixmultiply should benefit from that. Just that it would be cool to see pure Rust close the gap a bit, if possible. (Ok, I can't say off hand if that crate is even relevant, but the command line flag might still be.)

sebcrozet commented 7 years ago

@bluss Just tried it, and using -C target-cpu=native does not seem to affect the performances significantly.

Andlon commented 7 years ago

I expect you would only see a real benefit with -C target-cpu=native with BLAS-3 implementations of the decompositions, as BLAS-2 is memory-bound for any reasonably sized matrix. I guess that would be the next step though :-)