decomposition and factorization terminology

Sacha0 commented 6 years ago

Singular value decomposition factorization (svdfact) is a slightly unfortunate name; minor a thing as it is, the redundancy chafes. Perhaps svdecomp, svfact, or something similar would be better? Best!

antoine-levitt commented 6 years ago

The usual acronyms are LU, QR, SVD. Unfortunate perhaps but well established, so it would be better to have consistency across these factorizations rather than consistency of the name itself. Reading svdecomp for instance makes me wonder what an ecomp is.

ViralBShah commented 6 years ago

It never occurred to me, but now that you mention it, it does sound weird. However, I don't ever expand the SVD in my mind when I think about it, and perhaps it is ok to let it be as it is.

Sacha0 commented 6 years ago

A broader thought in the same vein: At the moment we mix the terms decomposition and factorization (e.g. Computes the eigenvalue decomposition of A, returning an Eigen factorization object F [...]). Decomposition is the more common and slightly more general term. Perhaps we should use decomposition consistently? Best!

ViralBShah commented 6 years ago

I believe there was a discussion thread on which name to use, and we picked factorizations early on. We should dig up and link the original discussion at the very least.

Sacha0 commented 6 years ago

A bit of git spelunking revealed the following history: Miles Lubin introduced the name lufact confined to an UMFPackLU wrapper via https://github.com/JuliaLang/julia/commit/53c2d3054af43d2075dcc70f7d2f811e519f986f. Later, Doug Bates introduced *d names, e.g. lud and qrd, for decompositions broadly via #1281 and #1290. In #1281, Viral pointed out that the *d names were opaque, and suggested extending the UMFPackLU lufact to *fact/ "factorization" generally instead. Shortly thereafter, Tim Holy suggested *dcmp/"decomposition" as an alternative, and Doug Bates expressed remorse for introduction of "factorization" terminology and a preference for *decomp/"decomposition", but said he'd go with either decision. Viral responded saying he would be happy with either name and left the call to Doug, though he liked *fact's brevity. Mike shared a little support for "decomposition" then, and Doug likewise. That's where the conversation appears to leave off. Viral later committed https://github.com/JuliaLang/julia/commit/69e407b00c62a2c81327d0625b2a7fa6cb83aeeb, renaming the *d functions to *fact, and here we find ourselves :).

Out of curiosity, I checked the number of google hits for "X decomposition" and "X factorization", and while I had the impression that decomposition was the more widespread term, the degree to which that appears true surprised me; results in millions below:

Update regarding the table below: These hit counts were for unquoted search queries, whereas quoted search queries are probably a better metric. With quoted queries, which term hit counts favor depends on the decomposition, and the results are much less compelling overall. Ref. https://github.com/JuliaLang/julia/issues/26995#issuecomment-390808271.

X	decomposition	factorization
lu	15.3	1.07
qr	12.3	0.34
singular value	2.58	0.68
eigen	0.41	0.08
cholesky	0.4	0.2
schur	0.28	0.43

Tangentially, the history suggests that the only reason for the *fact/*d names was to retain MATLAB compatibility in lu, qr, and friends. But with the MATLAB-like functions lu, qr, et al now being deprecated in favor of *fact, deprecating the *fact names to lu, qr, et al becomes possible in 1.x (discussed briefly in https://github.com/JuliaLang/julia/pull/25187). Best!

ViralBShah commented 6 years ago

Thank you for that detailed analysis!

StefanKarpinski commented 6 years ago

Yes, I'm very much in favor of making breaking changes to LinearAlgebra 2.0 in some Julia 1.x release where the names are just lu, schur, chol, etc. but the objects returned are factorization objects. We can retain the ability to write code like L, U = lu(X) by defining iteration of the factorization objects to yield the expected components. Let's spend the intervening time thinking about what the best design for this kind of API would be without any historical baggage.
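
A minimal sketch of the destructuring idea, for illustration only. The type `ToyLU` and its fields are hypothetical and are not the LinearAlgebra implementation; the sketch only assumes Julia's standard iteration protocol (`Base.iterate`), which is what destructuring assignment relies on.

```julia
# Hypothetical toy type standing in for a factorization object.
struct ToyLU
    L::Matrix{Float64}
    U::Matrix{Float64}
end

# Iteration yields L first, then U, then stops.
# The integer state records which component comes next.
function Base.iterate(F::ToyLU, state::Int = 1)
    state == 1 && return (F.L, 2)
    state == 2 && return (F.U, 3)
    return nothing
end

# Hand-written factors of [4 3; 6 3] (no pivoting), just to have data.
F = ToyLU([1.0 0.0; 1.5 1.0], [4.0 3.0; 0.0 -1.5])

# Destructuring works because ToyLU is iterable,
# while F itself remains a single object with named fields.
L, U = F
```

With this design, the function can return one object while call sites written in the MATLAB-like style `L, U = ...` keep working.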

Sacha0 commented 6 years ago

> Yes, I'm very much in favor of making breaking changes to LinearAlgebra 2.0 in some Julia 1.x release where the names are just lu, schur, chol, etc. but the objects returned are factorization objects. We can retain the ability to write code like L, U = lu(X) by defining iteration of the factorization objects to yield the expected components.

Agreed! And likewise Andreas it seems. #26997 should at least set us up for those changes during 1.x, and potentially non-breaking then. Best!

ViralBShah commented 6 years ago

I am not sure how these Google hits were computed, but I don't get anything above 150,000-ish on anything, and even so, no more than 13-14 pages of results.

Sacha0 commented 6 years ago

> I am not sure how these Google hits were computed, but I don't get anything above 150,000-ish on anything, and even so, no more than 13-14 pages of results.

The difference is quoting versus not quoting the search query :).

ViralBShah commented 6 years ago

I think one ought to quote it, which is what I thought you did since you did say "X decomposition" and "X factorization". Even if I do it without quotes for lu, I get the same numbers roughly, about 1.2-1.3M, and not 15.3M vs. 1M. I don't think Google hits are a reliable way to decide this.

Sacha0 commented 6 years ago

A slack conversation convinced me that the hit counts for quoted search queries are a better metric, and for such queries which term the hit count favors depends on the particular decomposition; in other words, ignore the table above, as it's probably not the best guide. The remaining question is a minor one of correctness, in that e.g. decomposition is in some cases perhaps more correct for eig than factorization, but whether that's worth bothering about 🤷‍♂️. Best!

StefanKarpinski commented 6 years ago

I would point out that while the eigenvectors and eigenvalues are not a factorization as a pair (you can’t multiply them and get the original matrix back), the factorization object does act as a true factorization in that you can use it in place of the original matrix as a “pre-factorized” stand-in. Moreover, you can get one of these objects through a function called, yes, factorize, not decompose.
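
A small sketch of the "stand-in" behavior, assuming the `LinearAlgebra` standard library of Julia 1.x. The matrix `A` and vector `b` are made-up example data. Which concrete factorization `factorize` picks depends on the structure of `A`, so the code does not rely on it.

```julia
using LinearAlgebra

A = [4.0 1.0; 1.0 3.0]   # example data: small symmetric matrix
b = [1.0, 2.0]

# factorize returns some factorization object suited to A.
F = factorize(A)

# The object can be used in place of A when solving a linear system.
x = F \ b
@assert x ≈ A \ b

# For comparison: the eigen object holds values and vectors, and
# rebuilding A from them needs a third factor (here the transpose,
# because A is symmetric), not just the pair multiplied together.
E = eigen(Symmetric(A))
@assert E.vectors * Diagonal(E.values) * E.vectors' ≈ A
```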

Sacha0 commented 6 years ago

A little further slack triage settled on the status quo, i.e. retaining factorize/Factorization (and I imagine consequently continuing to use somewhat mixed decomposition/factorization terminology). Best!

o314 commented 6 years ago

First, IMHO: I have spent thousands of hours working on math, engineering, and ontology (or terminology, call it what you want).

Factorization is grounded in arithmetic. When the matrix field is numerical, factorization frequently pops up here and there.

Composition is more compatible with the evolution toward symbolic programming.

It's a natural movement found when trying to solve equations (math work) or assembling things (engineering work). As with dynamic programming, we break a problem into multiple pieces, and with the property of the zero element / absorbing element of a group we solve the whole by constraining either one part or the other, thanks to the law of excluded middle.

Secondly, some facts:

Wolfram uses "matrix decomposition" everywhere.

And so does Google Scholar when you go to symbolic computing:

search	hits
"category decomposition"	278
"category factorization"	27
"graph decomposition"	7880
"graph factorization"	700
"ideal decomposition"	1170
"ideal factorization"	795
"lattice decomposition"	731
"lattice factorization"	261
"monad composition"	108
"monad decomposition"	7
"monad factorization"	1

bonus guess how to check the pantelides thing

IMHO Julia in the large is better served by decomposition than factorization. In linear algebra, however, factorization may remain more common.

andreasnoack commented 6 years ago

Can this be closed now that the factorization functions no longer have fact in their names or would people like to discuss this topic further?

fredrikekre commented 6 years ago

I guess what's left is to decide if we should rename Factorization to Decomposition.

andreasnoack commented 6 years ago

I see. Even if decomposition were slightly better than factorization (which I don't think it is), it's not worth the name change.

StefanKarpinski commented 6 years ago

After extensive discussion we concluded that the two words are used roughly as frequently when talking about matrices but that "factorization" is much more matrix-specific and thus conveys more information. It's also the one we're already using and it's no longer very user-facing, so we do nada.

JuliaLang / julia

decomposition and factorization terminology #26995