LinearAlgebra.norm(x, n) should be recursive?

WalterMadelim commented 2 weeks ago

The function LinearAlgebra.norm should be calculated in a global way. e.g. the 1-norm:

julia> function norm1(x) return LinearAlgebra.norm(x, 1) end
norm1 (generic function with 1 method)

julia> t = ([1 1; 0 0.], [0 0; 0 0.])
([1.0 1.0; 0.0 0.0], [0.0 0.0; 0.0 0.0])

julia> t[1]
2×2 Matrix{Float64}:
 1.0  1.0
 0.0  0.0

julia> t[2]
2×2 Matrix{Float64}:
 0.0  0.0
 0.0  0.0

julia> norm1(t) # this is strange because norm-2 feature is involved in a norm-1 function
1.4142135623730951

I think in practice people should expect the last result being 2.0, because we are calculating 1-norm.

giordano commented 2 weeks ago

What do you expect the norm of a tuple of matrices to be? I'm going to guess you'd like https://github.com/JuliaLang/julia/blob/1edc6f1b7752ed67059020ba7ce174dffa225954/stdlib/LinearAlgebra/src/generic.jl#L532 to be

generic_norm1(x) = mapreduce(float ∘ norm1, +, x)

?

WalterMadelim commented 2 weeks ago

Yes. As long as the it is a data structure containing a bunch of Float64's, it is proper to devise such a function. We first collect all the Float64s' in a vector, and then calculate the norm of that vector, like this manner. The following is an easy demo.

julia> import LinearAlgebra

julia> norm1(x) = LinearAlgebra.norm(x, 1)
norm1 (generic function with 1 method)

julia> ip(x, y) = LinearAlgebra.dot(x, y)
ip (generic function with 1 method)

julia> t1 = ([1 -7; 0 0.],  [0 3 5;  -4 0 9.])
([1.0 -7.0; 0.0 0.0], [0.0 3.0 5.0; -4.0 0.0 9.0])

julia> t2 = ([0 2;  0 3.],  [1 -3 2;  1 0 1.])
([0.0 2.0; 0.0 3.0], [1.0 -3.0 2.0; 1.0 0.0 1.0])

julia> inner_product = ip(t1, t2) # this function works as expected ✅
-8.0

julia> difference = t1 .- t2 # this function works as expected ✅
([1.0 -9.0; 0.0 -3.0], [-1.0 6.0 3.0; -5.0 0.0 8.0]) # 📚

julia> # The folloging dist1 meants to be the ||⋅||₁-distance between t1 and t2

julia> dist1 = norm1(difference) # this is currently unexpected  ⚠️
21.158342052791706

julia> generic_norm1(x) = mapreduce(float ∘ norm1, +, x)
generic_norm1 (generic function with 1 method)

julia> dist1_expected = generic_norm1(difference) # This is what it should be like ✅
36.0

julia> @assert 1 + 9 + 3 + 1 + 6 + 3 + 5 + 8 == dist1_expected # compare 📚 line above

stevengj commented 2 weeks ago

The problem with using the 1-norm recursively is that it not as generic.

Because we define $\Vert x \Vert_1 = \sum_i \Vert x_i \Vert$ for an arbitrary collection, via the generic norm(x[i]) of the elements, it works when the elements live in an arbitrary Banach space (i.e. any arbitrary type supporting norm).

If your elements live in a vector space supporting multiple norms, of course you could define many different types of nested norms, and which one you want will depend on your problem. It's not obvious to me that one choice of nesting is mathematically "better" (whatever that means) than others, whereas defaulting to the more-generic choice allows us to use norm(x, 1) on more types of objects.

WalterMadelim commented 2 weeks ago

About the normed linear space: Yes it is a vector space with vector addition and scalar multiplication

julia> t1
([1.0 -7.0; 0.0 0.0], [0.0 3.0 5.0; -4.0 0.0 9.0])

julia> t2
([0.0 2.0; 0.0 3.0], [1.0 -3.0 2.0; 1.0 0.0 1.0])

julia> t1 .+ t2 # vector addition ➕
([1.0 -5.0; 0.0 3.0], [1.0 0.0 7.0; -3.0 0.0 10.0])

julia> 1.5 .* t1 # vector multiplication ❌
([1.5 -10.5; 0.0 0.0], [0.0 4.5 7.5; -6.0 0.0 13.5])

And the 3 criteria of norm is fulfilled. And you can even have inner product as shown above.

About why ||⋅||₁ is preferred to ||⋅||₂: In Mathematical Programming, the former will lead to Linear Programming, whereas the latter will lead to Quadratic Programming. According to our experiences, performance of Linear Programming is always more stable than Quadratic Programming. Therefore, we always want to sidestep using ||⋅||₂ whenever ||⋅||₁ can achieve our goals.

Actually we won't involve too much theoretical math knowledge like functional analysis. The norm is typically used to derive a distance. ( ||⋅||₁ is more favorable ) .+ and .* are also used because they are basic operations in vector space, which is the essential environment for convex analysis. And I believe convex analysis and convex optimization is widely studied.

And inner product is also extremely useful in that a linear constraint can be described conveniently by it. For example, this line is from my code

JuMP.@constraint(ø, [r = 1:R2], o2 >= cnV2[r] + ip(px2V2[r], x2) + ip(pβ2V2[r], β2))

Here the ip is the LinearAlgebra.dot function and the type of x2 is

julia> typeof(x2)
Tuple{Tuple{Matrix{Float64}, Matrix{Float64}, Matrix{Float64}}, Matrix{Float64}}

It works fine in my experiments.

But Anyway, we won't calculate a norm that is intra-norm1 but inter-norm2. (This is inconsistent ⚠️)

stevengj commented 2 weeks ago

There's no question that the L1 norm is very useful in many contexts. So is the L2 norm. An the L-infinity norm. No single choice is the "best" one in all contexts. When you nest norms (e.g. for a vector of vectors), any combination is still a valid norm.

In a specific application with nested vector spaces, you can use whatever combination of these you want. You may just need to write your own 1-line function like mynorm1(x) = mapreduce(x -> norm(x, 1), +, x) to specify the desired nesting.

As I said, the basic issue here is that we would expect any type that implements a normed vector space to implement norm(x), but not necessarily norm(x,1). Hence the former is more generic when used recursively on the elements of a collection.

Changing this would be a breaking change, hence it will not happen in Julia 1.x (see PSA: Julia is not at that stage of development any more). And we would need a very strong reason to change it in 2.x (more than just "I expected this function to behave differently"). We could add an additional parameter, like norm(x,1, recursive=true), but I'm skeptical of the utility of that vs. something more explicit like mapreduce(x -> norm(x, 1), +, x) … since we can't change the default, it won't help with what people "expect", nor will a keyword give us all possible nestings as an option.

WalterMadelim commented 2 weeks ago

mapreduce is good. I've learned that. 👍

JuliaLang / LinearAlgebra.jl

LinearAlgebra.norm(x, n) should be recursive? #1105