JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

0.5 Adding matrices of special type incorrectly promoted to Matrix{Any} #17254

Closed dlfivefifty closed 8 years ago

dlfivefifty commented 8 years ago
abstract AbstractFoo
immutable Foo <: AbstractFoo end
import Base.+
+(::Foo,::Foo) = Foo()

A=fill(Foo(),10,10)
A+A

returns a Matrix{Any}, not Matrix{Foo} as expected.

Julia Version 0.5.0-dev+5122
Commit 2e1bcc6 (2016-07-03 00:01 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin15.5.0)
  CPU: Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz
  WORD_SIZE: 64
  BLAS: libgfortblas
  LAPACK: liblapack
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, ivybridge)
pkofod commented 8 years ago

I had a build from the 1st of July that didn't have this error, so it must have come in within the last two days.

martinholters commented 8 years ago

As part of JuliaLang/julia#16995, the output type is no longer guessed to be equal to the input type, so at the moment you need to provide your own promote_op method to make this work.
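
For the example above, a sketch of that workaround could look like this (assuming the current promote_op signature, which takes the operation and the element types):

Base.promote_op(::typeof(+), ::Type{Foo}, ::Type{Foo}) = Foo   # sketch only

A = fill(Foo(), 10, 10)
A + A   # should now give a Matrix{Foo} again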

pkofod commented 8 years ago

Maybe this should have been mentioned in NEWS? @nalimilan

nalimilan commented 8 years ago

As part of JuliaLang/julia#16995, the output type is no longer guessed to be equal to the input type, so at the moment you need to provide your own promote_op method to make this work.

I don't think that's the plan. The idea is that we should use the same algorithm as in broadcast to automatically compute the return type. broadcast(+, A, A) correctly returns an Array{Foo} already.

@JeffBezanson Is that correct?

nalimilan commented 8 years ago

In the meantime, note that if you declare Foo <: Number and provide a one(::Type{Foo}) method, rules are already present to choose the correct return type for all operators.
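
A minimal sketch of what I mean (using a fresh FooNum type here purely for illustration):

import Base: +, one

immutable FooNum <: Number end
+(::FooNum, ::FooNum) = FooNum()
one(::Type{FooNum}) = FooNum()

A = fill(FooNum(), 10, 10)
A + A   # the Number fallback for promote_op should now give a Matrix{FooNum}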

wbhart commented 8 years ago

Can you provide a link to the piece of code where these rules are defined for numbers? We can't make our types belong to Number, since the top of our abstract type hierarchy is Set, and obviously sets are not always sets of numbers.

By the way, one(::Type{Foo}) is really not the right thing to ask for. You should ask for one(::Foo), since the former will not work for types like the integers modulo n: it is not possible to create 1 mod n from the type alone, because you need to know what n is. That is only possible if you have another value modulo n to play with, since you can look inside it to see what n is supposed to be. A lot of the Julia matrix routines fail for us because of this.

martinholters commented 8 years ago

Can you provide a link to the piece of code where these rules are defined for numbers?

https://github.com/JuliaLang/julia/blob/master/base/number.jl#L72

By the way, one(::Type{Foo}) is really not the right thing to ask for. You should ask for one(::Foo)

But promote_op is only given the type, so this would require a major overhaul.

nalimilan commented 8 years ago

It's here: https://github.com/JuliaLang/julia/blob/6ab1b197f42807362c77893ea112fbaaad510785/base/number.jl#L73

That rule was already present in 0.4; what's new is that we no longer assume that the return type of an operation is the same as its input type (which isn't correct in general).

I don't understand what you mean by "the top of our abstract type hierarchy is Set, and obviously sets are not always sets of numbers." Is Foo an AbstractSet? Or do you want to store Foo inside Set? Anyway, you can simply override Base.promote_op for now if you need to support the 0.5 development version immediately.

Regarding integers modulo n, why don't you make n a type parameter? Or can't you return a dummy value for n (like 0 or 1) which would be dropped in favor of the modulo used by the other argument in operations?

wbhart commented 8 years ago

Thanks for the link.

Just to clarify, we hit this issue in our Nemo package, unrelated to the package maintained by the OP. It currently prevents our documentation from building, since documentation is built on the nightly build. Which is a shame, since I am currently rewriting our documentation.

What I mean is that we have an abstract type called Set, which sits at the top of our type hierarchy, i.e. FlintIntegerRing <: Nemo.Ring <: Nemo.Group <: Nemo.Set. (Our type FlintIntegerRing is the equivalent of Foo in the OP's example). So we can't just make our types belong to Number, since that doesn't make any sense.

Regarding integers modulo n, you can't make n a type parameter. Among the reasons:

1) n is a bignum, not an Int.
2) Multimodular algorithms can use thousands or millions of different primes n in a calculation. If n were a type parameter, the Julia compiler would recompile every function used for every different value of n, which would take months for a computation that is supposed to take seconds.
3) Many Julia developers have told us not to do this.

You can't put a dummy value for n, obviously. What happens if you get two of them? What should the operation do then? And when will the correct information get filled in? How would we ensure the integrity of our data? Or what happens if the algorithm needs to compute -a mod n. How will it compute the correct residue -a if it doesn't know n? E.g. if I give you s = 3 mod something and ask for -s, what is the value that should be stored for the residue?

It's clear upon reflection that there are only two options for implementing integers mod n. 1) put n in the type, 2) put n in the actual values. Since the former doesn't work, for the reasons outlined above, there is only one workable way of implementing integers mod n. And that has implications if you are implementing say a matrix determinant algorithm (which should work just fine over Z/nZ, so long as it doesn't do any divisions, e.g. using clow sequences or the charpoly method, etc).

There really is only one correct way of doing this sort of thing. Someone already noticed this, and Julia defines things like zero(1) so that this is possible. But the matrix code in Julia doesn't use it systematically, it seems. We are pretty much unable to use any of the sophisticated matrix algorithm implementations in Julia for this reason (determinant, multiplication, you name it). Anyway, it's off topic here. I am just mentioning it to raise awareness every time I see it.

nalimilan commented 8 years ago

What I mean is that we have an abstract type called Set, which sits at the top of our type hierarchy, i.e. FlintIntegerRing <: Nemo.Ring <: Nemo.Group <: Nemo.Set. (Our type FlintIntegerRing is the equivalent of Foo in the OP's example). So we can't just make our types belong to Number, since that doesn't make any sense.

OK. Since multiple inheritance isn't supported yet, the best solution is to declare promote_op methods for each supported operation. But AFAICT this should only be temporary, so you may want to wait for the final 0.5 release to avoid wasting your time.

I can't really comment on the best implementation of integers modulo n, but it could be problematic for a Number type not to support zero(::Type) and one(::Type). I guess you should try to get rid of their uses in Base and see whether that's workable (promote_op isn't a totally legitimate use of these IMHO, so that's not the main problem).

wbhart commented 8 years ago

[OT] I don't see any problem with Number requiring zero(::Type) and one(::Type). It just means integers mod n are not Numbers. The problem of course is that you want to put things that are not numbers in matrices. Numbers are only one very tiny part of algebra. You can define matrices over any ring (at least). The standard example I use to illustrate why the current Julia model breaks is the integers mod n (which is a ring, even if it is not a ring of numbers). It's not a perfect example, since you could use n = 0 as you suggest and then write a pile of logic to handle all the cases where n = 0. But there are other more complex examples in algebra where there is no canonical value you can use to indicate missing information, for example quotient rings, i.e. a ring R modulo an ideal n. Here every ideal is a valid value for n, so there is no equivalent of n = 0 that you could use to indicate missing information. The problem becomes intractable if you are trying to write a generic system for abstract algebra (as we are).

andreasnoack commented 8 years ago

@wbhart Wouldn't the best solution be to define promote_op for the Nemo types?

wbhart commented 8 years ago

@andreasnoack We'll wait to see how the issue gets resolved. It looks to me like people are suggesting this won't be required for the final 0.5 release.

wbhart commented 8 years ago

@martinholters Yes, this is another example of the same problem I mention in my "off topic" bit. We already define our own methods for promoting the "type" of abstract algebraic values when doing arithmetic operations in Julia, since promote_rule and friends are only given the type, not the value, so that doesn't work for us. The type based system clearly can't work in general. It's fine for numerical mathematics, but no use for any other kind of maths.

It shouldn't cause any performance issues to pass the values instead of the types to functions like this. For all the numerical types present in Julia it would then just default to calling the current type based functions, and the compiler should just optimise this out to exactly the same assembly instructions as at present. But at least people not doing numerical work could overload the value based methods directly.
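
To sketch the idea (promote_op_values is a made-up name, not something that exists in Base):

# Default: defer to the existing type-based machinery; for the simple numeric
# types the compiler should reduce this to exactly what happens today.
promote_op_values(op, x, y) = Base.promote_op(op, typeof(x), typeof(y))

# Packages whose element "types" carry runtime data (a modulus, a ring, ...)
# could then overload the value-based form directly for their own types.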

wbhart commented 8 years ago

@martinholters Sorry my previous post was confused. Let me be more specific.

If I'm interpreting the documentation correctly, promote_op is a replacement for promote_rule on a per operation basis. I assume this is correct?

In fact, the problem for us is probably not with promote_rule and friends. The problem is actually just that we can't rely on promote_rules/promote_ops and convert functions to do our promotions for us when doing arithmetic operations.

The reason is that convert does not have enough information to actually do a conversion. For example, in adding an integer mod n to an integer, convert would only have the type of the first argument and the value of the second, whereas it really needs the value of both in order to do the conversion (since the modulus n is in the value, not the type of the first argument).
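
To make that concrete with a toy residue type (not our actual implementation):

import Base: +

immutable ResidueInt
    r::Int   # residue
    n::Int   # modulus, stored in the value
end

# convert(ResidueInt, 3) cannot work: the target type alone does not know n.
# A mixed-argument method that can see a value does know it:
+(a::ResidueInt, b::Int) = ResidueInt(mod(a.r + b, a.n), a.n)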

The only way around it is for us to define promote(xs...) directly for every possible combination of types that we define, which the documentation recommends we should not do [1].

So the problem actually isn't with promote_rule and promote_op as such, as far as I can see, but with convert. And in any case, since in our case the type of the result does not depend on which operation was performed, we would be using promote_rule, not promote_op, I think (though there are doubtless cases where we will eventually need the latter).

What I am still confused about is this. We don't use promote_op, and we have promote_rules for all our promotions. Why, when adding two matrices whose entries are all of the same type, is it not using the promote_rule we defined to determine the result type? Why would it even check promote_op, since we never defined one? Something definitely seems broken, if I am understanding correctly how this is supposed to work.

[1] http://docs.julialang.org/en/latest/manual/conversion-and-promotion/

wbhart commented 8 years ago

Wowsers: the implementation of promote_op has the following:

function promote_op{R<:Number,S<:Number}(op, ::Type{R}, ::Type{S})
    T = typeof(op(one(R), one(S)))
   ...

It actually adds two values of the given type to work out the type of the result!? How is that efficient if say the two types are for thousand by thousand matrices?

How does it even work at all for Julia matrix types, where the numbers of rows and columns are not even given in the types? How can it even create the identity matrix so that it can do this? No wonder this broke our code.

Isn't there at least a way to look up the return type of a given method in Julia and use that? Constructing actual values and actually applying the op (assuming I'm reading the code correctly) seems like an inefficient way to do this!
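
Is something like the following what one is supposed to use? (Purely a guess on my part; I don't know how reliable it is meant to be.)

Base.return_types(+, (Foo, Foo))   # asks inference; gives [Foo] for the example at the top of this thread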

KristofferC commented 8 years ago

Have you looked at the generated code? Is it not being constant folded?
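
E.g. something like this (just a quick check, using Int matrices as a stand-in):

g(A, B) = A + B
@code_warntype g(fill(1, 2, 2), fill(1, 2, 2))   # check whether the result is inferred as Array{Int64,2}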

wbhart commented 8 years ago

@KristofferC I'm sure it's possible to constant fold it in case the types R and S are double precision floats or ints. But what if one(R) and one(S) have to call a C library to construct the value? There could be side effects from such a call, so clearly the compiler can't constant fold. This is our situation, by the way. (Well, sort of. Actually, our types don't belong to Number, but BigInt does, so that would be an example where this happens in Julia itself.)

nalimilan commented 8 years ago

@wbhart These are only fallbacks, so there's absolutely no problem if they aren't efficient for your type. There would only be a problem if there was no way to escape the default behavior, but that's really not the case. That's how multiple dispatch is used in Julia: for example, we have a lot of generic methods for AbstractArray which are fast for standard arrays and very slow for sparse matrices, but most of the time we also provide more efficient specialized versions.

What's annoying currently is that you need to define both promote_rule and promote_op methods, while inference could replace the latter in most (all?) cases. I'll leave it to @JeffBezanson to explain the exact plan in that regard.

dlfivefifty commented 8 years ago

@wbhart why not just override +(::Matrix{Foo},...) directly, then you can use one(::Foo) instead of one(::Type{Foo})?
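
Something like this is what I mean (a sketch, with no real error checking; it assumes + is already imported from Base as in the original example):

function +(A::Matrix{Foo}, B::Matrix{Foo})
    size(A) == size(B) || throw(DimensionMismatch())
    C = similar(A)          # element type comes from the arguments, no promote_op involved
    for i in eachindex(A)
        C[i] = A[i] + B[i]
    end
    return C
end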

wbhart commented 8 years ago

@nalimilan Thanks for the explanation. I've had a closer look at our code and it's exactly the case reported by the OP. We were formerly adding two matrices of type Array{Nemo.fmpq_poly,2} and it used to result in a matrix of type Array{Nemo.fmpq_poly,2}. Now it results in an Array{Any, 2}, which of course is incorrect.

I agree that requiring the user to define both promote_rule and promote_op for all their types where this can happen seems problematic. We don't need the functionality of promote_op over promote_rule for any of our types currently, so it would seem to be very odd to require us to define it (which would be a lot of work).

nalimilan commented 8 years ago

@dlfivefifty That's going to require more code than overriding promote_op, and it wouldn't work for all cases (e.g. broadcast, .+, containers other than Matrix...).

wbhart commented 8 years ago

@dlfivefifty That would work for us. Fortunately I had the foresight to not rely on any of the Julia matrix code for our Nemo matrix type, except for the + and - functions. I reimplemented everything myself, a total of around 3000 lines of code. Writing a single additional + and - function for Nemo matrices will not be a big job.

But this kind of totally defeats the purpose of having a generic matrix implementation in Julia don't you think? I'd prefer to hear it was being pushed out of Base if it was going to be like this, since it is too specialised to be of any general use.

nalimilan commented 8 years ago

But this kind of totally defeats the purpose of having a generic matrix implementation in Julia don't you think? I'd prefer to hear it was being pushed out of Base if it was going to be like this, since it is too specialised to be of any general use.

Writing a few promote_op methods is already much less code than reimplementing all matrix operations. It's kind of redundant with actually writing the scalar operations, but it's still much more generic than what most other fast languages offer.

wbhart commented 8 years ago

@nalimilan We fixed this in Nemo by rewriting the + and - functions so they don't use Julia matrix operations. Certainly just writing a few promote_op methods would not have made the Julia matrix code work for us. We actually needed to reimplement the algorithms we have because the Julia code makes too many assumptions that don't hold in general in algebra.

The interesting thing here is that the changes that have been made to Julia to make it more generic seem to have been made to support physical units. Physical units are essentially an example of a general algebraic structure. Other examples that Julia will eventually hit are ordinals, dual numbers, quaternions and so on. And as soon as the first amateur number theorist starts trying to do arithmetic modulo n in Julia and starts using matrices, the assumptions currently being made will become fairly obvious.

Anyway, we are ok for now (note that our issues were the same as that of the OP, but our project is different and the same fix may not work there). I just wanted to bring attention to the more general issue again, since it comes up when examining this particular issue.

Sacha0 commented 8 years ago
function promote_op{R<:Number,S<:Number}(op, ::Type{R}, ::Type{S})
     T = typeof(op(one(R), one(S)))
    ...

It actually adds two values of the given type to work out the type of the result!? How is that efficient if say the two types are for thousand by thousand matrices?

How does it even work at all for Julia matrix types, where the numbers of rows and columns are not even given in the types? How can it even create the identity matrix so that it can do this? No wonder this broke our code.

This promote_op fallback does not apply to the cases you identify (matrices) due to the qualification {R<:Number,S<:Number}, no?

Julia defines things like zero(1) so that this is possible. But the matrix code in Julia doesn't use it systematically, it seems. We are pretty much unable to use any of the sophisticated matrix algorithm implementations in Julia for this reason (determinant, multiplication, you name it).

Could you expand? Do you refer to the generic linear algebra code using, e.g., zero(x) and the literal 0 interchangeably much of the time? If using zero(x) and one(x) throughout the generic linear algebra code would make that code more generally useful (e.g. allow you to rely on that code in Nemo), improving the code in that regard should be possible. Best!

dlfivefifty commented 8 years ago

I also ran into this in ApproxFun, where the domain that a function is defined on cannot be inferred from the type. My solution was to use NaNs as an indicator of "domain not set".

andreasnoack commented 8 years ago

@wbhart You will, without doubt, hit cases where these things don't work in base, but I've tried to make them work. Sometimes the floating point requirements don't mix well with the pure algebraic requirements, but sometimes we are able to make it work.

...Julia code makes too many assumptions that don't hold in general in algebra...Other examples that Julia will eventually hit are

...dual numbers

julia> using DualNumbers

julia> A = [dual(rand(1//1:10, 2)...) for i = 1:4, j = 1:4];

julia> b = [dual(rand(1//1:10, 2)...) for i = 1:4];

julia> @which A\b
\(A::AbstractArray{T<:Any,2}, B::Union{AbstractArray{T<:Any,1},AbstractArray{T<:Any,2}}) at linalg/generic.jl:333

julia> A*(A\b) - b
4-element Array{DualNumbers.Dual{Rational{Int64}},1}:
 0//1 + 0//1ɛ
 0//1 + 0//1ɛ
 0//1 + 0//1ɛ
 0//1 + 0//1ɛ

... quaternions

julia> using Quaternions

julia> A = [Quaternions.Quaternion(rand(4)...) for i = 1:3, j = 1:3];

julia> b = [Quaternions.Quaternion(rand(4)...) for i = 1:3];

julia> @which qrfact(A)
qrfact{T}(A::AbstractArray{T,2}) at linalg/qr.jl:102

julia> b - A*(qrfact(A)\b)
3-element Array{Quaternions.Quaternion{Float64},1}:
  -2.22045e-16 + 2.220446049250313e-16im + 4.440892098500626e-16jm - 1.1102230246251565e-16km
 -4.44089e-16 - 3.3306690738754696e-16im + 3.885780586188048e-16jm + 3.3306690738754696e-16km
                                               0.0 + 0.0im + 0.0jm - 3.3306690738754696e-16km

And as soon as the first amateur number theorist [me?] starts trying to do arithmetic modulo n


julia> immutable MyMod5
       x::Int
       MyMod5(x) = new(mod(x, 5))
       end

julia> import Base: +,-,*,/

julia> (+)(x::MyMod5, y::MyMod5) = MyMod5(x.x + y.x);

julia> (-)(x::MyMod5, y::MyMod5) = MyMod5(x.x - y.x);

julia> (*)(x::MyMod5, y::MyMod5) = MyMod5(x.x * y.x);

julia> Base.inv(x::MyMod5) = x.x == 1 ? MyMod5(1) : (x.x == 2 ? MyMod5(3) : (x.x == 3 ? MyMod5(2) : (x.x == 4 ? MyMod5(4) : throw(DivideError()))));

julia> (/)(x::MyMod5, y::MyMod5) = x*inv(y);

julia> Base.zero(::MyMod5) = MyMod5(0);

julia> Base.zero(::Type{MyMod5}) = MyMod5(0);

julia> Base.one(::Type{MyMod5}) = MyMod5(1);

julia> srand(127);

julia> A = [MyMod5(rand(0:5)) for i = 1:4, j = 1:4];

julia> b = [MyMod5(rand(0:5)) for i = 1:4];

julia> @which A\b
\(A::AbstractArray{T<:Any,2}, B::Union{AbstractArray{T<:Any,1},AbstractArray{T<:Any,2}}) at linalg/generic.jl:333

julia> A*(A\b) == b
true
thofma commented 8 years ago

@andreasnoack And then a new type for Z/nZ for each n? You have not addressed the interesting case where there is no meaningful definition of Base.zero(::Type{MyMod}). Unless you use the modulus as a type parameter. But this is a bad idea. It is the same problem that @dlfivefifty hit: You cannot infer enough information from just the type.
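
For concreteness, the type-parameter variant would look something like this (a sketch in 0.5 syntax):

immutable Mod{N}
    x::Int
    Mod(x) = new(mod(x, N))
end

Base.zero{N}(::Type{Mod{N}}) = Mod{N}(0)   # possible here, but every modulus is a distinct type,
Base.one{N}(::Type{Mod{N}}) = Mod{N}(1)    # so each new N triggers fresh compilation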

wbhart commented 8 years ago

@Sacha0 You are correct with regard to the qualifications R <: Number and S <: Number, which I missed when I first mentioned this. However, as I posted later, BigInt <: Number (and probably BigFloat <: Number), and this is certainly not what you want to do in those cases.

Regarding zero(x) vs 0, it is actually zero(x) and one(x) that cause us a problem, unless x is a value and not a type. Again, please consider the case of integers modulo n. Creating zero from the type alone is not possible; however, it is possible to create zero modulo n if you have some other value modulo n.

Consider the following code (since many others seem not to get this point):

type intmod
    r::Int
    n::Int
end

Here r is the residue and n is the modulus, so 5 mod 7 would be represented as intmod(5, 7).

Now consider zero(intmod). How would you write such a function? It's clearly not possible, since it doesn't know what n should be, only what to set r to.

As I explained above, there are many very good reasons (brought to our attention by the Julia community and through our own numerous attempts), why you cannot put the n as part of the intmod type. It has to be part of the value.

Thus defining zero(::Type{intmod}) is not possible, but defining zero(::intmod) is. The latter can of course default to the former for all the simple types in Julia.
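
With a value in hand, the definitions are straightforward:

Base.zero(x::intmod) = intmod(0, x.n)   # the modulus comes from the value we were given
Base.one(x::intmod) = intmod(1, x.n)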

The one issue this introduces is that you need to deal specially with 0×n or n×0 matrices on a per-function basis. But this is much less work than trying to hack around the current code, which would be very difficult and unreliable.

To be a bit more specific, the generic Julia matrix multiplication code and determinant code does not work for us for this reason.

wbhart commented 8 years ago

@andreasnoack Thanks for the examples, but I can see from your response that I really wasn't clear. Sorry about that.

What I'm saying is that all of these things are generically speaking, examples of abstract algebraic structures. You will get more and more of them over time. One can either introduce more and more hacks to deal with these as their complexity rises, or one can view them all as examples of the same thing, i.e. abstract algebra, and start considering how to make Julia handle them generically.

In a sense, the promote_op thing is in fact all about making Julia more generic in this regard, so it is not as if people aren't thinking about more general situations. It's just that they are being informed only by very specialised cases, not by the general picture. This just won't scale over the long run.

As thofma points out, a generic implementation of modular arithmetic does not work with the current Julia matrices (or numerous other parts of the system). You are confusing an implementation of modular arithmetic for a single given modulus with a generic implementation that works for any modulus n, specifically in combination with all of the nice trickery Julia does with promotions, etc., and particularly in the case of matrices. I'm not just waving my hands here. Try it yourself and you will see, it really doesn't work.

I apologise for the long-winded responses. We frequently find ourselves having to explain our particular use case in quite some detail when issues arise that affect us. As a general principle we don't believe in explaining our use case in so much detail that it effectively drives everyone else nuts trying to accommodate it. We are merely trying to raise awareness of the general issues without getting too bogged down in the specifics of our particular implementation (which should be irrelevant).

We think the integers mod n example is a good general example that everyone will easily understand. It is literally the first thing a number theorist would want to implement (other than the integers), and yet it already presents many of the difficulties that one will encounter with much more abstract algebraic mathematical objects. Having said that, it is not a perfect example. Abstract algebra is generically speaking all about various algebraic structures over other algebraic structures.

Integers mod n is an example of a residue ring over the ring of integers. Dual numbers are easy to implement over the complex numbers (or some other scalar numeric types, no doubt), and quaternions are similarly trivial to implement over the reals. But if you think of all these things generically as algebraic constructions over another ring or field, which itself could be generic, then you get more of the required picture.

Rational{T} is another example. Rational numbers are an example of a fraction field over the ring of integers, or more generally a localisation of a ring. (Admittedly the lack of multiple inheritance and traits is what holds back a decent implementation here, and we just have to live with that for now.)

Sacha0 commented 8 years ago

@wbhart, re. https://github.com/JuliaLang/julia/issues/17254#issuecomment-230560638, the shortcomings of zero and one being defined strictly on types were clear before. My interest was in the latter half of the statement

Julia defines things like zero(1) so that this is possible. But the matrix code in Julia doesn't use it systematically, it seems.

with the hope of identifying concrete steps that can be taken to improve the generic linear algebra infrastructure. The preceding statement led me to believe that, even when you define zero and one methods on values of your types rather than on the types themselves, (1) you still cannot use the generic linear algebra infrastructure due to inconsistent use of those zero and one methods in that infrastructure, and (2) perhaps at least that issue could be straightforwardly ameliorated. Could you clarify that statement specifically, to make it actionable?

Certainly just writing a few promote_op methods would not have made the Julia matrix code work for us. We actually needed to reimplement the algorithms we have because the Julia code makes too many assumptions that don't hold in general in algebra.

Could you expand concretely on points of failure? Doing so might lead to actionable steps forward. Best!

andreasnoack commented 8 years ago

We are all well aware of this issue. It's not new, and everybody here would like it to work the way you describe, but it's not super easy to fix. There are two main considerations:

  1. Floating point numbers
  2. Empty arrays

Floating point numbers have special requirements which we need to handle, so many algorithms use scaling and pivoting, which assume much more structure than necessary for the types you are interested in. However, we cannot sacrifice anything for floating point arithmetic, as it will continue to be the most important use case.

Many of the promotion cases could try to use the instance instead of the type but then you'll need to come up with a solution for

  1. The empty case
  2. The case where the array has elements [1,1.0,BigFloat(1.0)].

Regarding 2., which of the elements do you use to determine the correct container type?

wbhart commented 8 years ago

@Sacha0 Ah, I see the confusion now. Thanks for clarifying.

I did not mean that Julia was inconsistent in its use of zero(x) vs = 0. I meant that although Julia provides zero(1), zero(1.2), zero(BigInt(12)), zero(BigFloat(1.2)), etc., the linear algebra and generic matrix code still fails for us with messages saying that zero(::Type{sometype}) is not defined.

When I try defining zero(::somenemotype) it still complains of missing zero(::Type{somenemotype}), indicating that the matrix code was calling the latter, not the former.

I brought it up because of the way promote_op was implemented (as linked earlier in this thread), where it does precisely this. It tries to invoke one(R) where R is a type, not a value. This would fail if we were to make any of our types belong to Number (admittedly we aren't doing that, due to the lack of multiple inheritance, but if that is ever implemented, I guess it will not work then).

I just checked this now with a matrix over an integers modulo n type from Nemo. The generic Julia det function complains about missing zero(::Type{Nemo.GenRes{Nemo.fmpz}}) even though we have supplied zero(a::Nemo.GenRes{Nemo.fmpz}). So it is definitely trying to create zero of the type, not the value. Until now, some of the matrix functions worked for us and others didn't, because of this issue.

That's all I was referring to. Sorry this wasn't clear.

Sacha0 commented 8 years ago

@wbhart, thanks for clarifying! To confirm my understanding via the det example, for your types the generic det implementation in base/linalg/generic.jl

function det{T}(A::AbstractMatrix{T})
    if istriu(A) || istril(A)
        S = typeof((one(T)*zero(T) + zero(T))/one(T))
        return convert(S, det(UpperTriangular(A)))
    end
    return det(lufact(A))
end

chokes on the type computation typeof((one(T)*zero(T) + zero(T))/one(T)) given that T is a type rather than an instance. Your application would instead require type computations of the form

a = A[1]
S = typeof((one(a)*zero(a) + zero(a))/one(a))

Do I understand correctly? Thanks and best!

wbhart commented 8 years ago

@andreasnoack I agree that empty matrices may have to be dealt with separately if you define zero on the value (instance) instead of the type. This isn't a major theoretical headache though.

In most cases, you have some for loops which iterate over the columns or rows of a matrix, and if the matrix is empty, no calls to zero are ever made. So for many algorithms, it isn't even necessary to deal explicitly with the empty matrix, as the algorithm does it automatically. I concede that this is not always the case and some work is required to support this in some cases, i.e. extra branches at the beginning of a function to handle empty matrices.
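
For example, a value-based trace would need just one extra guard (mytrace is a made-up helper, only to show the shape of the branch; it assumes the element type defines + and a value-based zero):

function mytrace(A::AbstractMatrix)
    isempty(A) && throw(ArgumentError("empty matrix: no value available to supply zero"))
    t = zero(A[1, 1])                # value-based zero, as discussed above
    for i = 1:minimum(size(A))
        t += A[i, i]
    end
    return t
end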

I also understand that there are very many algorithmic complications for matrices over floating point numbers. We too have many different algorithms for matrices over specific rings. That's about an additional 30,000 lines of (ansi C) code for us.

The general Julia philosophy seems to be to do something generic which always works, then write specialised code for specific cases, like matrices of floating point numbers. Those won't affect us, since we won't call them (unless we are working with Arb types or something, possibly).

But obviously generic algorithms shouldn't make too many assumptions about the types they are implemented for.

I do accept the devil is in the detail and that Julia developers absolutely can't compromise the performance of numerical applications!

Regarding the [1,1.0,BigFloat(1.0)] example, I'm not aware of the history here. Could you elaborate or point me to a discussion about this issue?

To an algebraist like myself, I'm quite happy for this to be an array of BigFloat. Is that controversial?

I notice that promote_rule(Int, Float64) returns Union{}, but promote_rule(BigFloat, Float64) and promote_rule(BigFloat, Int) return BigFloat. Is that where the difficulty lies?

I was simply assuming that promote_op{R, S}(::Type{R}, ::Type{S}) = promote_rule(R, S) would work as a suitable default. But I see the problem if you really want to leave promote_rule(Int, Float64) undefined.

Ohhhh! You are worried about 1 + 1 = 2 but 1/1 = 1.0.
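
In other words:

typeof(1 + 1)   # Int64
typeof(1 / 1)   # Float64 -- same argument types, different result type depending on the operation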

Yes, this is quite entertaining for an algebraist like me {pulls a beer and chips from the fridge and turns up the volume}. :-)

wbhart commented 8 years ago

@Sacha0 Yes, that looks correct. I don't think this is particular to our implementation though. There is essentially no other conceivable way (that we know of) to implement rings like Z/nZ that would work with the current Julia implementation of matrices.

We try not to ask Julia developers to support a particular vision or implementation. Our project (at least for 4.5 years) is a research project, to push Julia as hard as possible to see how much of computer algebra we can implement with Julia, as provided by the Julia community.

On this particular issue, however, there is no conceivable way to make it work. We have in fact rewritten our entire codebase multiple times since we began. We aren't afraid to do that. For that reason, we aren't looking for special hacks that only apply to us.

andreasnoack commented 8 years ago

The issue with empty arrays is not so much the actual algorithms. The problem is to determine the return type in that case and make sure that it's independent of the value of the instance. So if you have

x = YourModInt[]

how do you determine the type of f(x) for some f when you cannot compute an instance?

The issue with the different elements can be seen in the example below. If we have

x = Real[1,2.2,BigFloat(3.5)]

and you define a function that initializes an array by using the element type

f(x) = Array(typeof(x[1] + 1),length(x))

then you get that

julia> Base.return_types(f, (typeof(x),))
1-element Array{Any,1}:
 Any

because the new array can be Vector{T} for any T<:Real. So to follow up @Sacha0's example let's use this array to store the result. We'll define

function g(x)
    y = f(x)
    for i in eachindex(x)
        y[i] = x[i] + 1
    end
    y
end

which fails

julia> g(x)
ERROR: InexactError()
 in setindex!(::Array{Int64,1}, ::Float64, ::Int64) at ./array.jl:347
 in g(::#f1, ::Array{Real,1}) at ./REPL[117]:4
 in eval(::Module, ::Any) at ./boot.jl:234
 in macro expansion at ./REPL.jl:92 [inlined]
 in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46

To be fair, I should mention that the last part is also an issue for the way we handle this now. See https://github.com/JuliaLang/LinearAlgebra.jl/issues/287

wbhart commented 8 years ago

@Sacha0 I should mention that this example will trip up anything other than floating point code anyway, since the slash operator / is floating point division. We can't use it in our code since Int divided by Int needs to be Int for us. We use divexact (the name we have given division) instead of / throughout our code and leave / for floating point division.
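
For anyone unfamiliar with the name, a toy version of divexact over Int would be something like the following (just an illustration, not Nemo's actual implementation):

function divexact(a::Int, b::Int)
    q, r = divrem(a, b)
    r == 0 || error("not an exact division")
    return q
end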

martinholters commented 8 years ago

I'd like to add a few remarks here which hopefully clarify some bits for some discussion participants:

wbhart commented 8 years ago

@martinholters Someone should update the documentation here [1] if that is true. It doesn't mention arrays at all and indicates that promote_op is for a completely different purpose. I was surprised and very confused to find it coming up in the context of arrays.

Secondly, why doesn't it make sense to have a fallback to promote_rule? It's only a fallback, right?

Thirdly, even the latest master still has promote_rule(Int, Float64) = Union{}. So how is promote_rule used to determine the result of 1 + 1.0?

[1] http://docs.julialang.org/en/latest/devdocs/promote-op/

martinholters commented 8 years ago
  1. Yes, the distinction is very unclear. From the example in the docs you cite, it is clear that promote_op can't be applied to the input arguments, but it says nothing about why it is needed at all.
  2. The fallback got in the way very often, and it was deemed a good idea to be able to detect when no appropriate promote_op is defined, so that other ways of determining the output type can then be applied. However, this is at present only implemented for broadcast, leading to this whole discussion. (Note that broadcast(+, A, A) would work for the example that started this thread.)
  3. Base.promote_rule(Float64,Int) gives Float64. To cite the docs: "Also note that one does not need to define both promote_rule(::Type{A}, ::Type{B}) and promote_rule(::Type{B}, ::Type{A}) — the symmetry is implied by the way promote_rule is used in the promotion process."
wbhart commented 8 years ago

@martinholters Thanks particularly for your answer 3. That is now clear and makes sense.

I'm still completely lost on points 1 and 2, but will just wait until the experts resolve the issues here (and hopefully fix the documentation, to better explain what they intend).

nalimilan commented 8 years ago

@wbhart Use promote_type(Float64, Int). promote_rule should only be used for adding method definitions.
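
That is:

julia> promote_type(Float64, Int)   # the user-facing query; tries both argument orders
Float64

julia> promote_rule(Float64, Int)   # the rule that is actually defined
Float64

julia> promote_rule(Int, Float64)   # the other direction need not be defined
Union{}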