apple / swift-numerics

Advanced mathematical types and functions for Swift
Apache License 2.0
1.68k stars 145 forks source link

Variadic GCD and LCM #276

Open LePips opened 10 months ago

LePips commented 10 months ago

Implements variadic GCD and LCM. LCM uses the GCD method with accounting for gcd == 0.

A freestanding variadic API implementation is natural when a developer has the values on hand and want to use those values without creating an array, which is what variadic parameters allow for this exact scenario. A single value is required to avoid the oddity of gcd() and lcm(), which do have some mathematical sense on an empty collection of values but don't make sense as an API.

For future freestanding numerical functions, it may be best to enforce a rule that the implementation must have at least one parameter, up to the minimum required, and allow a last variadic parameter if it is fitting. Of course that's up for discussion but I personally feel that is a good and sensical API.

For functional programming and common developer implementations, it feels natural to have gcd/lcm as an extension on Seqeuence where T: BinaryInteger. I admit this is highly a personal preference but this also matches how other min/max implementations are in the standard library as well as swift-algorithms. I think that what we decide here sets a precedent for future function implementations.

For said sequence function, either extension or not, we need to make a decision on how these functions work on empty sequences.

I can see both sides:

TODO:

stephentyrone commented 10 months ago

In practice, variadic functions for these are going to be less useful than operations on sequences:

@inlinable
public func gcd<T: BinaryInteger>(_ sequence: some Sequence<T>) -> T {
  sequence.reduce(0, gcd(_:_:))
}

@inlinable
public func lcm<T: BinaryInteger>(_ sequence: some Sequence<T>) -> T {
  sequence.reduce(1, lcm(_:_:))
}

Rationale: if you have the sequence version, you can just wrap your values in an array. But if all you have is the variadic version and you have a sequence, you're out of luck.

LePips commented 10 months ago

I did think about that, however I think that the correct place for the gcd/lcm of a Sequence is an extension to match other semantics of "what is the X value in this sequence".

stephentyrone commented 10 months ago

That doesn't argue for making them variadics, though. That would argue for making them an extension on Sequence.

I think I disagree: gcd(someValues) reads more naturally to me than someValues.gcd(), even if the latter is more typical for Swift. I'm open to discussing this point, however.

LePips commented 10 months ago

I see their value as variadics and extensions on sequence - Swift has the variadic freestanding for min/max as well as the min/max on Sequence. I can implement them here as well.

LePips commented 10 months ago

Do we actually want to mirror Swift's structure of the binary and variadic min/max for the freestanding implementations?

ex: have lcm(_:_) and lcm(_:_:_:_:)

stephentyrone commented 10 months ago

The variadic min/max probably don't carry their weight; I don't think that we would add them if we were writing the standard library today, but they date back to the very, very early days of Swift and they're grandfathered in. They shouldn't really serve as guidance for other API.

Can you give an example of code you want to write that naturally requires a variadic GCD or LCM instead of the sequence or binary form?

LePips commented 10 months ago

I personally do not have a situation where I need to compute the lcm/gcd often enough to make a strong argument for anything. A point for freestanding variadic could be that I already have my arguments, I don't need to put them into an array first.

An extension on Sequence is only natural to fit the functional paradigm. Frankly, I would see many people making that extension on their own if this package didn't provide it.

LePips commented 10 months ago

Apologies, after my caffeine high I have updated the PR description with the proper reasoning and justifications for this proposal which should have been there to begin with.

JimWallace commented 10 months ago

What's the reason for not including both variadic and Sequence versions?

Seems like both might be useful (though I'd agree that the free function that takes a Sequence is what I would have reached for).

stephentyrone commented 10 months ago

What's the reason for not including both variadic and Sequence versions?

It's generally desirable for a library to make choices about API, rather than just throwing in everything you think someone might want. You want to have a consistent and familiar way to perform operations when you can. Given the sequence form, it's trivial for users to get the variadic behavior by just making an array at the call site: gcd([a, b, c]), but there's no convenient way to go from the variadic form to an operation on sequences. So the sequence form makes the most sense.

Why provide the binary form, then, since that can also be spelled using the sequence form? Two reasons: first, the overwhelming majority of use is for exactly two values, so it's justifiable to carve out a specific binding for that. Second, you need to have the binary form to implement the sequence API anyway, and simply making it public doesn't add any implementation or maintenance burden.

JimWallace commented 10 months ago

That makes sense, but isn’t the Variadic form strictly more useful than the binary form? I’d take Variadic over binary as an API consumer.

This is probably a lot of handwringing over a tiny number of use cases, but if the API surface area is a concern Variadic and Sequence-oriented implementations could be public, and the binary form could be a fileprivate implementation detail.

markuswntr commented 9 months ago

In practice, variadic functions for these are going to be less useful than operations on sequences:

I do agree. A more elaborate algorithm will almost always have to resort to a function taking a sequence rather than a function taking variadic arguments.

I did think about that, however I think that the correct place for the gcd/lcm of a Sequence is an extension to match other semantics of "what is the X value in this sequence".

Making gcd/lcm instance functions via an extension on Sequence (e.g. values.gcd()) reads very unnatural to me. If they should not be part of the global namespace (which is probably a good thing), I would rather prefer type functions, which also matches what has been done with ElementaryFunctions (and others) on Reals (i.e. making the function static and calling them like so: .gcd(values)) – but this diverges from the freestanding binary gcd, so we probably do not want to do this either and just make both freestanding.

This is probably a lot of handwringing over a tiny number of use cases, but if the API surface area is a concern Variadic and Sequence-oriented implementations could be public, and the binary form could be a fileprivate implementation detail.

I can not think of a use case for a variadic function of gcd/lcm that passes less than 2 arguments. If we were to limit the argument count to >= 2, we could merge the binary function and the variadic function like so (or similar):

public func gcd<T: BinaryInteger>(_ a: T, _ b: T, _ n: T...) -> T {
  if n.isEmpty {
    return _gcd(a, b)
  }
  return n.reduce(_gcd(a, b), _gcd(_:_:))
}

This yields us both, a function that takes 2 or any number of arguments. It does not increase the API surface and it's not a source breaking change. Also, it would not be a performance regression for the binary function if the optimiser is able to replace it with the underlying binary function in cases where only 2 arguments are given – although I don't expect it is much of a performance hit otherwise.

Edit: I could see people being confused (at the call site), not knowing that they could only pass 2 values, but I do not have any example to justify that claim.

LePips commented 9 months ago

In practice, variadic functions for these are going to be less useful than operations on sequences:

I do also agree, however I reiterate that variadic parameters are targeted towards an API implementation allowance where a user may want to use n arguments without making an array/collection. It's nothing more than something nice-to-have as a user and I personally don't see a reason to exclude it just because there can also be a sequence version.

As an API design, I find a variadic form robust in usage and description. In fact, the cpp lcm documentation example uses the binary form to make a variadic form. Now obviously just because something is easy to implement and whether it belongs here has already been said and is an ongoing argument in any API design.

Making gcd/lcm instance functions via an extension on Sequence (e.g. values.gcd()) reads very unnatural to me.

I argue the opposite, but this is very strongly a design choice in functional programming. Swift allows very tacit function composition via type extensions and I am confident in saying that every Swift developer has used the style - either consciously or unknowingly. I could somewhat dive into this in detail, however that's blog post material.

If they should not be part of the global namespace (which is probably a good thing)

These functions are already namespaced by modules. I don't think that it's a good thing to differ from many other language's math/numerics packages that offer these types of functions as freestanding. I don't see namespace crowding as an issue here and it is consistent with other language implementations.

I can not think of a use case for a variadic function of gcd/lcm that passes less than 2 arguments.

I am personally fine with this for a developer usage reason and tool-autocomplete reason. If a developer is explicitly using it for one argument, I don't see a reason to disallow it - that's just odd. Additionally, when using autocomplete Xcode (at least) will also add a placeholder for the variadic argument - almost implying that it's required but it could be removed by manual deletion. In instances where auto-complete isn't available, then the developer is very much in a context where they know how many variables they would need to evaluate a function with.

I can also see how my argument is contradictory to avoiding the full-variadic argument. However I'm fine with that, it doesn't allow something even more weird and autocomplete tools already guide the developer towards >= 2 arguments at the call site.


At the same time, I'm not arguing for variadic-exclusivity with a sequence argument equivalent. I can actually see uses of all three styles depending on developer context - and there's very little overhead. Obviously we should be conscious that we are making decisions that would impact all future numerical function implementations and I have left some of my thoughts on the PR description.

Specifically for the sequence version, I would lean more towards these functions having named parameters so that the call site reads more easily. I argue that we see this more in standard library functions (not just to avoid conflicts, but also as an API detail) and that this wouldn't diverge from other language implementations (because I wasn't able to quickly find one that implemented an array/collection version).

So in short, I think we should offer all three and I don't think that it introduces much overhead at all to us or users.

lcm(a, b, c)
lcm(of: foo)
bar.lcm()