fslaborg / FSharp.Stats

statistical testing, linear algebra, machine learning, fitting and signal processing in F#
https://fslab.org/FSharp.Stats/
Other
205 stars 54 forks source link

Addition of Geomspace #253

Closed DoganCK closed 1 year ago

DoganCK commented 1 year ago

Closes #252

Please list the changes introduced in this PR

add generic Seq.geomspace add Vector.geomspace add Array.geomspace add List.geomspace add tests

Description

Similar functionality to np.geomspace. Notes:

  1. I restricted the function to positive values. Numpy allows start and stop to be negative but I found this hacky for the following reasons: 1.1 Exponentiating negative values is a contentious matter. E.g. -2. ** (2./3.) = nan in F#. 1.2 What numpy does is it computes the geomspace by first changing the sign of start and stop and then changes it back to negative. I think this is misleading. If this behavior is what the user wants, I think they should change the sign explicitly, after using the function with positive values. 1.3 Numpy's version throws an unhandled exception when exactly one of start or stop is negative.
  2. I'd like add tests for failing cases as it's intended behavior. Can that be done with Expecto, @bvenn? 2.1 Relatedly, I also thought about using Result type returning an Error instead of throwing an exception; but I saw that the general convention was to throw an exception in other parts of the library so I sticked with that.

[Required] please make sure you checked that

[Optional]

bvenn commented 1 year ago

That looks pretty nice :rocket:

I understand your reasoning regarding negative inputs and I agree, a failing would be the right thing to do. The user is given full responsibility and should know when something is strange.

You can test for failing with:

testCase "geomspace_5" <| fun () ->
    let expected() = Seq.geomspace(-2.,2.,3) |> ignore
    Expect.throws expected "geomspace cannot be initialized with negative values"

You can add it to your branch, and afterwards I'm going to merge it.

The default num value is set to 50, as it is in numpy. I totally understand that you aimed to provide the same functionality, however I do not understand the reasoning why a default of 50 is reasonable in the first place. You have to know this default value to properly work with the function and not be surprised by the result if you don't set num. In linspace I changed it to an desired final increment of 1. Obviously the same is not possible for geomspace, so your approach is fine. If you have other thoughts about the use of num default in linspace, please let me know. I'm open for corrections of linspace

DoganCK commented 1 year ago

Thanks @bvenn!

I agree that num=50 seems very arbitrary. However, I think defaulting the stepsize to 1 changes the behavior of the function. That is, linspace(10, 20) returns an array with a different shape than linspace(10,25). These functions are commonly used in matrix computations and I think the shape should not change implicitly.

Also, there are already standard ways of generating lists with fixed stepsizes like: [10 .. 20].

However, we could make num to be a required argument rather than an optional one. I'm happy with that, I'm also happy with defaulting to 50, though arbitrary.

Small Note: I realized last night that we are giving up some performance benefits by generating everything from sequences. Array.init ... is +10x faster Seq.init ...|> Array.ofSeq. I will think about if there is a neat solution that allows maximal code sharing. I'll open another issue if I come up with something.

bvenn commented 1 year ago

(1) You are right, I assume num to be mandatory would be the most intuitive way to create such sequences. For quick plotting checks a default value is useful, but I think for learners and first-time users it does more harm than it helps in understanding the functionality. I'm going to update linspace and geospace accordingly.

(2) We had a discussion about the necessity for tailored implementation for all kinds of collection types. If performance is an issue for your application, we simply should go for separate implementations (and adding the same tests for every one of them). Feel free to open a new issue :+1:

codecov-commenter commented 1 year ago

Codecov Report

Merging #253 (ea478c0) into developer (e705070) will increase coverage by 0.04%. The diff coverage is 57.37%.

:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@              Coverage Diff              @@
##           developer     #253      +/-   ##
=============================================
+ Coverage      45.01%   45.06%   +0.04%     
=============================================
  Files            148      148              
  Lines          14532    14555      +23     
  Branches        1930     1926       -4     
=============================================
+ Hits            6542     6559      +17     
- Misses          7460     7466       +6     
  Partials         530      530              
Impacted Files Coverage Δ
src/FSharp.Stats/List.fs 0.00% <0.00%> (ø)
src/FSharp.Stats/Seq.fs 0.78% <0.00%> (ø)
src/FSharp.Stats/Vector.fs 15.00% <0.00%> (-0.08%) :arrow_down:
tests/FSharp.Stats.Tests/Main.fs 0.00% <0.00%> (ø)
src/FSharp.Stats/Array.fs 2.53% <25.00%> (-1.30%) :arrow_down:
tests/FSharp.Stats.Tests/Array.fs 100.00% <100.00%> (ø)
tests/FSharp.Stats.Tests/Seq.fs 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.