fslaborg / FSharp.Stats

statistical testing, linear algebra, machine learning, fitting and signal processing in F#
https://fslab.org/FSharp.Stats/
Other
210 stars 56 forks source link

[Feature Request] Change covBy to be more useful #147

Closed nhirschey closed 3 years ago

nhirschey commented 3 years ago

Is your feature request related to a problem? Please describe. I implemented covBy and covPopulationBy in pull request https://github.com/fslaborg/FSharp.Stats/pull/114 in February 2021.

However, I think that I did a poor job and that they're not useful as written. I propose that they be written to allow the following.

I think the most useful scenario for these functions is something like:

// Imagine I want the covariance between A and B
let x =
 [ {| A = 1.1; B = 2.2|}
   {| A = 1.0; B = 2.2|}
   {| A = -1.1; B = 2.2|} ]

// It would be really nice and natural to write
x |> Seq.covBy(fun x -> x.A, x.B)

The way Seq.covBy is written now (let covBy f seq1 seq1 = ...) does not seem very useful. I'd like to change it to

let covBy f seq =
   let x, y = seq |> Seq.map f |> Seq.unzip
   cov x y
bvenn commented 3 years ago

In general, we prefer the By-function to extract just one unzipped feature out of the sequence, but in this particular case we would have to add another By-function for the second sequence to make it generic. In my opinion, this would add a unnecessary layer of complexity, so I agree that the version you presented has advantages over the current one. 👍

nhirschey commented 3 years ago

Sounds good @bvenn.

I really like your general By-function design scheme. I agree that this new version I'm proposing is "different" but I think it's probably the lowest-friction covariance analogy to the varBy and stDevBy functions that I love.

Thank you for taking the time to consider this suggestion. I'll work on the pull request.

bvenn commented 3 years ago

closed by 4a07244