fslaborg / FSharp.Stats

statistical testing, linear algebra, machine learning, fitting and signal processing in F#
https://fslab.org/FSharp.Stats/
Other
205 stars 54 forks source link

[BUG] Matrix.mapiRows does not return a matrix #134

Closed kMutagene closed 3 years ago

kMutagene commented 3 years ago

Same is true for many of the functions starting from this line: https://github.com/fslaborg/FSharp.Stats/blob/5652e065a6c71579ac80a5278825b853b5c145c8/src/FSharp.Stats/Matrix.fs#L222

bvenn commented 3 years ago

Additionally, I would suggest to rename enumerateRows/Columns to mapRows/Columns and base it on Matrix.init instead of sequence expressions.

Edit: Matrix.init does not work since you don't know the length of the new rows.

bvenn commented 3 years ago

@kMutagene Actually, returning a matrix leads to problems if rows/cols of the matrix should be condensed to a single value. The following example would fail: Matrix.mapRows Seq.mean m

I think the mapping function should be as generic as possible (f: Rowvector -> 'a) and not be constrained to return a sequence. It is up to the user to convert the result back into a matrix. The only thing that should be changed is the naming of enumerateRows/Cols.

kMutagene commented 3 years ago

Agreed. Then we should at least supply a RowVector.map function to make it easy to stay inside a Matrix<float> afterwards. Currently, i have to do this for mapping rows with an index specific function:

myMatrix
|> Matrix.mapiRows(fun i row -> 

    row
    |> RowVector.toArray
    |> Array.map (fun v -> v * (float i))
    |> RowVector.ofArray
)

while i can do this for cols:

tmeaRes.ConstraintPotentials
|> Matrix.mapiCols(fun i col-> 
    col
    |> Vector.map (fun v -> v * - (float i))
)
kMutagene commented 3 years ago

I still think i had a valid point with my first concern though. Consider this scenario:

the code needed for that is:

myMatrix
|> Matrix.mapiCols(fun i col -> 
    if i > 2 then
        col
        |> Vector.map (fun v -> v * - 1.)
    else 
        pattern
)
|> matrix //this
|> Matrix.transpose //is error prone

because mapiCols implicitly transposes the matrix, i think this is unnecessarily complex to use for such an easy use case.

i would argue that your example Matrix.mapRows Seq.mean m

should actually be a fold operation, e.g. Matrix.foldRows Seq.mean

Examples from the core module support this. <Type>.map has always the signature of mapping: ('T -> 'U) -> input: Type<'T> -> Type<'U>. Our map functions change the outer type from Matrix to seq.

So a rowwise mapping should always have this signature:

(RowVector<'T> -> RowVector<'U>) -> Matrix<'T> -> -> Matrix<'U>,

and a columnwise mapping should have the signature

(Vector<'T> -> Vector<'U>) -> Matrix<'T> -> Matrix<'U>,

where 'T and 'U are floats in most if not all real usecases.

HLWeil commented 3 years ago

Hey, just a little heads up: You could substitute

|> matrix 
|> Matrix.transpose

with

|> Matrix.ofJaggedColSeq

With this you can get back to the matrix in a (IMO) natural and straightforward way, while maintaining the generic flexibility of the matrix map functions as they are.

Generally, I agree with your points though

bvenn commented 3 years ago

An additional comment:

When manipulating matrix rows that result in different row lengths, the Matrix.map operation is going to fail if the result is restricted to a matrix type. Here you have to transform it to a JaggedSeq first. Either by

|> Matrix.transpose
|> Matrix.toJaggedArray
|> Array.map f

or

|> Matrix.toJaggedColArray
|> Array.map f

with the latter missing at the moment.

bvenn commented 3 years ago

An appropriate Matrix function still should be discussed, that enables to condense a matrix row/column by (seq<'a> -> 'b) to a single value and results in a Vector (when rowwise) and RowVector (when colwise). Since this operation does not require a value passing, a fold structure like prposed before seems not desirable.

Suggestion: Matrix.condenseRows (fun rowvec -> f rowvec) m :Vector<'b>

with (f:rowvec<'a> -> 'b)

kMutagene commented 3 years ago

Upon further discussion and looking into what Deedle does, i think the changes needed are minimal:

  1. return columns/rows for the respective map function to keep the dimensions Matrix.mapRows: f(RowVector -> 'U) -> input:Matrix<'T> -> Vector<'U> Matrix.mapCols: f(Vector -> 'U) -> input:Matrix<'T> -> RowVector<'U>

you can do all kinds of operations using these functions, such as reducing, folding, etc. the rows/cols

  1. Provide safe functions to create matrices from above output Matrix.ofRows: rows:Vector<RowVector<float>> -> matrix Matrix.ofCols: cols:RowVector<Vector<float>> -> matrix

In a more long-term perspective we could implemenmt the generic Matrix as an interface, and the specialized implementations as implementation of that interface. That way we could also overload the map functions to provide a way to returning a Matrix instead of a Row/Column vector.

bvenn commented 3 years ago

I've made some changes. What do you think about it? 🤔

let m : Matrix<float> = 
    [
        rowvec [1. .. 3.]
        rowvec [4. .. 6.]
    ]
    |> Vector.Generic.ofSeq
    |> Matrix.ofRows 

let means :RowVector<float> = 
    m
    |> Matrix.mapCols Seq.mean

val means : RowVector<float> = rowvec [|2.5; 3.5; 4.5|]

bvenn commented 3 years ago

closed by fc39f53