davidbp / MulticlassPerceptron.jl

Features as rows or features as columns. #5

Closed: ablaom closed this issue 3 years ago

ablaom commented 3 years ago

In MLJ a table is always assumed to be features-as-columns. If we are allowing the MLJ user to input a matrix (possibly sparse) as in this PR instead, then for consistency this ought to be features-as-columns as well, but the core methods expect features-as-rows.
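To make the convention concrete, here is a minimal sketch using only calls that appear later in this thread:

using MLJ

X, y = @load_iris      # X is a table with one column per feature
A = MLJ.matrix(X)      # materializes as a 150 x 4 Matrix{Float64}: features as columns
size(A) == (150, 4)    # true: n x p, observations as rows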

To make the interface consistent, one could change this line to _reformat(X, ::Type{<:AbstractMatrix}) = X' (i.e., add an adjoint). If the MLJ user supplies his input X (from MLJ) as the adjoint of a features-as-rows matrix, then the two adjoint operations will compile to a no-operation, and there will be no loss of performance.
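As a minimal sketch of why the double adjoint costs nothing (_reformat is the one-liner proposed above; the surrounding variable names are hypothetical):

using LinearAlgebra

_reformat(X, ::Type{<:AbstractMatrix}) = X'  # lazily flip features-as-columns to features-as-rows

A_wide = rand(4, 150)   # features-as-rows matrix (p x n)
X_user = A_wide'        # what the MLJ user would pass: an n x p Adjoint
X_core = _reformat(X_user, typeof(X_user))
X_core === A_wide       # true: the adjoint of an Adjoint is its parent array, so no copy is made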

I'm kind of assuming here that the MulticlassPerceptron core method can handle any AbstractMatrix, including adjoints, which it probably should be capable of doing. Moreover, it can presumably detect when the user has passed data in a non-optimal format, and issue an @info recommending an alternative representation (if verbosity > 0).
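A hedged sketch of what such a check might look like; the function name and message here are assumptions, not package API:

using LinearAlgebra: Adjoint

function _check_input_format(X::AbstractMatrix, verbosity::Int)
    # A plain dense matrix means the user skipped the adjoint trick: fitting
    # still works, but memory is traversed in a cache-unfriendly order.
    if verbosity > 0 && !(X isa Adjoint)
        @info "For better performance, supply X as the adjoint of a features-as-rows matrix."
    end
    return X
end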

davidbp commented 3 years ago

The link in "this PR" in the first paragraph is not working. Could you tell me where the info about allowing a sparse matrix as input is?

I will test that the core handles adjoints as well. I still have to add tests for sparse matrices. I know that the code worked a year ago with sparse inputs, but the sparse operations did not bring much of a speedup. That said, I also recall that Julia's sparse operations have improved a lot since then.
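For reference, dense adjoints, sparse matrices, and sparse adjoints are all AbstractMatrix, so a core method written against that abstract type should cover every case; a quick check:

using LinearAlgebra, SparseArrays

X_dense  = rand(Float32, 4, 150)          # features-as-rows, dense
X_sparse = sprand(Float32, 4, 150, 0.2)   # features-as-rows, roughly 20% nonzeros

X_dense'  isa AbstractMatrix   # true: a lazy Adjoint wrapper
X_sparse  isa AbstractMatrix   # true: SparseMatrixCSC <: AbstractMatrix
X_sparse' isa AbstractMatrix   # true: Adjoint of a sparse matrix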

davidbp commented 3 years ago

I just tested the code with adjoints and it works as expected.

I have updated all the examples in the examples folder; they now show different types of classification problems with different input types, and all of them seem to work (on branch MLJModelInterface).

In particular:

05_MPmachine_iris.jl shows that a machine can be fitted with a standard julia array (even with an adjoint).

Here I leave all the outputs of the examples:

julia --project=. examples/01_MPCore_iris.jl
Iris Dataset, MulticlassPerceptronCore

Types and shapes before calling fit!(perceptron, train_x, train_y)
typeof(perceptron) = MulticlassPerceptronCore{Float32}
typeof(X) = LinearAlgebra.Adjoint{Float64,Array{Float64,2}}
typeof(y) = Array{Int64,1}
size(X) = (4, 150)
size(y) = (150,)
n_features = 4
n_classes = 3

Start Learning
Learning took 0.859 seconds

Results:
Train accuracy:0.973
julia --project=. examples/02_MPCore_mnist.jl
MNIST Dataset, MulticlassPerceptronCore

Loading data
MNIST Dataset Loading...
MNIST Dataset Loaded, it took 0.459 seconds

Types and shapes before calling fit!(perceptron, train_x, train_y)
typeof(perceptron) = MulticlassPerceptronCore{Float32}
typeof(train_x) = Array{Float32,2}
typeof(train_y) = Array{Int64,1}
size(train_x) = (784, 60000)
size(train_y) = (60000,)
size(test_x) = (784, 10000)
size(test_y) = (10000,)
n_features = 784
n_classes = 10

Start Learning
Learning took 7.125 seconds

Results:
Train accuracy:0.936
Test accuracy:0.926
julia --project=. examples/03_MPClassifier_iris.jl
Iris Dataset, MulticlassPerceptronClassifier

Iris Dataset Example

Types and shapes before calling fit(perceptron, 1, train_x, train_y)
typeof(perceptron) = MulticlassPerceptronClassifier
typeof(X) = DataFrame
typeof(y) = CategoricalArray{String,1,UInt8,String,CategoricalString{UInt8},Union{}}
size(X) = (150, 4)
size(y) = (150,)
n_features = 4
n_classes = 3

Start Learning
Epoch: 50    Accuracy: 0.9
typeof(fitresult) = Tuple{MulticlassPerceptronCore{Float32},MLJBase.CategoricalDecoder{String,UInt8}}

Learning took 2.298 seconds

Results:
Train accuracy:0.98
julia --project=. examples/04_MPClassifier_mnist.jl
MNIST Dataset, MulticlassPerceptronClassifier

Loading data
MNIST Dataset Loading...
MNIST Dataset Loaded, it took 0.487 seconds

Types and shapes before calling fit(perceptron, 1, train_x, train_y)
typeof(perceptron) = MulticlassPerceptronClassifier
typeof(train_x) = LinearAlgebra.Adjoint{Float32,Array{Float32,2}}
typeof(train_y) = CategoricalArray{Int64,1,UInt32,Int64,CategoricalValue{Int64,UInt32},Union{}}
size(train_x) = (60000, 784)
size(train_y) = (60000,)
size(test_x) = (10000, 784)
size(test_y) = (10000,)
n_features = 784
n_classes = 10

Start Learning
Epoch: 50    Accuracy: 0.897
typeof(fitresult) = Tuple{MulticlassPerceptronCore{Float32},MLJBase.CategoricalDecoder{Int64,UInt32}}

Learning took 8.044 seconds

Results:
Train accuracy:0.936
Test accuracy:0.926
julia --project=. examples/05_MPmachine_iris.jl
Iris Dataset, Machine with a MulticlassPerceptronClassifier

Iris Dataset Example

Types and shapes before calling fit!(perceptron_machine)
typeof(perceptron_machine) = Machine{MulticlassPerceptronClassifier}
typeof(X) = DataFrame
typeof(y) = CategoricalArray{String,1,UInt8,String,CategoricalString{UInt8},Union{}}
size(X) = (150, 4)
size(y) = (150,)
n_features = 4
n_classes = 3

Start Learning

[ Info: Training Machine{MulticlassPerceptronClassifier} @ 1…53.
Epoch: 50    Accuracy: 0.94
Learning took 11.895 seconds

Results:
Train accuracy:0.98
julia --project=. examples/06_MPmachine_mnist.jl
MNIST Dataset, Machine with a MulticlassPerceptronClassifier

MNIST Dataset Loading...

MNIST Dataset Loaded, it took 0.545 seconds

Types and shapes before calling fit!(perceptron_machine)
typeof(perceptron_machine) = Machine{MulticlassPerceptronClassifier}
typeof(train_x) = LinearAlgebra.Adjoint{Float32,Array{Float32,2}}
typeof(train_y) = CategoricalArray{Int64,1,UInt32,Int64,CategoricalValue{Int64,UInt32},Union{}}
size(train_x) = (60000, 784)
size(train_y) = (60000,)
size(test_x) = (10000, 784)
size(test_y) = (10000,)
n_features = 784
n_classes = 10

Start Learning

[ Info: Training Machine{MulticlassPerceptronClassifier} @ 1…65.
Epoch: 50    Accuracy: 0.898
Learning took 10.5 seconds

Results:
Train accuracy:0.936
Test accuracy:0.926
ablaom commented 3 years ago

In particular:

05_MPmachine_iris.jl shows that a machine can be fitted with a standard julia array (even with an adjoint).

?? I don't see any array being used in the machine here - only a DataFrame (see your own output).

But I'm unclear on what you think of my proposal. Should the MLJ user (as opposed to the Core user) supply his machine with a p x n matrix (as now), or would you be happy to make the suggested changes, which would instead require this to be an n x p matrix?

To be clearer, I expect the following works just fine at the moment:

using MLJ, MulticlassPerceptron
perceptron = MulticlassPerceptronClassifier()  # default hyperparameters
X, y = @load_iris # X is a table with 4 columns, one per feature
A_wide = permutedims(MLJ.matrix(X)) # Matrix{Float64} with 4 rows
machine(perceptron, A_wide, y) |> fit!

However, I should prefer that this work instead:

A_tall = MLJ.matrix(X) # Matrix{Float64} with 4 columns
machine(perceptron, A_tall, y) |> fit!

and also this (with better performance, if we ignore the "one-time" cost of permutedims):

X, y = @load_iris # X is a table with 4 columns, one per feature
A_wide = permutedims(MLJ.matrix(X)) # Matrix{Float64} with 4 rows
machine(perceptron, A_wide', y) |> fit!    # <----- note the adjoint!
davidbp commented 3 years ago

?? I don't see any array being used in the machine here - only a DataFrame (see your own output).

Sorry, I was referring to the 06 experiment.

But I'm unclear on what you think of my proposal. Should the MLJ user (as opposed to the Core user) supply his machine with a p x n matrix (as now), or would you be happy to make the suggested changes, which would instead require this to be an n x p matrix?

I assume an MLJ user would follow the convention of an n x p matrix. The shapes of the arrays/DataFrames in examples 03 to 06 follow precisely this convention. Note that examples 01 and 02 are for the Core version (which is standalone and independent of MLJ). Isn't this what you would prefer?
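A minimal illustration of the two conventions, using the iris shapes from the examples above:

X_mlj  = rand(150, 4)   # MLJ convention: n x p (one row per observation)
X_core = X_mlj'         # Core convention: p x n (one row per feature), taken lazily
size(X_mlj)  == (150, 4)   # true
size(X_core) == (4, 150)   # true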

ablaom commented 3 years ago

I assume an MLJ user would follow the convention of an n x p matrix.

Yes, but my point is that this is not what is currently implemented. The fix is #8.

ablaom commented 3 years ago

With #8 merged, I think we can close this.