Closed davidbp closed 7 years ago
Mhm, I see your point. The main thing here is that a FoldsView is a subtype of AbstractVector, so here we don't actually hijack the printing; it's done with Base code.
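To illustrate why the output comes from Base rather than from the package, here is a minimal sketch with a hypothetical type (LazySquares is made up for this example and has nothing to do with the actual FoldsView code). Any AbstractVector subtype that defines size and getindex inherits the generic array display, including a header built from the full type name:

```julia
# Hypothetical stand-in type; NOT the MLDataPattern implementation.
struct LazySquares <: AbstractVector{Int}
    n::Int
end

# The two methods a read-only AbstractVector subtype has to provide.
Base.size(v::LazySquares) = (v.n,)
function Base.getindex(v::LazySquares, i::Int)
    checkbounds(v, i)   # throw a BoundsError for invalid indices
    return i^2          # elements are computed on demand
end

v = LazySquares(3)

# No `show` method is defined above, yet this prints a header line and the
# elements, because Base's generic AbstractArray display is used.
show(stdout, MIME"text/plain"(), v)
println()

# The header is produced by `summary`, which embeds the type name. For a
# heavily parameterised type this is where the very long line comes from.
header = summary(v)
println(header)
```

The longer the type parameters, the longer that header becomes, which is presumably what makes the FoldsView output so noisy.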
Is it really intended to be a vector? When I think of a vector, I think of operations in a vector space. It doesn't seem that this type will ever need any sort of algebra. I see it as a "placeholder" containing useful information.
At first I even thought that there was no need to have a type to contain the folds. I thought we could use an array (or array of pairs/triplets ...) of views. I now think that having a type can facilitate further abstractions, so I'm OK with it; it's just that I see too much stuff that is not meaningful to me when printing. In the example above, the following info is retrieved:
FoldsView(::Array{Float64,2}, ::Array{Array{Int64,1},1}, ::Array{UnitRange{Int64},1}, ObsDim.Last()) with element type Tuple{SubArray{Float64,2,Array{Float64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false},SubArray{Float64,2,Array{Float64,2},Tuple{Base.Slice{Base.OneTo{Int64}},UnitRange{Int64}},true}}
which seems too much.
> Is it really intended to be a vector?

Well,

> I thought we could use an array (or array of pairs/triplets ...) of views

it is exactly a lazy version of that.
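A rough sketch of what "a lazy version of an array of views" can mean, assuming observations are stored as columns of a matrix. The type LazyFolds, its fields, and the loose element type are assumptions made for this example only; the real FoldsView is typed more precisely and is more general:

```julia
# Hypothetical sketch; names and fields are illustrative, not MLDataPattern's.
struct LazyFolds{M<:AbstractMatrix} <: AbstractVector{Any}
    data::M                              # observations stored as columns
    val_indices::Vector{UnitRange{Int}}  # validation columns of each fold
end

Base.size(f::LazyFolds) = (length(f.val_indices),)

function Base.getindex(f::LazyFolds, i::Int)
    val = f.val_indices[i]
    train = setdiff(1:size(f.data, 2), val)  # every column not used for validation
    # The views are only created here, when fold `i` is actually requested.
    return (view(f.data, :, train), view(f.data, :, val))
end

X = rand(4, 150)
folds = LazyFolds(X, [(15k - 14):(15k) for k in 1:10])  # 10 folds of 15 columns

train, val = folds[1]
println(size(train), " ", size(val))
```

Only the index ranges are stored up front; an eager array of views would allocate all ten pairs at construction time.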
> it's just that I see too much stuff that is not meaningful to me when printing.

I fully agree there.
To be a little more concrete: I am in favour of hijacking show to print more meaningful information.
MLDataPattern.FoldsView(data=X_iris, n_samples=150, n_folds=10, tr_sizes=(4,135), va_sizes=(4,15))
The main reason I don't like this specific version is that it looks like code with which one could construct the same object.
Maybe some multiline summary:
10-element FoldsView of 150 observations:
  data: (4×150 Array{Float64,2}, 2-element Array{Float64,1})
  training: 135 observations
  validation: 15 observations
  obsdim: ObsDim.Last()
Keep in mind that the data need not be arrays.
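A multiline summary of this kind could be sketched by overloading the three-argument show method, which controls the REPL-style display without touching the compact two-argument form. Everything below is hypothetical: FoldSummaryDemo and its fields are invented for this example, and for brevity it assumes matrix data with non-empty, equally sized folds, even though the real data need not be arrays:

```julia
# Hypothetical container; field names are assumptions made for this sketch.
struct FoldSummaryDemo <: AbstractVector{Any}
    data::Matrix{Float64}
    val_indices::Vector{UnitRange{Int}}
end

Base.size(f::FoldSummaryDemo) = (length(f.val_indices),)
Base.getindex(f::FoldSummaryDemo, i::Int) = view(f.data, :, f.val_indices[i])

# More specific than Base's method for AbstractArray, so it takes precedence.
function Base.show(io::IO, ::MIME"text/plain", f::FoldSummaryDemo)
    nobs = size(f.data, 2)
    nval = length(first(f.val_indices))  # assumes at least one fold
    println(io, length(f), "-element FoldSummaryDemo of ", nobs, " observations:")
    println(io, "  data: ", summary(f.data))
    println(io, "  training: ", nobs - nval, " observations")
    print(io, "  validation: ", nval, " observations")
end

f = FoldSummaryDemo(rand(4, 150), [(15k - 14):(15k) for k in 1:10])
text = sprint(show, MIME"text/plain"(), f)
print(text)
```

Because the summary does not resemble a constructor call, it avoids the concern that the printed form looks like code that would rebuild the object.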
I never thought about

> The main reason I don't like this specific version of it is because it looks like code with which one could construct the same object with.

It's a good point. I do like having the type information on a single line when using these objects, but that's a personal preference, I guess. Having the info like in
10-element FoldsView of 150 observations:
  data: (4×150 Array{Float64,2}, 2-element Array{Float64,1})
  training: 135 observations
  validation: 15 observations
  obsdim: ObsDim.Last()
is definitely an improvement for the user.
Could you expand on what obsdim: ObsDim.Last() is?
ObsDim is a dispatchable way to allow for different conventions as to what denotes an observation (e.g. row vs. column). We want to support both.
See http://mldatapatternjl.readthedocs.io/en/latest/documentation/container.html#observation-dimension
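The idea of a "dispatchable" observation dimension can be sketched with plain marker types. The names below (ObsDimension, FirstDim, LastDim, nobs_demo, getobs_demo) are invented for this illustration and are not the definitions the package uses; see the linked documentation for the real API:

```julia
# Hypothetical marker types that imitate the idea; these are NOT the
# definitions used by MLDataPattern.
abstract type ObsDimension end
struct FirstDim <: ObsDimension end   # each row is an observation
struct LastDim  <: ObsDimension end   # each column is an observation

# Dispatching on the marker selects the convention at compile time,
# without any runtime branching on a flag.
nobs_demo(X::AbstractMatrix, ::FirstDim) = size(X, 1)
nobs_demo(X::AbstractMatrix, ::LastDim)  = size(X, 2)

getobs_demo(X::AbstractMatrix, i, ::FirstDim) = view(X, i, :)
getobs_demo(X::AbstractMatrix, i, ::LastDim)  = view(X, :, i)

X = reshape(collect(1.0:12.0), 3, 4)   # 3 rows, 4 columns, column-major fill

println(nobs_demo(X, FirstDim()), " ", nobs_demo(X, LastDim()))
println(getobs_demo(X, 2, LastDim()))
```

The same matrix is read as 3 observations under one convention and 4 under the other, which is why the chosen convention is worth showing in the printed summary.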
Right now a FoldsView object prints a lot of information when it is created. In my opinion, it would make more sense to print only the relevant information.
Example
The example with the iris data does not seem very understandable.
Maybe if it printed something like...
it would be easier to grasp that the type gives you