This PR adds functionality for setting up encodings that need access to a data container, e.g. to compute some statistics over each observation.
The interface looks something like this:
"""
    setup(Encoding, block, data; kwargs...)

Create an encoding using statistics derived from a data container `data`
with observations of block `block`. Used when some arguments of the encoding
are dependent on the dataset. `data` should be the training dataset. Additional
`kwargs` are passed through to the regular constructor of `Encoding`.

## Examples

```julia
(images, labels), blocks = loaddataset("imagenette2-160", (Image, Label))
setup(ImagePreprocessing, Image{2}(), images; buffered = false)
```

```julia
data, block = loaddataset("adult_sample", TableRow)
setup(TabularPreprocessing, block, data)
```
"""
function setup end
Besides the `setup` implementation, this PR also changes the interface of the learning method functions such as `ImageClassificationSingle`: they now take a data container `data` as a second argument, which is used when the setup step is needed.
Specifically, the computer vision methods can compute normalization statistics over the images (turned off by default for performance reasons; there will be a flag to enable it), and the tabular methods compute the statistics for `TabularPreprocessing` from a table dataset.
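
To illustrate the idea, here is a minimal, self-contained sketch of the pattern: derive statistics from a data container, then forward any remaining keyword arguments to the regular constructor. `ToyNormalize` and `toysetup` are hypothetical names invented for this example and are not part of the PR; the actual `setup` methods also take a `block` argument and may compute their statistics differently.

```julia
using Statistics

# Hypothetical stand-in for an encoding whose arguments depend on the dataset.
struct ToyNormalize
    means::Vector{Float64}
    stds::Vector{Float64}
    buffered::Bool
end

# "Regular" constructor: statistics are given by hand, options are keywords.
ToyNormalize(means, stds; buffered = true) = ToyNormalize(means, stds, buffered)

# Sketch of a `setup`-style function (assumption: every observation in `data`
# is a feature vector of the same length).
function toysetup(::Type{ToyNormalize}, data; kwargs...)
    X = reduce(hcat, data)              # features × observations matrix
    means = vec(mean(X; dims = 2))      # per-feature mean over observations
    stds = vec(std(X; dims = 2))        # per-feature standard deviation
    # Remaining keyword arguments go to the regular constructor.
    return ToyNormalize(means, stds; kwargs...)
end

data = [[1.0, 10.0], [3.0, 30.0]]       # toy "training set" with two observations
enc = toysetup(ToyNormalize, data; buffered = false)
```

The caller never passes the statistics explicitly; only dataset-independent options such as `buffered` are supplied, mirroring the `setup(ImagePreprocessing, Image{2}(), images; buffered = false)` call in the docstring.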
To-Dos:
- implement, test, and document `setup` for
  - [x] `ImagePreprocessing`
  - [x] `TabularPreprocessing`
- [x] update learning method functions to use `setup`
- docs
  - [x] update quickstart docs to include tabular tasks (this was pending on this functionality)