FluxML / FastAI.jl

Repository of best practices for deep learning in Julia, inspired by fastai
https://fluxml.ai/FastAI.jl
MIT License

Setup step for encodings #171

Closed lorenzoh closed 3 years ago

lorenzoh commented 3 years ago

This PR adds functionality for setting up encodings that need access to a data container, e.g. to compute statistics over its observations.

The interface looks something like this:

"""
    setup(Encoding, block, data; kwargs...)

Create an encoding using statistics derived from a data container `data`
with observations of block `block`. Used when some arguments of the encoding
are dependent on the dataset. `data` should be the training dataset. Additional
`kwargs` are passed through to the regular constructor of `Encoding`.

## Examples

```julia
(images, labels), blocks = loaddataset("imagenette2-160", (Image, Label))
setup(ImagePreprocessing, Image{2}(), images; buffered = false)
```

```julia
data, block = loaddataset("adult_sample", TableRow)
setup(TabularPreprocessing, block, data)
```
"""
function setup end
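To illustrate the pattern the docstring describes, here is a minimal, self-contained sketch of what a `setup` method could look like for a toy encoding. `Normalize` and `VectorBlock` are hypothetical stand-ins invented for this example, not types from FastAI.jl, and the PR's actual methods may be structured differently.

```julia
using Statistics

# Hypothetical stand-ins for illustration; not FastAI.jl's actual types.
struct VectorBlock end

struct Normalize
    mean::Float64
    std::Float64
    clamp::Bool
end

# "Regular" constructor: statistics are passed explicitly, options as keywords.
Normalize(μ, σ; clamp = false) = Normalize(μ, σ, clamp)

function setup end

# Derive the statistics from the (training) data container, then forward
# any remaining `kwargs` to the regular constructor.
function setup(::Type{Normalize}, ::VectorBlock, data; kwargs...)
    xs = reduce(vcat, data)  # flatten all observations into one vector
    return Normalize(mean(xs), std(xs); kwargs...)
end

data = [[1.0, 2.0], [3.0, 4.0]]
enc = setup(Normalize, VectorBlock(), data; clamp = true)
```

Dispatching on `Type{Normalize}` keeps a single generic `setup` function while letting each encoding decide which statistics it needs.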

Besides the `setup` implementation, this PR also changes the interface of learning method functions like `ImageClassificationSingle`: they now take a data container `data` as a second argument in case the setup step is needed.

Specifically, the computer vision methods can compute normalization stats over the images (this is turned off by default for performance reasons; there'll be a flag), and the tabular methods compute stats for `TabularPreprocessing` on a table dataset.
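As a rough sketch of how a learning method function might take `data` as its second argument and opt in to the setup step: all names below (`ToyPreprocessing`, `toysetup`, `toymethod`, and the `computestats` flag) are made up for this example, and the flag name used in the PR is not stated above.

```julia
using Statistics

# Hypothetical encoding with preset default statistics.
struct ToyPreprocessing
    mean::Float64
    std::Float64
end
ToyPreprocessing() = ToyPreprocessing(0.5, 0.25)

# Hypothetical setup step: compute the statistics from the data container.
function toysetup(data)
    xs = reduce(vcat, data)
    return ToyPreprocessing(mean(xs), std(xs))
end

# Hypothetical learning method constructor. `data` is the second argument;
# `computestats` (assumed name) is off by default so that constructing the
# method does not require a pass over the dataset.
function toymethod(blocks, data; computestats = false)
    enc = computestats ? toysetup(data) : ToyPreprocessing()
    return (blocks = blocks, encodings = (enc,))
end

data = [[0.0, 2.0], [4.0, 6.0]]
default = toymethod((:input, :target), data)
fitted = toymethod((:input, :target), data; computestats = true)
```

With the flag off, `data` is accepted but never iterated, which is what keeps the default path cheap.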

To-Dos:

lorenzoh commented 3 years ago

Added code for the image statistics setup. On "imagenette2-160" it takes around 2 seconds using threads to go through the whole dataset.
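For readers curious how a threaded pass over the dataset can work, the sketch below computes per-channel mean and standard deviation over a vector of H×W×C arrays with `Threads.@threads`. It is an illustration under simplifying assumptions (images already loaded in memory as `Float64` arrays; the function name `channelstats` is made up), not the code added in this PR.

```julia
# Per-channel mean and standard deviation over a vector of H×W×C arrays.
function channelstats(images)
    n = length(images)
    C = size(first(images), 3)
    sums = zeros(Float64, C, n)    # per-image channel sums
    sqsums = zeros(Float64, C, n)  # per-image channel sums of squares
    counts = zeros(Int, n)         # pixels per image

    # Each iteration writes only to column `i`, so threads never share a slot.
    Threads.@threads for i in 1:n
        img = images[i]
        counts[i] = size(img, 1) * size(img, 2)
        for c in 1:C
            channel = view(img, :, :, c)
            sums[c, i] = sum(channel)
            sqsums[c, i] = sum(abs2, channel)
        end
    end

    total = sum(counts)
    means = vec(sum(sums; dims = 2)) ./ total
    # Population variance via E[x^2] - E[x]^2, clamped against rounding error.
    vars = max.(vec(sum(sqsums; dims = 2)) ./ total .- means .^ 2, 0.0)
    return means, sqrt.(vars)
end

images = [fill(1.0, 2, 2, 3), fill(3.0, 2, 2, 3)]
means, stds = channelstats(images)
```

Storing partial sums per observation avoids locks and any reliance on thread IDs. The `E[x^2] - E[x]^2` formula is simple but can lose precision for large datasets; a streaming method such as Welford's algorithm would be a more robust choice.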