FastAI.jl already has some functionality for discovering features, as explained in this tutorial. This allows users to:
- find a dataset for a given task (e.g. find a dataset that can be loaded with `(Image, Label)` blocks)
- find a supervised learning task for a given data modality (e.g. given `(Image, Label)`, find the `ImageClassificationSingle` function)
These are two examples of a more general idea: a list of features with rich information that can be queried. Sub-modules and external packages can extend this list to add functionality and make it available through a common interface.
Here I propose implementing a more general `Registry` that is more flexible in what information it can store and has a consistent interface for querying and extending.
There could be registries for the following groups of features (and more):
- Datasets: define how a remote dataset is made available, e.g. `imagenette2` can be downloaded.
- Dataset recipes: a recipe is a way to load a dataset into a data container ready for a learning task, e.g. `imagenette2` can be loaded into an `(Image{2}(), Label(...))` data container.
- Learning tasks: find high-level constructors based on blocks (i.e. what `findlearningmethods` does currently).
- Encodings: find encodings that are used with a certain block, e.g. all `Encoding`s that work on `Image`s.
- Architectures: find (possibly pretrained) model architectures that can work with some block data, e.g. convolutional architectures can take in `ImageTensor`s.
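As a minimal sketch of what such a general registry could look like in plain Julia: a named table whose entries must all share the same columns, a `register!` function that sub-modules or external packages could call to extend it, and a predicate-based query. `Registry`, `register!`, `query` and the encoding entries are hypothetical illustrations, not part of FastAI.jl.

```julia
# Hypothetical sketch, not FastAI.jl API: a registry is a named table of
# NamedTuple entries that all have the same columns.
struct Registry
    name::String
    columns::Tuple{Vararg{Symbol}}
    entries::Vector{NamedTuple}
end

Registry(name, columns) = Registry(name, Tuple(columns), NamedTuple[])

# Extension point: any module (e.g. a domain package) can add entries,
# as long as they have the registry's column names in the same order.
function register!(reg::Registry, entry::NamedTuple)
    keys(entry) == reg.columns ||
        throw(ArgumentError("entry must have columns $(reg.columns), got $(keys(entry))"))
    push!(reg.entries, entry)
    return reg
end

# Query with an arbitrary predicate over entries; returns the matching entries.
query(pred, reg::Registry) = filter(pred, reg.entries)

# Illustrative encodings registry, keyed by the block each encoding works on
# (block values are simplified for the example).
encodings = Registry("encodings", (:name, :block))
register!(encodings, (name = "ProjectiveTransforms", block = :Image))
register!(encodings, (name = "ImagePreprocessing", block = :Image))
register!(encodings, (name = "OneHot", block = :Label))

# "All encodings that work on images"
image_encodings = query(e -> e.block == :Image, encodings)
```

Validating columns in `register!` is one way to keep third-party entries queryable through the same interface as built-in ones.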
Having these registries would make it easier:
- for users to find functionality relevant to their use case
- for third-party (e.g. domain) modules to extend base FastAI.jl
- to generate no-code interfaces, populated with registry data, where functionality can be selected from dropdowns
Prototype
As an idea for the API, I've created a prototype registry that stores dataset information for the fastai datasets.
The information is incomplete for now, but we can already do some querying based on the columns:
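A minimal sketch of this kind of column-based query, assuming the registry is stored as a plain vector of NamedTuples. The entries, the column names and `find_entries` are hypothetical, not the prototype's actual data or API.

```julia
# Hypothetical dataset registry: one NamedTuple per dataset, all sharing the
# same columns. Entries and column names are illustrative only.
const DATASETS = [
    (name = "imagenette2", blocks = (:Image, :Label)),
    (name = "imagewoof2", blocks = (:Image, :Label)),
    (name = "camvid_tiny", blocks = (:Image, :Mask)),
]

# Return the entries whose columns equal every keyword value given.
# With no keywords, every entry matches.
function find_entries(entries; filters...)
    filter(entries) do entry
        all(getproperty(entry, col) == val for (col, val) in filters)
    end
end

# "Which datasets can be loaded with (Image, Label) blocks?"
imagelabel = find_entries(DATASETS; blocks = (:Image, :Label))
```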
Then select an entry and load it, downloading if it is not available:
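One way "download if not available" could work, sketched under the assumption that each entry maps to a folder in a local cache. `load_dataset` and `placeholder_fetch` are hypothetical; the placeholder only writes a file so the sketch runs offline, where a real registry would call an actual downloader.

```julia
# Hypothetical sketch: return the local path of an entry's data, calling
# `fetchdata(entry, dir)` only when the data is not cached yet.
function load_dataset(entry, fetchdata; cachedir = joinpath(tempdir(), "registry-sketch"))
    path = joinpath(cachedir, entry.name)
    if !isdir(path)
        mkpath(cachedir)
        # Fetch into a temporary folder first, so an interrupted download
        # is never mistaken for a complete dataset on the next call.
        tmp = mktempdir(cachedir; cleanup = false)
        fetchdata(entry, tmp)
        mv(tmp, path)
    end
    return path
end

# Placeholder fetcher: writes a single file and counts how often it runs.
ncalls = Ref(0)
function placeholder_fetch(entry, dir)
    ncalls[] += 1
    write(joinpath(dir, "README.txt"), "placeholder data for $(entry.name)")
end

entry = (name = "imagenette2", blocks = (:Image, :Label))
cachedir = mktempdir()
path1 = load_dataset(entry, placeholder_fetch; cachedir)  # fetches
path2 = load_dataset(entry, placeholder_fetch; cachedir)  # reuses the cached folder
```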
Complete columns can also be extracted and worked with:
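A minimal sketch of column extraction, again assuming a registry stored as a vector of NamedTuples with illustrative entries; `column` is a hypothetical helper.

```julia
# Hypothetical registry stored as a vector of NamedTuples (illustrative entries).
entries = [
    (name = "imagenette2", blocks = (:Image, :Label)),
    (name = "imagewoof2", blocks = (:Image, :Label)),
    (name = "camvid_tiny", blocks = (:Image, :Mask)),
]

# Extract one complete column as an ordinary Vector.
column(entries, col::Symbol) = [getproperty(e, col) for e in entries]

datasetnames = column(entries, :name)

# Work with a column like any other vector, e.g. count datasets per block combination.
blockcounts = Dict{Tuple{Vararg{Symbol}},Int}()
for blocks in column(entries, :blocks)
    blockcounts[blocks] = get(blockcounts, blocks, 0) + 1
end
```

A vector of NamedTuples is also a valid row table for Tables.jl, so a registry stored this way could be passed to table packages such as DataFrames.jl without writing conversion code.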
Having something like this just for datasets or models may also be relevant to MLDatasets.jl or Metalhead.jl. @CarloLucibello @darsnack