Documentation plans

lorenzoh commented 2 years ago

With the new Pollen.jl frontend having been adopted in https://github.com/FluxML/FastAI.jl/pull/203, I am taking the opportunity to think about changes and additions to the documentation.

The term Reader refers to someone who is reading the documentation. I'll also be referencing the terms Tutorial, How-To, Reference, and Background, so if you're not familiar with this system for organizing documentation, please read https://diataxis.fr/.

Structural changes

Domain documentation

With #240 turning domain-specific functionality into subpackages, FastAI.jl has moved toward a "one core, multiple domain extensions" design. I think this also benefits Readers who consult the docs for help with a problem in a specific domain.

Each domain (e.g. computer vision, tabular, time series, text) will have its own page group in the docs menu, which should include the following pages:

- Overview: gives a short background of the topic, links to related tutorials, and gives a short reference of the domain's learning tasks (e.g. TabularRegression), the kinds of data it deals with (i.e. Blocks), and the relevant data processing steps (i.e. Encodings) for those blocks.
- Beginner Tutorial: for every domain, there should be at least one Tutorial that guides the Reader through a simple use case (e.g. single-label image classification). It should use the high-level interface (loaddataset, ready-made learning task, tasklearner) and link frequently to other pages with more detailed Reference and Background information.
- Reference: an overview of the API of the domain (sub)module. Each exported symbol should have a comprehensive docstring that gives a short description, explains required and optional arguments, and includes an Examples section that briefly covers some use cases in a How-To fashion.

Documentation for a domain module may also contain:

- More Tutorials: tutorials for intermediate and advanced use cases in the domain are a great way for Readers to engage with the library and possibly learn something about the domain as well.
- How-Tos: these should tell you how to perform common tasks, e.g. using augmentations in computer vision tasks.
- Background: this can be used to explain topics related to the domain, design choices made when implementing the library, and other topics that don't fit into the other categories.
- Task pages: these can go into more detail about a specific learning task. Each should start with a 5-10 line end-to-end example, and then walk down the ladder of abstraction, showing the kinds of data and encodings being used.

General documentation

Alongside the domain-specific docs, the domain-agnostic parts of FastAI.jl, such as concepts, interfaces, training, and data handling, should be documented. Good examples from domain submodules should be used in Tutorials and How-Tos to put explanations into context.

Additions

APIs overview

FastAI.jl has many API layers that build on top of each other, and a page that summarizes them in a neat diagram would be nice.

API tour

As a more interactive tour through the API and how pieces relate, I have long been thinking of something organized as follows: the tour starts with a high-level, 5-line example (as in the README), and gives some context for what is happening. Then, you can "drill down" into each of the lines and it'll give you the extended version using APIs one layer below. Consider the following high-level example:

data, blocks = loaddataset("imagenette2-320", (Image, Label))
task = ImageClassificationSingle(blocks)
learner = tasklearner(task, data)
fitonecycle!(learner)
plotpredictions(task, learner)

We could then drill down on each line, e.g. the first would take us to the following, expanded code:

path = datasetpath("imagenette2-320")
data = inputs, targets = Datasets.loadfolderdata(
    path,
    filterfn=isimagefile,
    loadfn=(loadfile, parentname))
classes = unique(eachobs(targets))
blocks = (Image{2}(), Label(classes))

We could again drill down on relevant lines, demystifying the API at every step, showing the Reader how they could use their custom components and linking to relevant material everywhere. For some more examples of "drilling down" from high-level one-liners, see this older post under the heading "API flexibility".

Extending

Every interface that is extensible should have documentation describing how to do so. Since most interfaces belong to the core FastAI.jl (i.e. not a domain library), this should be part of the general documentation.

- Reference for how to implement the interface. This is best put under an "Extending" section in the abstract type's docstring, which should give an overview and link to the functions that need to be implemented. Each of these functions should have a more detailed "Extending" section.
- Where possible, testing utilities like test_encoding that perform automated checks on the interface's invariants should be provided.
- Examples of extending an interface can also be featured in How-Tos or Tutorials.
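To illustrate the three points above, here is a minimal sketch of what an "Extending" docstring section and an accompanying test utility could look like. It is not taken from FastAI.jl: `Scaler`, `scale`, `unscale`, `test_scaler`, and `Times` are hypothetical names made up for this example, and `test_scaler` only plays the role that test_encoding is described as playing for encodings.

```julia
using Test

"""
    abstract type Scaler

Hypothetical interface for invertible transformations of a number.

## Extending

To implement a new `Scaler`, define a subtype and add methods for:

- [`scale`](@ref)
- [`unscale`](@ref)

Use [`test_scaler`](@ref) to check the interface's invariants.
"""
abstract type Scaler end

"""
    scale(s::Scaler, x)

Apply the transformation `s` to `x`.

## Extending

Required. The result must be invertible by [`unscale`](@ref).
"""
function scale end

"""
    unscale(s::Scaler, y)

Undo the transformation `s` on `y`.

## Extending

Required. Must satisfy `unscale(s, scale(s, x)) ≈ x`.
"""
function unscale end

"""
    test_scaler(s::Scaler, x)

Check that `s` satisfies the round-trip invariant of the `Scaler` interface.
"""
function test_scaler(s::Scaler, x)
    @testset "Scaler interface: $(typeof(s))" begin
        y = scale(s, x)
        @test unscale(s, y) ≈ x
    end
end

# Example implementation of the hypothetical interface.
struct Times <: Scaler
    factor::Float64
end

scale(s::Times, x) = s.factor * x
unscale(s::Times, y) = y / s.factor

# Run the automated interface check on the new implementation.
test_scaler(Times(2.0), 3.0)
```

The overview lives on the abstract type, the per-function requirements live on each generic function, and a new implementation only needs one call to the test utility to verify it. The same code could be reused in a How-To on extending the interface.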

Contributing

To make it easier to contribute and to decrease maintainer burden, a contributing section should be part of the docs. It should clarify the following topics:

- Community standards and contribution process, e.g. ColPrac
- Coding style guidelines
- How to implement interfaces
- How the code is organized, especially that of domain submodules
- How tests are written using InlineTest.jl and ReTest.jl
- How to add documentation and run the docs interactively
- PR template/checklist

Other content

(This is copied from https://github.com/FluxML/FluxML-Community-Call-Minutes/issues/35)

Tutorials

- ❗ FastAI.jl for fast.ai users: Multi-part tutorial series to help fast.ai users get started with FastAI.jl
  - (Part 1) Julia Basics: Syntax basics, array programming
  - (Part 2) Flux.jl vs. PyTorch: Differences between the frameworks, code comparisons for building a model
  - (Part 3) FastAI.jl vs. fast.ai: Differences shown by comparing the code for a basic finetuning task. Pointers to more resources.
- Using parts of the API separately: Explains how FastAI.jl is built on many decoupled packages and that you don't have to use all of them. For example, showing how to use the LearningMethod machinery with a regular Flux.jl training loop and, conversely, using a Learner with a custom data iterator and no learning method.
- Serving predictions on a web server: Reuses the trained model from the serialization tutorial and shows how to package it into a small HTTP server that can be used to get predictions.
- Implementing callbacks: Go from using callbacks to implementing your own callbacks, and explore how several existing callbacks are implemented. (Basic version here)
- Siamese image similarity: Showcase different parts of FastAI.jl's APIs to implement an image similarity learning task (original fast.ai tutorial), FastAI.jl#31
- Progressive resizing: Explain the method and implement it by building on the presizing tutorial. Train a vision model using it.
- Transfer learning: Explain transfer learning, backbones, pretrained models and the techniques used to successfully finetune them.

How-Tos

- Implement callbacks: Checklist for implementing callbacks.
- Evaluate models: Measure the performance of trained models.

Reference

- FastAI.jl vs. fast.ai cheatsheet: Compare concepts and their equivalents in both libraries.
- Packages: Overview of packages that FastAI.jl depends on for different parts of its API: Flux.jl, DLPipelines.jl, DataAugmentation.jl, DataLoaders.jl, Metalhead.jl, ...

CarloLucibello commented 2 years ago

Two very minor comments about the current state of the documentation:

- there is no search box (fixed in the revamp)
- Julia code highlighting is very faint

lorenzoh commented 2 years ago

Thanks for the comments! Re the code highlighting: I plan to mirror the syntax highlighting used in Documenter.jl 👍

lorenzoh commented 2 years ago

Updated with ideas from https://github.com/FluxML/FluxML-Community-Call-Minutes/issues/35

FluxML / FastAI.jl

Documentation plans #204
