FluxML / Flux.jl

Relax! Flux is the ML library that doesn't make you tensor
https://fluxml.ai/

[docs] Highlight `update!` API more to attract DL researchers #2104

Open MilesCranmer opened 2 years ago

MilesCranmer commented 2 years ago

I think the update! API should be presented up-front in addition to, or instead of, the Flux.train! API. This would help significantly in attracting deep learning researchers, whom I see as the bridge to wider adoption.


Motivation. I first encountered Flux.jl maybe ~1.5 years ago. At the time, I skimmed the docs, saw the Flux.train! API on the then-current README.md quickstart page, and wrote off the entire package as another one of those super high-level deep learning libraries - one where it's easy to write things in the high-level API but nearly impossible to tweak the internals. (Many others out there might do the same quick first-impressions evaluation, even though a package maintainer's dream is that every user reads all the docs.)
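For context, the high-level entry point in question is Flux.train!, which wraps the whole training loop in a single call. A minimal sketch of what that looks like (using the explicit-parameter style that newer Flux versions adopted; the model and data here are made-up placeholders, not from the README):

```julia
using Flux

# A toy model and some fabricated batches (both hypothetical):
model = Dense(2 => 1)
data = [(rand(Float32, 2, 16), rand(Float32, 1, 16)) for _ in 1:10]

# One call runs the entire loop: forward pass, gradients, and updates.
opt_state = Flux.setup(Adam(1e-3), model)
loss(m, x, y) = Flux.mse(m(x), y)
Flux.train!(loss, model, data, opt_state)
```

Convenient for the simple case, but the loop body is hidden, which is exactly the first-impression problem described above.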

Today, I decided to take another look through the docs in more detail: I wanted to find something equivalent to what PyTorch and JAX deep learning frameworks have in that you can work directly on gradient updates and parameters. (This is important for many areas of deep learning research, as I am sure you know!)

I found the update! API (and withgradient) after a lot of digging through the docs. I am really happy with this API, as it gives me the low-level control over my deep learning models that I need for my research! So now I am actually planning to use FluxML for research.

Conclusion. It took me two passes at the docs, the second one very deep, before I actually found this API. Even after I found it, I only found the API reference for update!, rather than an easy-to-find example I could copy and start working with. This user experience is something that might lose potential users.

Proposal. Therefore, I propose that the update! API be demonstrated in the quick start example: both on the README, and up front in the documentation. I think this is really key to attract deep learning researchers as users, as the most popular deep learning packages by default expose this slightly lower-level API. It needs to be extremely obvious that one can do a similar thing with Flux.jl!

Here's an example I propose, which is similar to the style of PyTorch training loops (and so is a great way to convert some PyTorch users!):

using Flux
import Flux: withgradient, update!

# Chain of linear layers:
mlp = Chain(
    Dense(5 => 128), relu,
    Dense(128 => 128), relu,
    Dense(128 => 128), relu,
    Dense(128 => 128), relu,
    Dense(128 => 1),
)

# Set up the optimizer:
p = params(mlp)
opt = Adam(1e-3)
n_steps = 10_000

for i in 1:n_steps
    # Batch of example data:
    X = rand(5, 100) .* 10 .- 5
    y = cos.(X[[3], :] * 1.5) .- 0.2

    # Compute gradient of the following code
    # with respect to parameters:
    loss, grad = withgradient(p) do
        # Forward pass:
        y_pred = mlp(X)

        # Square error loss
        sum((y_pred .- y) .^ 2)
    end

    # Step:
    update!(opt, p, grad)

    # Logging:
    println(loss)
end
MilesCranmer commented 2 years ago

My current take on the README example is: "Here is a bunch of complex things we can do with very little code," but this is:

  1. Intimidating, as it uses the entire range of Julia syntax tricks (even as a Julia developer, it is hard to parse everything going on!).
  2. Overall just hard to follow in terms of logical flow.
  3. Hard to adapt to any project I work on, since it uses the high-level API.

I think the quickstart example should be:

  1. Be very straightforward in its use of the API and syntax. It should be helpful to new users, not a code golf submission.
  2. Act as a template for the user to copy and start modifying for their problem.

For these reasons, I think it would be really nice if the example were simple and used the update! syntax. With an example like this, it is much easier to go about adapting it to a wide range of problems.

ToucheSir commented 2 years ago

The example was added in https://github.com/FluxML/Flux.jl/pull/2067. I'm personally in favour of removing train! from the docs wherever possible, but since this was added so recently I think a bit more discussion is required.

MilesCranmer commented 2 years ago

I see, thanks! I would also change this example: https://fluxml.ai/Flux.jl/stable/models/quickstart/ to include update! there, and perhaps also avoid the dataloader. Passing a dedicated dataloader to a dedicated train! function that does some internal stuff makes me think it’s a rigid package. It would be nice if the example demonstrates seamless integration with regular Julia code to show that no, FluxML is actually super flexible in terms of how you train the model and pass data. (A dataloader is something that I would look up once I need it, but for the quickstart, I think it might give the wrong impression.)


Edit: actually, maybe it's okay to use a dataloader in the quickstart example, so long as the looping is explicit.
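To illustrate that last point: DataLoader is itself just an ordinary Julia iterator over slices of the arrays you hand it, so explicit looping over it is plain Julia code. A small sketch (the array sizes are arbitrary, chosen just for the demonstration):

```julia
using Flux

X = rand(Float32, 2, 100)
Y = rand(Float32, 1, 100)
loader = Flux.DataLoader((X, Y), batchsize=32, shuffle=true)

# Plain `for`-style iteration; each element is an (x, y) tuple of matrices.
batch_sizes = [size(x, 2) for (x, y) in loader]
```

With 100 samples and batchsize=32, this yields three full batches and one partial batch of 4, all as regular matrices.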

MilesCranmer commented 2 years ago

(I think I confused the quickstart and readme pages from when I first checked out this package… I do remember seeing a train! example and getting scared off though)

darsnack commented 2 years ago

FWIW, most of the current maintainers do not like train! or the current difficulty of finding information in the docs. We discussed both topics in the most recent ML call, and we drafted #2105 as a template for overhauling the docs. I think the changes here should get reflected in that template. (I am too busy this week to polish up the template, but I will try this weekend.)

ToucheSir commented 2 years ago

Worth adding here that we have this ML call every other week and it's open to anyone, so if you're interested in talking about docs work or anything else feel free to drop in :)

MilesCranmer commented 2 years ago

Awesome, thanks for sharing this update! I think that is an awesome initiative and would be well-appreciated by the community!

mcabbott commented 2 years ago

Welcome, and glad you persisted!

I made these examples recently. The goals, I suppose, were:

Both can surely be better. Want to have a go tweaking the quickstart example to avoid train!?

I would vote to keep it with implicit params etc for now. Partly so that the to-be-written "how to upgrade from implicit to explicit" guide can clearly point to the before & after versions.
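For comparison, one step of the same loop in the explicit-parameter style (the style such an upgrade guide would point to, with optimiser state held by Flux.setup) would look roughly like this; the model and data are placeholders:

```julia
using Flux

model = Dense(2 => 1)
opt_state = Flux.setup(Adam(1e-3), model)
x, y = rand(Float32, 2, 16), rand(Float32, 1, 16)

# Differentiate with respect to the model itself, not a Params collection:
l, grads = Flux.withgradient(model) do m
    Flux.mse(m(x), y)
end

# The gradient for the model is the first (and only) entry:
Flux.update!(opt_state, model, grads[1])
```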

I would also vote for it to generate data outside the loop, as this is a bit more realistic. Demonstrating that DataLoader is something which takes & gives matrices also seemed like a good idea.

I think it's important that it not just push random numbers through, but solve some problem, however simple. (When I run the loop above, the loss doesn't decline, and there's nothing I can plot afterwards.)

MilesCranmer commented 1 year ago

Here's a tweaked README example. As a longtime PyTorch and JAX user, I find the following syntax very intuitive; I feel I could have understood it even as a newcomer to Julia. It is not intimidating, and it would make it easier for me to start tweaking individual steps and adapting the example to my own use case:

using Flux

# We wish to learn this function:
f(x) = cos(x[1] * 5) - 0.2 * x[2]

# Generate dataset:
n = 10000
X = rand(2, n)  # In Julia, the batch axis is last!
Y = [f(X[:, i]) for i=1:n]
Y = reshape(Y, 1, n)

# Move to GPU
X = gpu(X)
Y = gpu(Y)

# Create dataloader
loader = Flux.DataLoader((X, Y), batchsize=64, shuffle=true)

# Create a simple fully-connected network (multi-layer perceptron):
n_in = 2
n_out = 1
model = Chain(
    Dense(n_in => 32), relu,
    Dense(32 => 32), relu,
    Dense(32 => 32), relu,
    Dense(32 => n_out),
)
model = gpu(model)

# Create our optimizer:
optim = Adam(1e-3)
p = Flux.params(model)

# Let's train for 10 epochs:
for i in 1:10
    losses = []
    for (x, y) in loader

        # Compute gradient of the following code
        # with respect to parameters:
        loss, grad = Flux.withgradient(p) do
            # Forward pass:
            y_pred = model(x)

            # Square error loss
            sum((y_pred .- y) .^ 2)
        end

        # Step with this gradient:
        Flux.update!(optim, p, grad)

        # Logging:
        push!(losses, loss)
    end
    println(sum(losses)/length(losses))
end

And we can visualize our predictions below:

using Plots

# Generate test dataset:
Xtest = rand(2, 100)
Ytest = mapslices(f, Xtest; dims=1)  # Alternative syntax to apply the function `f`

# View the predictions (move test data through the model on the GPU,
# then bring the results back to the CPU for plotting):
Ypredicted = cpu(model(gpu(Xtest)))
scatter(Ytest[1, :], Ypredicted[1, :], xlabel="true", ylabel="predicted")
MilesCranmer commented 1 year ago

PR in https://github.com/FluxML/Flux.jl/pull/2108

mcabbott commented 1 year ago

I think this loop is exactly what we want in the quickstart:

for i in 1:10
    losses = []
    for (x, y) in loader

But I do not think the readme example should be as long as the quickstart one. We already have a problem with there being too many entry points, and I would like anyone reading 30 lines of code to already be on a page of the docs (not the website tutorials, and not the readme).

More later.

MilesCranmer commented 1 year ago

I think generally it is good to keep the quickstart like a mini-tutorial, while still being general enough that users can think about how to modify it for their use cases. So, in retrospect, I have changed my mind and now agree with you that the dataloader is good to include!

I think many ML practitioners have very short attention spans: people will literally copy the quickstart example, try to hack it for their use case by trial and error alone, never once read the docs, and quit if they can't figure it out. But once you "hook" them and they get something working for their use case, they will be much more likely to search the docs pages when they need to do something specific.

MilesCranmer commented 1 year ago

But I do not think the readme example should be as long as the quickstart one.

I think the first code example a user sees is the one they will assume to be the quickstart.

So perhaps, if the goal is to move them to the docs pages quickly, I would just remove the code example from the README altogether. (When I was trying Flux.jl yesterday, the README example acted as my quickstart tutorial - I didn't even look at the quickstart page at first.)