Bevy needs a better asset-hosting solution for example assets

alice-i-cecile commented 3 weeks ago

What problem does this solve or what need does it fill?

As seen in #13671, adding or replacing new assets for Bevy's examples is shockingly controversial and high risk.

In the absence of other constraints, assets for examples should be:

representative of real world use
attractive, to showcase Bevy in a good light
high quality, to avoid confusing testers and users about Bevy problems vs broken assets
varied, to help

Without technical constraints, I would be very happy to have several hundred GB of test assets, updated on a whim whenever we have a new feature or find a better asset to replace something subpar.

However, because these assets are committed directly in Git, any assets (or changes to assets!) that we add are necessarily cloned with the repository. This wastes bandwidth and disk space for users, and also starts running into concerns with GitHub's hosting, which has a soft limit on repo size of 5 GB.

As an additional wrinkle, the way these assets are stored in the assets folder means that users commonly get confused when testing the examples locally, as the whole repo needs to be cloned to do this easily, rather than just copy-pasting code. See #13645.

What options have you considered?

Use Git LFS via GitHub

These are large files! We're using git! We should use Git LFS, right?

Eh, maybe. There are two broad problems here:

Git LFS has a reputation for instability and frustration. I haven't encountered these in my projects, but they're pervasive enough to worry me.
Git LFS hosting is quite expensive on GitHub. By default there's 1 GB of free storage and bandwidth, which is hilariously, unusably low for Bevy.

To use this freely, we'd probably want 10 data packs, putting us at 500 GiB of storage and 500 GiB/month of bandwidth. At the current $5/month per data pack, that's $50/month. Not absurd, but not fun.
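
The arithmetic above can be written down as a tiny sketch, which makes it easy to plug in other guesses. It assumes each data pack adds 50 GiB of storage and 50 GiB/month of bandwidth (the figure implied by "10 data packs" giving 500 GiB) at the quoted $5/month; the function names are made up for illustration.

```rust
/// Assumption: one data pack adds 50 GiB of storage and 50 GiB/month of
/// bandwidth, as implied by "10 data packs" giving 500 GiB.
const GIB_PER_PACK: u64 = 50;
/// Price per data pack per month, as quoted in the comment above.
const USD_PER_PACK: u64 = 5;

/// Packs needed to cover whichever of storage or bandwidth is larger,
/// rounding up to whole packs.
fn packs_needed(storage_gib: u64, bandwidth_gib_per_month: u64) -> u64 {
    let need = storage_gib.max(bandwidth_gib_per_month);
    (need + GIB_PER_PACK - 1) / GIB_PER_PACK
}

/// Monthly cost in USD for the required number of packs.
fn monthly_cost_usd(storage_gib: u64, bandwidth_gib_per_month: u64) -> u64 {
    packs_needed(storage_gib, bandwidth_gib_per_month) * USD_PER_PACK
}
```

Since bandwidth is the hard-to-estimate input, the cost is driven by whichever of the two numbers turns out larger.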

The amount of bandwidth required is very hard to estimate though: Bevy has a ton of users and contributors, and each of them will want to pull down the repo and examples.

If GitHub decides to offer us in-kind support for this, or some form of deal, I think this is worth considering. Otherwise, probably not.

Use Git LFS on a different host

With a bit more devops work, we should be able to configure Git LFS to use an alternate backend. Preliminary research suggests that this should be meaningfully cheaper.

It also opens us up to accepting in-kind donations from alternate hosting companies (hi, get in touch?).

There are open questions here about the level of engineering required and the contributor / user experience, though.

No assets in the repo

Alternatively, we could avoid hosting assets in the repository at all, and instead simply download them on demand from Bevy-controlled servers. In the long term, this would likely share infrastructure with the Bevy Marketplace (or whatever we call our Unity Asset Store equivalent).

Appealingly, this means we have full control over cost / backing, and don't need to fuss with Git LFS at all. Users trying out examples have things "just work" on copy-paste (although there's weirdness with "please wait, downloading assets"), and we don't waste nearly as much bandwidth copying over all of the assets for users who only want to try out a handful of examples.

This involves the most infrastructure work (though it's probably mostly stuff we want anyway), and we will need to plan carefully to ensure this doesn't cripple our ability to automatically test examples in CI.

Additional context

This issue was prompted by @superdump raising these concerns on Discord.

superdump commented 3 weeks ago

The main thing I think we need to move away from is having assets in the repo. I don't feel strongly about git lfs on another host, or something like just separate hosting and a script or tool to update/fetch differences and having assets/ in .gitignore or something.

mockersf commented 3 weeks ago

a small first step would be to move assets to another git repo, and make it available through a submodule

on that other repo, we could nuke history without breaking every PR when needed

we could also have an "example" data source for assets that would know how to fetch them and cache them locally

alice-i-cecile commented 3 weeks ago

The Rust Project has had nothing but extremely negative things to say about git submodules, so I'm a bit nervous. I don't mind the idea in principle though!

superdump commented 3 weeks ago

I've worked and do work with submodules quite a bit across a few projects. They are awkward, but they are a solution to a problem, as well as creating problems. :)

fintelia commented 3 weeks ago

Having an "http" asset source that knew how to fetch and cache assets from static hosting could be useful far beyond just the examples.

mockersf commented 3 weeks ago

there's a risk it will be a more painful experience for anyone trying to contribute a new example: they will have to guess the new URL the asset will be available at, and the example won't work until the asset is available there

hxYuki commented 2 weeks ago

Allowing multiple sources for one single asset may be enough I guess? Let bevy fetch in order and synchronize among sources. Developers can easily work on local storage and easily distribute via network. Or an asset source which handles resource in this way should make it.
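
A minimal sketch of the "fetch in order" part of this idea, assuming a hypothetical `AssetSource` trait (not an existing Bevy API) and an in-memory stand-in for both the local folder and the network host. Synchronizing between sources is left out.

```rust
use std::collections::HashMap;

/// Hypothetical trait: anything that can try to produce an asset's bytes.
trait AssetSource {
    fn name(&self) -> &str;
    fn fetch(&self, path: &str) -> Option<Vec<u8>>;
}

/// In-memory stand-in for a local folder or a remote host.
struct MemorySource {
    name: String,
    files: HashMap<String, Vec<u8>>,
}

impl MemorySource {
    fn new(name: &str, files: &[(&str, &str)]) -> Self {
        Self {
            name: name.to_string(),
            files: files
                .iter()
                .map(|(path, body)| (path.to_string(), body.as_bytes().to_vec()))
                .collect(),
        }
    }
}

impl AssetSource for MemorySource {
    fn name(&self) -> &str {
        &self.name
    }

    fn fetch(&self, path: &str) -> Option<Vec<u8>> {
        self.files.get(path).cloned()
    }
}

/// Try each source in the given order and return the name of the first
/// source that has the asset, together with the bytes it returned.
fn fetch_in_order(
    sources: &[Box<dyn AssetSource>],
    path: &str,
) -> Option<(String, Vec<u8>)> {
    sources
        .iter()
        .find_map(|s| s.fetch(path).map(|bytes| (s.name().to_string(), bytes)))
}
```

With a local source listed first, a developer's own copy of a file wins over the hosted one, and anything missing locally falls through to the next source.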

Olle-Lukowski commented 2 weeks ago

I think the last "No assets in repo" approach is best. Additionally, I think that the sooner we get started on something like an asset store, the better. I think having a central place to find assets, plugins, etc. would be really nice, especially for people coming from engines like Unity. I also agree with @hxYuki: we should allow for different asset sources, and nice integration within Bevy to fetch them (and potential nested dependencies) automatically.

alice-i-cecile commented 2 weeks ago

> Allowing multiple sources for one single asset may be enough I guess? Let bevy fetch in order and synchronize among sources. Developers can easily work on local storage and easily distribute via network. Or an asset source which handles resource in this way should make it.

Right, I think that's a reasonable feature :) What this actually reminds me of is font fallback: maybe we can reuse the mechanisms.

I think that the best way forward is:

Implement a general-purpose asset fallback solution.
Implement a blessed way to fetch assets from the web.
Set up our own hosting for the existing assets with an endpoint.
Move all of the assets out of tree and swap to a "try locally, then download" strategy for all of the examples.

valaphee commented 2 weeks ago

In a previous attempt I modified https://github.com/bevyengine/bevy/blob/main/crates/bevy_asset/src/io/wasm.rs to use reqwest. The only downside is that reqwest is fairly large for non-wasm builds.

shanecelis commented 2 weeks ago

I like the simplicity of "assets go in the assets folder". I agree that large assets ought not go directly in the repository. But I think it'd be a shame if all the examples referred to remote assets that are unalterable for the user, potentially leaving them thinking they need to set up an HTTP server before they can use assets.

Also which assets are we talking about remoting? Shaders and other text-based assets seem like a good thing to keep in the repository because they're small and you want that version history.

alice-i-cecile commented 2 weeks ago

Assets == "binary blobs" here.

My preferred remote solution is "search the assets folder, falling back to a download, which then populates the asset folder". Basically asset_server.load_asset("fox.glb").with_remote_fallback("https://bevyengine.org/marketplace/examples");

cart commented 2 weeks ago

We could do it at the "Asset Source" level to configure a global fallback for every request. Ex: override the default asset source to use a Fallback<FileAssetReader, HttpAssetReader> source.

cart commented 2 weeks ago

Or more likely a CachedFallback<FileAssetReader, FileAssetWriter, HttpAssetReader> source or something.
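
One way to picture the shape of such a source, as a rough sketch only: the `Reader` and `Writer` traits below are simplified, synchronous stand-ins invented for illustration, not Bevy's actual asset reader/writer traits, and the "remote" is an in-memory map that counts requests instead of doing HTTP.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical, simplified stand-ins for the reader/writer roles.
trait Reader {
    fn read(&self, path: &str) -> Option<Vec<u8>>;
}
trait Writer {
    fn write(&self, path: &str, bytes: &[u8]);
}

/// In-memory store playing the part of the local assets folder.
#[derive(Default)]
struct MemoryStore {
    files: RefCell<HashMap<String, Vec<u8>>>,
}
impl Reader for MemoryStore {
    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.files.borrow().get(path).cloned()
    }
}
impl Writer for MemoryStore {
    fn write(&self, path: &str, bytes: &[u8]) {
        self.files.borrow_mut().insert(path.to_string(), bytes.to_vec());
    }
}

/// Fake remote host that counts how many requests reach it.
struct CountingRemote {
    files: HashMap<String, Vec<u8>>,
    hits: Cell<u32>,
}
impl CountingRemote {
    fn new(files: &[(&str, &str)]) -> Self {
        let files = files
            .iter()
            .map(|(path, body)| (path.to_string(), body.as_bytes().to_vec()))
            .collect();
        Self { files, hits: Cell::new(0) }
    }
}
impl Reader for CountingRemote {
    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.hits.set(self.hits.get() + 1);
        self.files.get(path).cloned()
    }
}

/// Read locally first; on a miss, read from the fallback and write the
/// result through the cache writer so the next read is served locally.
struct CachedFallback<'a, R: Reader, W: Writer, F: Reader> {
    local: &'a R,
    cache: &'a W,
    remote: &'a F,
}
impl<'a, R: Reader, W: Writer, F: Reader> Reader for CachedFallback<'a, R, W, F> {
    fn read(&self, path: &str) -> Option<Vec<u8>> {
        if let Some(bytes) = self.local.read(path) {
            return Some(bytes);
        }
        let bytes = self.remote.read(path)?;
        self.cache.write(path, &bytes);
        Some(bytes)
    }
}
```

Pointing `local` and `cache` at the same store gives the "download populates the asset folder" behaviour: the first request for an asset goes to the remote, later ones do not.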

fintelia commented 2 weeks ago

Caching assets locally probably also means there needs to be a story around cache invalidation. If an asset is updated on the HTTP server, the local copies need to be replaced. Depending on the total size of assets, cache eviction might also have to be considered.
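
As a sketch of what invalidation could look like, assume (hypothetically) that the server publishes a manifest mapping each asset path to a content hash, and that the local cache keeps the same kind of record. Comparing the two tells us which cached files to re-download and which to drop. Size-based eviction is not covered here.

```rust
use std::collections::BTreeMap;

/// Hypothetical manifest: asset path -> content hash (or any version tag).
type Manifest = BTreeMap<String, String>;

#[derive(Debug, Default, PartialEq)]
struct CachePlan {
    /// Cached files whose hash differs from the server's: re-download.
    stale: Vec<String>,
    /// Cached files the server no longer lists: safe to delete.
    orphaned: Vec<String>,
}

/// Build a manifest from (path, hash) pairs.
fn manifest(entries: &[(&str, &str)]) -> Manifest {
    entries
        .iter()
        .map(|(path, hash)| (path.to_string(), hash.to_string()))
        .collect()
}

/// Compare the local cache against the server's manifest. Files that only
/// exist remotely are ignored; they get fetched on first use anyway.
fn plan_refresh(local: &Manifest, remote: &Manifest) -> CachePlan {
    let mut plan = CachePlan::default();
    for (path, local_hash) in local {
        match remote.get(path) {
            Some(remote_hash) if remote_hash == local_hash => {} // up to date
            Some(_) => plan.stale.push(path.clone()),
            None => plan.orphaned.push(path.clone()),
        }
    }
    plan
}
```

A hash-based check like this avoids relying on timestamps, at the cost of the server having to publish and maintain the manifest.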
