Closed — itowlson closed this issue 3 years ago
My random notes from our meeting:
```
name = "myapp"
version = "0.2.1+ivan.1" # Maybe we don't need this, then
description = "A Node.js package manager using Bindle"
authors = [ "M Butcher <matt.butcher@microsoft.com>" ]

[modules]
- module-a.wasm
- module-b.wasm

[modules.mygroup]
- module-c.wasm

[files]
images/*
style/*
foo.js
foo.wasm
```
It looks like we have backed away from the Bindle (potentially multi-module) strategy for now, and are focusing on a single-module PaaS/FaaS scenario. This gives clear direction for FaaS. It's not yet clear to me whether, as a PaaS, we need to handle assets other than the WAGI WASM module.
We are also planning to host the WAGI instances in Nomad. Again, not all details have been worked out, but let us suppose for now that WAGI runs as a Nomad task via the fork-exec driver. This driver requires data to be already on the node, or to be supplied as an artifact. Artifacts can come from various sources, but HTTP/S is the most relevant to us. Nomad automatically unpacks zip files and tarballs, so we have ways to bundle things if we need them.
So, in the absence of Bindle, we need to store the WASM module (and any other data) where we can give Nomad an HTTPS URL to it (either plain or as a tarball). The URL could be to the Hippo server itself (e.g. an `/artifacts/*` route) or to a blob store such as Azure Storage.
I suggest that Hippo's internal storage be pluggable, e.g. backed by its own local filesystem for dev purposes or by Azure Storage for production purposes. Artifact URLs are purely internal between Hippo and Nomad, so they do NOT need to be consistent across different storage back ends. Even if we surfaced blob storage URLs directly as artifact URLs, it would not be an issue as long as Nomad could GET them.
What is visible is the upload experience. We don't want users to have to upload in different ways according to the backing store, and in many cases we don't want users to have direct access to the backing store. Therefore Hippo does need to mediate uploads even if it doesn't mediate downloads. (The mediation role was previously played by Bindle, which could also plug into different back end stores.)
A possible workflow based on the earlier Bindle workflow:
`bindle push`
The single-module approach makes interactive upload via the Hippo UI that much more tantalising. We should reconsider whether we now want to allow this, or still want to plan mainly around a tool that can be used in either dev mode or production mode.
I like this approach a lot. Making this all invisible to the user means that we can swap out backends - or make it pluggable, as mentioned - and the end user experience stays the same. That also gives us some flexibility should we decide to bring back Bindle.
I would run this by @flynnduism just to check in and see if he has some ideas how we could visualize "the upload experience". With the decision to not pin ourselves to a git repository, we may have to wireframe a "Feed your hippo" UI (as well as a file upload API for CI workflows). I know he's done some prototyping in this area so it'd be great to gather input from him on this.
It's not yet clear to me whether, as a PaaS, we need to handle assets other than the WAGI WASM module.
- Do we need to handle a WAGI TOML manifest?
- Do we need to handle PNGs, CSS, etc.?
Those are good questions. I would assume that we still want to handle PNGs and CSS files, as we wanted to with the bindle model. I'm not so sure we want to handle a WAGI TOML manifest, as writing a WAGI TOML file implies the user has intricate knowledge of how Hippo pushes modules to WAGI. I'm not sure how we'd bubble that up to the user.
As I understand it, the module declares a set of exported functions (`_start`, and optionally `routes` to declare your route table), so the "PaaS experience" is all there. We just need to wire up the module as a WAGI handler and place all of the ancillary files in a `wwwroot` so WAGI serves them all from the same context. E.g. if the WAGI application maps `/foo` to `func foo()`, then we serve that function; otherwise we serve content from the `wwwroot` or a 404.
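For concreteness, the wiring described above might look something like a WAGI `modules.toml`. This is only a sketch: the routes, filenames, and the fileserver module are illustrative, not something decided in this thread.

```toml
# Illustrative WAGI configuration: map a route to the module, and serve
# static assets from a wwwroot-style volume alongside it.
[[module]]
route = "/foo"          # requests to /foo invoke the module
module = "foo.wasm"

[[module]]
route = "/static/..."   # wildcard route for ancillary files (hypothetical layout)
module = "fileserver.wasm"
volumes = { "/" = "wwwroot" }
```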
Was there another model you were thinking of trying out here?
❤️ FEED YOUR HIPPO ❤️
Here's the upload flow UI I had been working on. You can view it as a massive gif (sorry) or load the interactive prototype here to click around it as if it were real.
I've changed things around to focus on a single module + supporting files (css, html, etc)
@flynnduism Because of the supporting files thing, we are still thinking that a typical workflow is going to involve an upload from the client (via CLI or VS Code action), rather than necessarily dragging files into a web UI. Can there be a way to feed your hippo a bindle ID or URL?
Okay, so I think we settled today that Bindle support is definitely our route for supporting application storage/sharing, pending @itowlson's agreement that he is okay with this. But we now have good reason to push forward with Bindle on all fronts, and I think it solves the problem for Hippo/Wagi quite nicely.
This is just as well because it's what Fisher and I concluded earlier and I've been busy implementing it.
> @flynnduism Because of the supporting files thing, we are still thinking that a typical workflow is going to involve an upload from the client (via CLI or VS Code action), rather than necessarily dragging files into a web UI. Can there be a way to feed your hippo a bindle ID or URL?
Thanks @itowlson, this is super helpful. Thinking about this, here are some ideas to replace the prior UI (file upload) with a reference to an upload that happened elsewhere.
^ Pointing to a URL
^ Pointing to a Bindle ID (if Hippo would know of the bindles that have been pushed from the client?)
Is this sort of thing on the right track of how application changes and releases would surface in the UI? Is it worthwhile viewing Bindle IDs like this in the UI, to define releases and publish things to channels?
I've updated the Figma prototype for the UI, removing the Upload dialogue and plugging in these screens into the New Release flow. https://www.figma.com/proto/SlB5QZ4lEDekkFu7EcE0R3/03-hippo-app-wireframes-03?node-id=0%3A2&scaling=min-zoom&page-id=0%3A1
@flynnduism The HIPPOFACTS file is purely a client side thing. Hippo will not "build your artifacts" or consume the HIPPOFACTS file. The flow is more:
developer writes code -> dev or CI builds code -> dev or CI uses HIPPOFACTS to assemble built artifacts into a bindle -> Hippo runs the bindle
So what's relevant is the URL of the bindle, not the HIPPOFACTS file from which the bindle was built.
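For concreteness, a HIPPOFACTS file along these lines might look something like the sketch below. The field names, the glob convention, and the wildcard version are all assumptions for illustration, not a spec.

```toml
# Hypothetical HIPPOFACTS sketch -- shape only, not a defined format.
[bindle]
name = "myapp"
version = "0.2.*"        # wildcard patch: expanded to an incrementing number
                         # (or a +timestamp) in the generated invoice

[[parcel]]
files = ["foo.wasm"]     # the WAGI module

[[parcel]]
files = ["images/*", "style/*", "foo.js"]   # ancillary assets; globs expanded at push time
```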
Hope this makes sense and sorry for the ambiguity about this!
Thanks for the clarification @itowlson! That helps a lot. I have updated the prototype to correct what's being represented...
Yes, that looks good! My only comment is that bindles are identified by `name/version` rather than SHA, e.g. `deislabs/weather/1.2.0`, which fortunately makes the drop-down much more readable.
Excellent!
Oh, and drop the word 'build'. A bindle typically contains artifacts that are as built as they need to be; Hippo itself will not 'build' in the sense a developer is likely to interpret the word.
I think we now have a plan - the BOM side is implemented and @flynnduism is progressing the UI. So I'm closing the discussion issue. Feel free to reopen if anyone feels it's still needed.
Hippo is a platform where users can deploy and serve applications written in WASM. The presumed framework for application authoring is WAGI; the tentative assumption is that applications are stored in a Bindle repository. An application may require assets over and above the WASM binaries, such as images or stylesheets. We need to define how Hippo thinks about the assets that make up an application, how it stores them, and how users upload and manage them in the user interface.
The current assumption is that at the database level a `Build` will simply refer to a Bindle URL; it should not have to care how the assets got into Bindle.

There are two key workflows to consider:
Production
The production workflow is fairly simple. The project can have a CI workflow (such as a GitHub action) which uploads the assets (parcels and invoice) to the Bindle server and then creates a Hippo application build that references the Bindle URI. We can scaffold the GitHub action using `yo wasm`.

The main issue here is knowing which files to include in the bindle. This is also an issue for the developer workflow, and we'll come back to it.
Development
The development workflow is trickier. It needs to be:
What we don't want:
Perhaps the ideal UI for this is more like the old Visual Studio “Publish” command, which built your Web application and deployed it to a server. This was integrated into the development environment, and did not require a commit or any per-publish configuration. But Visual Studio knew what ASP.NET applications looked like; we don't know how to build user apps or how they're laid out.
QUESTION: Could we mandate the layout and build structures, e.g. assets must be under X directory, modules must be under Y directory, building must be via a `make build` command? Some of this could be hostile to environments that have their own conventions and tools.

Suppose we have a bill of materials that tells us what assets are needed in an application build. Then we can consider three stages:
A VS Code Publish task could combine those three stages, and could be scaffolded by `yo wasm` just as the current Build Wasm task is. The BOM could also be used in the GitHub action rather than the user having to maintain that separately.

This does have the implication that every test build gets pushed to a Bindle server, and since Bindle servers are never allowed to delete anything, this could lead to unwanted cruft. It could also make the versioning process tedious for a developer performing rapid iteration. But we don't want Hippo to have to maintain a parallel object storage mechanism.
QUESTION: Should we lobby Bindle to support a "dev mode" in which bindles can be overwritten or deleted? If not, should Hippo provide temporary, disposable Bindle servers for dev builds?
What might a BOM-based workflow look like?
Imagine, then, that we define a HIPPOFACTS file (short for Hippo artifacts; roughly similar to a Dockerfile, but by convention it would also include interesting facts about hippos). This could contain glob patterns for all the assets constituting the application bindle. Or it could be a bastardised `invoice.toml`, but with things like parcel sizes left off, permitting globs in place of names (to be expanded in the generated invoice), and with wildcard versioning (so a user could keep redeploying and we would just add an incrementing patch number or `+timestamp` or something), etc.

In either case, the HIPPOFACTS file could be scaffolded by `yo wasm`, but would have to be maintained by the user.

Then the Hippo deployment workflow would be:
1. Test
2. Profit

This could be encapsulated by a VS Code task for dev and a GitHub action for production; both could be scaffolded by `yo wasm`.

Authorisation considerations
This workflow assumes the user has permission to upload directly to the Bindle server. This may be considered undesirable; the PaaS operator may want to gate all uploads through Hippo to verify that the user has permission to upload builds for this project. This could be addressed by having the publish tool send a tarball to Hippo which it authorises, unpacks, and validates, and then creates the bindle itself.
We need to understand and capture the usage scenarios and requirements around this.
UI file management
What does this mean in terms of Hippo UI file management requirements? It means probably the only case we have left is the super simple “one or two files”. For this, we can synthesise a bindle with the uploaded modules as the only parcels, with bindle name and version inferred from the build. This would be of limited use for all but the simplest one-off configurations though – hello world, spiking out ideas, etc.
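As a sketch of that synthesis: the generated invoice might look roughly like the following, modelled on the Bindle invoice format. All values here are placeholders standing in for data inferred from the build.

```toml
# Sketch of a synthesised Bindle invoice for a single-module upload.
# Field names follow the Bindle invoice format; all values are placeholders.
bindleVersion = "1.0.0"

[bindle]
name = "myapp"          # inferred from the Hippo application
version = "0.1.0"       # inferred from the build

[[parcel]]
[parcel.label]
name = "myapp.wasm"     # the single uploaded module
sha256 = "<computed at upload time>"
mediaType = "application/wasm"
size = 1048576          # placeholder
```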
Prioritisation
How we prioritise this probably depends on our demo cases. We could get something ultra-simple going in the UI, but having only that could limit what we do in our demos (because you would not want to show more than a very few files). I feel we are better off prioritising a tool-based approach if we have resources, but it's likely to be more effort than a quick web upload form!