deconst / deconst-docs

Documentation for the Deconst project itself.
https://deconst.horse/

Multi-phase preparer workflow #234

Closed · smashwilson closed this 8 years ago

smashwilson commented 8 years ago

I've been talking about doing this for a while, but I haven't actually documented the full idea anywhere yet. This is what I want to do with the way that preparers work:

  1. A preparer container (preparer-jekyll, preparer-sphinx) mounts the workspace as a volume. It's responsible for processing its input directory (CONTENT_ROOT or /usr/content-repo), writing each envelope to a url-encoded-content-id.json file in an ENVELOPE_DIR, and copying each asset to an ASSET_DIR. (See the sketch after this list.)
  2. A submitter container mounts the same volume and is also provided with the CONTENT_STORE_URL and CONTENT_STORE_APIKEY. It submits all of the assets in ASSET_DIR to the content store, then all of the envelopes from ENVELOPE_DIR. For bonus points and mad performance, it should do this in two HTTP transactions.
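
A minimal sketch of the preparer side of that contract, assuming Python: CONTENT_ROOT, ENVELOPE_DIR, and ASSET_DIR come from the description above, while the helper names and envelope handling are illustrative, not the actual preparer-jekyll or preparer-sphinx code.

```python
import json
import os
import shutil
import urllib.parse


def write_envelope(content_id, envelope):
    """Write one metadata envelope to ENVELOPE_DIR as <url-encoded content ID>.json."""
    envelope_dir = os.environ["ENVELOPE_DIR"]
    filename = urllib.parse.quote(content_id, safe="") + ".json"
    path = os.path.join(envelope_dir, filename)
    with open(path, "w") as f:
        json.dump(envelope, f)
    return path


def copy_asset(source_path):
    """Copy one referenced asset into ASSET_DIR for the submitter to pick up later."""
    asset_dir = os.environ["ASSET_DIR"]
    destination = os.path.join(asset_dir, os.path.basename(source_path))
    shutil.copy(source_path, destination)
    return destination
```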

This lets us:

Here's my rough checklist:

Follow-on issues:

kenperkins commented 8 years ago

It sounds like you're not planning on addressing (at least through this issue) any kind of differential asset upload. Is that correct?

smashwilson commented 8 years ago

Not initially, but this'll get us closer. Once we have bulk upload for envelopes and assets, we can add a handshaking request, where the submitter offers a set of checksums to see what it can leave out of the uploads.
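
As a rough illustration of what that handshake could offer, the submitter might hash everything in ASSET_DIR and ENVELOPE_DIR and send the digests up front; this is a sketch under that assumption, not the actual submitter code.

```python
import hashlib
import os


def fingerprint_directory(directory):
    """Map each filename in a directory to the SHA-256 digest of its contents."""
    fingerprints = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        with open(path, "rb") as f:
            fingerprints[name] = hashlib.sha256(f.read()).hexdigest()
    return fingerprints


# The submitter would offer these digests to the content store and skip
# uploading anything the store reports it already has.
```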

smashwilson commented 8 years ago

I'm trying to keep this issue from becoming even more sprawling.

smashwilson commented 8 years ago

So, uh, now I am doing asset and envelope fingerprinting as part of this after all. Sprawl++

etoews commented 8 years ago

Is there a use case for eTags here?

smashwilson commented 8 years ago

> Is there a use case for eTags here?

I don't think so, because our upload requests are performed in bulk. Part of my agenda is to accomplish a content repository publish with as few transactions as possible:

  1. The submitter has a full set of assets and envelopes on disk. It fingerprints them and queries the content store API to ask what's new all at once.
  2. The content store compares the fingerprints to the fingerprints of the latest resources. It returns asset URLs for assets that are already present and yes-or-no responses for envelopes.
  3. The submitter prepares a tarball containing all new assets and uploads it to /bulkassets. The response contains the new asset URLs for those assets.
  4. The submitter injects those URLs into the metadata envelopes.
  5. The submitter prepares a tarball containing all new envelopes and uploads it to /bulkcontent.

ETags are more of a request-by-request sort of thing, and even if you could attach more than one to a single request, they wouldn't prevent the submitter from needing to prepare and POST this giant tarball anyway. Unless I'm misremembering how they work, of course.
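
To make the shape of that flow concrete, here's a rough Python sketch. The /bulkassets and /bulkcontent paths come from the steps above; the fingerprint-check endpoint, the auth header format, the tarball packaging, and the payload/response shapes are all assumptions for illustration.

```python
import io
import json
import tarfile

import requests


def make_tarball(files):
    """Pack a {name: bytes} mapping into an in-memory gzipped tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def publish(store_url, apikey, asset_fingerprints, assets, envelopes):
    """assets is {filename: bytes}; envelopes is {content ID: dict}."""
    headers = {"Authorization": 'deconst apikey="%s"' % apikey}

    # Steps 1-2: offer fingerprints; the store answers with URLs for the assets
    # it already has. (Hypothetical endpoint and response shape.)
    known = requests.post(
        store_url + "/checkassets", headers=headers, json=asset_fingerprints
    ).json()

    # Step 3: upload only the missing assets, in one tarball, to /bulkassets.
    missing = {name: data for name, data in assets.items() if name not in known}
    uploaded = requests.post(
        store_url + "/bulkassets", headers=headers, data=make_tarball(missing)
    ).json()
    asset_urls = {**known, **uploaded}

    # Step 4: inject the final asset URLs into each metadata envelope.
    for envelope in envelopes.values():
        envelope["assets"] = asset_urls  # placeholder key for illustration

    # Step 5: upload all new envelopes, in one tarball, to /bulkcontent.
    payload = {cid: json.dumps(env).encode() for cid, env in envelopes.items()}
    requests.post(store_url + "/bulkcontent", headers=headers, data=make_tarball(payload))
```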

etoews commented 8 years ago

Still had my head in the individual content envelope model so I was thinking request-by-request. nm

smashwilson commented 8 years ago

I've split the doctest work into its own issue at deconst/strider-deconst-content#25, because I'm close to shipping bulk differential uploads without it. It'll still be a pretty natural extension.

smashwilson commented 8 years ago

Here's a full build of the how-to repository, even with buggy duplicate asset and envelope detection:

[screenshot: full build of the how-to repository, 2016-04-27]

🐎 🐎 🐎

smashwilson commented 8 years ago

🤘 This is now live and working.