deconst / content-service

An API for storing, indexing and retrieving documentation
MIT License

Bulk content repository uploads #51

Closed smashwilson closed 8 years ago

smashwilson commented 8 years ago

Introduce a new API endpoint that accepts POSTs of a tarball containing an entire content repository's worth of content (a "group") at once. The tarball may contain a metadata directory with repository-wide configuration files, including:

metadata/config.json  # optional; see below
metadata/keep.json    # optional; content IDs within this group to keep even if absent

The initial format for a config.json file:

{
  "contentIDBase": "https://github.com/rackerlabs/whatever/"
}

If provided, all existing metadata envelopes that share the given content ID base but are not included in the tarball will be deleted.

The keep file is intended to open the door for partial uploads - ultimately I'd like to introduce a way for preparers to query the content service API and only bother uploading those files that have actually changed since the last render. Any content ID that belongs to the group, but is not included in the tarball or the keep.json file, should be deleted by the upload.

If present, keep.json includes:

{
  "keep": [
    "https://github.com/rackerlabs/whatever/notpresent",
    "https://github.com/rackerlabs/whatever/notpresenteither",
  ]
}
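
Putting the two rules together, the delete set for a bulk upload is everything under the contentIDBase that is neither present in the tarball nor listed in keep.json. A minimal sketch of that rule, using hypothetical names for the three ID collections:

```typescript
// Sketch of the intended deletion rule; the parameter names are hypothetical.
// Everything under the contentIDBase that is neither uploaded nor kept is
// removed by the bulk upload.
function idsToDelete(
  existingIDs: string[], // IDs already stored that start with contentIDBase
  tarballIDs: string[],  // IDs present as envelopes in this tarball
  keepIDs: string[]      // IDs listed in metadata/keep.json
): string[] {
  const retained = new Set([...tarballIDs, ...keepIDs]);
  return existingIDs.filter((id) => !retained.has(id));
}
```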

All other files in the tarball should be metadata envelopes, with filenames that are the URL-encoded content IDs of each.
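
For example, assuming standard percent-encoding (as produced by encodeURIComponent) of the full content ID, an envelope's entry name in the tarball would be derived like this:

```typescript
// Hypothetical mapping from a content ID to its entry name in the tarball,
// assuming the whole ID is percent-encoded into a flat filename.
const contentID = "https://github.com/rackerlabs/whatever/somepage";
const entryName = encodeURIComponent(contentID);
// => "https%3A%2F%2Fgithub.com%2Frackerlabs%2Fwhatever%2Fsomepage"
```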

This is the first part of deconst/deconst-docs#133.

smashwilson commented 8 years ago

Hah, not ready to ship just yet:

~ docker:dev 
$ curl -H 'Content-Type: application/tar+gzip' -H 'Authorization: deconst apikey="..."' http://dockerdev:9000/bulkcontent --data-binary @howto-env.tar.gz 
{"message":"connect EHOSTUNREACH 204.232.156.220:443"}

I'm guessing that I need some kind of rate limiting on my Cloud Files uploads.
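
If the failure really is down to opening too many simultaneous connections, one way to cap it would be to push envelopes up in fixed-size batches rather than all at once. A rough sketch, assuming a hypothetical uploadEnvelope() that writes a single envelope to Cloud Files:

```typescript
// Minimal sketch of capping upload concurrency; uploadEnvelope() and the
// envelope shape are assumptions for illustration, not the service's API.
async function uploadAll(
  envelopes: Array<{ id: string; body: unknown }>,
  uploadEnvelope: (id: string, body: unknown) => Promise<void>,
  concurrency = 10
): Promise<void> {
  for (let i = 0; i < envelopes.length; i += concurrency) {
    // Wait for each batch to finish before starting the next one.
    const batch = envelopes.slice(i, i + concurrency);
    await Promise.all(batch.map((e) => uploadEnvelope(e.id, e.body)));
  }
}
```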

smashwilson commented 8 years ago

This is better.

I've reworked the content service to use MongoDB as the primary source of truth for envelopes. During the transition, the Cloud Files container will be searched for envelopes that aren't present in Mongo yet, but all writes will go to Mongo exclusively. #93 details the cleanup work to do once our content has migrated over.
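
In other words, reads consult MongoDB first and only fall back to the Cloud Files container for envelopes that haven't migrated yet. A sketch of that transitional read path, with hypothetical mongoGet / cloudFilesGet accessors standing in for the real storage layer:

```typescript
// Transitional read path: MongoDB is the primary source of truth, and the
// Cloud Files container is only consulted for not-yet-migrated envelopes.
// mongoGet and cloudFilesGet are hypothetical accessors for illustration.
async function getEnvelope(
  contentID: string,
  mongoGet: (id: string) => Promise<object | null>,
  cloudFilesGet: (id: string) => Promise<object | null>
): Promise<object | null> {
  const fromMongo = await mongoGet(contentID);
  if (fromMongo !== null) return fromMongo;

  // Fall back to Cloud Files for envelopes that haven't been migrated yet.
  return cloudFilesGet(contentID);
}
```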

Now to document this in the README :memo: