Meteor-Community-Packages / Meteor-CollectionFS

Reactive file manager for Meteor
MIT License

Draft http rest points in cfs #180

Closed raix closed 10 years ago

raix commented 10 years ago

I just need an overview of what rest points we should / could have before writing the implementation - feedback and discussion is welcome, all invited

CC: @aldeed, @vsivsi

Prefix     URL pattern                                    Method  Action
---------  ---------------------------------------------  ------  ------------------------------

FileRecord (JSON format)
/record    /:collectionName                               GET     Published collection
                                                          POST    Insert record
/record    /:collectionName/:id                           GET     Return filerecord
                                                          PUT     Update record
                                                          DELETE  Remove record + data

Raw data
/files     /:collectionName/:id                           GET     Return data for :id
/files     /:collectionName/:id/:filename                 GET     Ditto
                                                          PUT     Whole file or chunks
                                                          POST    Needed?
                                                          DELETE  Remove record + data
/files     /:collectionName/:id?store=mystore             GET     Return data for :id
/files     /:collectionName/:id/:filename?store=mystore   GET     Ditto
                                                          PUT     Put data to store
                                                          POST    Needed?
                                                          DELETE  Remove data in store

Download
/download  /:collectionName/:id                           GET     Return data for :id from store
/download  /:collectionName/:id/:filename                 GET     Ditto
/download  /:collectionName/:id?store=mystore             GET     Return data for :id from store
/download  /:collectionName/:id/:filename?store=mystore   GET     Ditto

Note:

  • A file reference without a store defined will default to the first store defined in the collection.
  • The current solution uses http-publish to create an HTTP access/rest point for the fileRecord collection - this makes combined data/filerecord handling non-trivial.
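To make the URL scheme above concrete, here is a tiny illustrative helper. It is not part of CollectionFS; `buildUrl` and its option names are hypothetical and only encode the patterns from the table:

```javascript
// Hypothetical helper; encodes the URL patterns from the table above.
// None of these names exist in CollectionFS itself.
function buildUrl(prefix, collectionName, opts) {
  opts = opts || {};
  var url = prefix + '/' + collectionName;
  if (opts.id) url += '/' + opts.id;
  if (opts.filename) url += '/' + encodeURIComponent(opts.filename);
  if (opts.store) url += '?store=' + encodeURIComponent(opts.store);
  return url;
}

// e.g. buildUrl('/files', 'images', { id: 'abc', store: 'mystore' })
//      -> '/files/images/abc?store=mystore'
```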
aldeed commented 10 years ago

Sorry, @raix, I wasn't checking e-mail, but I actually just released a first draft of the new HTTP access points, with HTTP uploads as the default. It's pretty buggy, so feel free to test and fix. I'm going to bed. :)

raix commented 10 years ago

yeah you should be sleeping :)

raix commented 10 years ago

I'll check it out - and implement x-auth header stuff in http-methods

aldeed commented 10 years ago

@raix, FYI, according to the PUT spec, if the URL identifies a specific existing resource, we're supposed to overwrite it and if not, we're supposed to insert it. So I don't think we need to handle POST requests at all.
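The PUT-as-upsert semantics described here (per the HTTP spec, PUT to an existing resource replaces it; otherwise it creates one) can be sketched in a few lines. This is a hedged illustration, not CollectionFS code; `handlePut` and the plain-object store are hypothetical:

```javascript
// Sketch of PUT-as-upsert: overwrite if the id exists, insert otherwise.
// `records` is a plain object standing in for the files collection.
function handlePut(records, id, data) {
  var existed = Object.prototype.hasOwnProperty.call(records, id);
  records[id] = data;
  // 200 OK for a replacement, 201 Created for a new resource
  return existed ? 200 : 201;
}
```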

I'm going to make a few more changes, bringing my draft development in line with what you have above, unless you already have some local changes to push up?

raix commented 10 years ago

Nope, had a busy day. I'm working on an overview of all the packages; I'll send it later to align with you - it would be a nice reference for all. I'm planning to implement the s3cloud stuff, making sure all the pieces will fit together. Agree on POST. Just minor changes, I don't think they'll conflict, so rock on and let me know if you need anything.

aldeed commented 10 years ago

I pushed some changes. Now:

Note that all of these endpoints are (or will be) created by the cfs-access-point package, except that the PUT endpoints are created by the cfs-upload-http package, which also has the client-side upload queue that sends data to those HTTP endpoints. I think this makes the most sense, but feel free to disagree.

raix commented 10 years ago

@aldeed I agree on letting the user do the publish of the files collection - maybe we should add an argument to HTTP.publish, e.g.:

  var images = new FS.Collection( /* .. */);

  // Create CRUDL and publish all data for images
  HTTP.publish('/cfs/record/images', images, function() {
    // That's a big publish...
    return images.files.find();
  });

or have the HTTP.publish prefix FS.Collections differently?

  var images = new FS.Collection( /* .. */);

  // Create CRUDL and publish all data for images
  // if HTTP.publish sees an instanceof FS.Collection then it'll prefix at
  // /cfs/record instead of /api ? - its a minor change in HTTP.publish
  HTTP.publish(images, function() {
    // That's a big publish...
    return images.files.find();
  });
raix commented 10 years ago

btw. in the overview I'm playing with, it seems that having the client packages focus on the upload stuff and the access-point on the server (exposing both DDP and HTTP up/download) is better. Consider having multiple clients: one depends on HTTP, one on s3cloud, and the last on DDP. We should be able to allow the server to serve all these methods - so we cannot have the server access point (PUT) depend on the client packages.

I think the current access-point code wraps both HTTP and DDP (I'm going to add s3cloud soon), and I'm not too concerned about having it serve all methods. If the DDP and new HTTP server PUT code differs a lot, we could separate it into packages to keep the configurability (it's basically just DDP and HTTP, since s3cloud will most likely be part of or extend the HTTP package, given that some of the stuff really depends on the SA, e.g. s3cloud). Let me hear what you think.

aldeed commented 10 years ago

@raix, the only reason I didn't let HTTP.publish do all the lifting directly on the FS.Collection .files is because publish doesn't currently have options that let you specify which restpoints you want added. I was thinking that I might add those options today or tomorrow. Then we can tell it to set up just the GETs and not PUT or DELETE for /record/.

I don't like the idea of http-publish doing anything special with FS.Collection by default. It's a nice trim package and it should remain unaware, but extensible. It's no problem to have our wrapper FS.HTTP.publish function that calls HTTP.publish on collection.files.

One issue is that the EJSON types come through as EJSON strings, which probably aren't quite what one expects to see. We could solve this by adding a custom format (already supported in http-publish), and then we'd have to set that as the default format for our restpoints (add a defaultFormat option, used when there is no format in the query string).
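A custom format along these lines could, for example, flatten EJSON-style dates into ISO strings before responding. A minimal sketch, assuming the published documents arrive in EJSON-serialized form (dates as `{$date: ms}`); `plainJsonFormat` is a hypothetical name, not part of http-publish:

```javascript
// Hypothetical custom format: turn EJSON date objects ({$date: ms})
// into ISO 8601 strings so plain-JSON consumers get what they expect.
function plainJsonFormat(docs) {
  return JSON.stringify(docs, function (key, value) {
    if (value && typeof value === 'object' && '$date' in value) {
      return new Date(value.$date).toISOString();
    }
    return value;
  });
}
```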

aldeed commented 10 years ago

Oh, and you might be right about moving PUT for HTTP and DDP back into access points. I do feel like we could separate into access-point-http and access-point-ddp, though. Then the developer can choose which to support without needing useDDP, etc. options.

raix commented 10 years ago

Ok, I think we could save some code and make HTTP.publish better by allowing the user to specify a mountpoint?

HTTP.publish('/cfs/record/images', images, function() {

Maybe have a wrapper for it, FS.HTTP.publish? EDIT: that's what you're saying.

Cool - we could have a common access-point for shared stuff? - and have access-point-ddp and access-point-http depend on it?

aldeed commented 10 years ago

OK, that sounds good. I can probably make those changes in a bit.

raix commented 10 years ago

Cool - working on http.methods, implementing basicAuth and x-auth header patterns

aldeed commented 10 years ago

@raix, are many other packages using http-publish? I'd like to refactor the publish API to (options, func) with options for name and collection that can be used separately or together. As you said, we should use options rather than optional args, and this is a prime example.

aldeed commented 10 years ago

Actually it should accept only options, since func is necessary only for GET. Like this:

HTTP.publish({
  name: '/', //required if `collection` not provided, default is `/api/collectionName` if `collection` provided
  collection: myColl, //optional, required for all access points other than collection GET
  publishFunction: myFunc, //optional, required for both GET restpoints
  types: { //all true by default
    collectionGet: true,
    collectionPost: true,
    documentGet: true,
    documentPut: true,
    documentDelete: true
  }
});
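Applying the defaults documented in the comments above, a handler could normalize that options object roughly like this. This is a sketch; `normalizePublishOptions` is a hypothetical name, and the collection's name is assumed to live at `_name`:

```javascript
// Hypothetical normalization of the options object shown above,
// filling in the documented defaults.
function normalizePublishOptions(options) {
  var name = options.name ||
    (options.collection && '/api/' + options.collection._name);
  if (!name) {
    throw new Error('`name` is required when `collection` is not provided');
  }
  var defaults = {
    collectionGet: true,
    collectionPost: true,
    documentGet: true,
    documentPut: true,
    documentDelete: true
  };
  // all types default to true; caller may switch individual ones off
  var types = Object.assign({}, defaults, options.types);
  return {
    name: name,
    collection: options.collection,
    publishFunction: options.publishFunction,
    types: types
  };
}
```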
aldeed commented 10 years ago

Would be cool also to accept a string for publishFunction, where it's the name of another publish function already registered using Meteor.publish. Maybe the option should be called subscribe instead, to be clear that you're "subscribing" the restpoint to this publication set.

raix commented 10 years ago

Just a quick comment - haven't too much time.

  1. types not needed since this is controlled by allow/deny rules - the Meteor way?
  2. I would prefer the publish function as second argument instead of declaring publishFunction

This leaves mount point, collection or name to be set in options. I think generally speaking the options pattern is to prefer.

Not sure how many projects use http.publish, which is why I thought about the argument pattern that would be backwards compatible - I'm using it in other projects.

raix commented 10 years ago

Yep, I guess we could tap into Meteor.publish functions by name - but it would be vulnerable to Meteor internal changes.

aldeed commented 10 years ago

The reason why types are needed is because we might want to mount something else at those points, and if the HTTP.publish method mounts them, then we can't because there will be a conflict.

I can change it to (options, func).

aldeed commented 10 years ago

Example, I want to let http-publish do the work of mounting /cfs/files/collectionName/:id for DELETE because calling remove is enough to delete the files. But we want to mount /cfs/files/collectionName/:id ourselves (using http-methods) for PUT and GET, because we need to handle chunked upload, store retrieval, etc.

raix commented 10 years ago

Have to check up on that - not sure this is currently supported in http.methods, since one mount point runs one function, which is why I split the filerecord and chunks onto two prefixes. Really gotta go now :) later

aldeed commented 10 years ago

Actually I ended up keeping our own custom method. If we do it through http-publish, then we have to call it for specific collections while our method does one general URL with collectionName as a param. So that's simpler.

aldeed commented 10 years ago

There are more tests now. GET and DELETE are working properly for all of the /files/ and /record/ URLs you list above.

vsivsi commented 10 years ago

I've come across a common use case where I think it is necessary to be able to serve files via HTTP by name/metadata. More precisely, there needs to be a way to specify an HTTP GET that bottoms out in a CollectionFS lookup and file retrieval based on criteria other than ID - basically building a metadata query into the request. I'm sure there will be many such cases, but this is the first I've come across myself: serving map tiles to a client running something like Leaflet.

The requirement is to provide HTTP GET access for URLs like: http://tiles.meteor.com/base/place/{z}/{x}/{y}.png where {z}, {y} and {x} are integers determined programmatically from interaction with the map viewport.

Within CollectionFS {z}, {y} and {x} may be stored numerically in metadata, or may be composed into a filename for lookup.

One obvious way to do this would be to create a custom alternate HTTP access point for serving map tile data using such a scheme. Another would be for CollectionFS to provide a more general "metadata query" extension to its default id based HTTP access point to enable generic metadata queries to return a file when a query results in a unique match in a CollectionFS.

What do you guys think?

vsivsi commented 10 years ago

One addition to be clear. It would be totally fine if the GET URL had a form like: http://tiles.meteor.com/base/place/tile.png?query=true&z={z}&x={x}&y={y}

vsivsi commented 10 years ago

Or perhaps more in the format of the OP: /files + '/:collectionName/query?z={z}&x={x}&y={y}'

aldeed commented 10 years ago

Two thoughts:

raix commented 10 years ago

Http.publish?

raix commented 10 years ago

Guys, I don't think we should clutter the access point by adding complexity there. The query stuff has to do with the filerecord; if the user wants this, it's easy to rig with http.publish. We should not rig queries ourselves - it could become a security issue if the query isn't checked and validated correctly - and again, it's simple to rig in http.publish.

vsivsi commented 10 years ago

@raix, I agree with your sentiment here, but can HTTP publish return the actual file data? Maybe I need to take another look at it, but my recollection is that it would let you query FileFS records via HTTP. Returning the file store data for a query will either require a second HTTP GET to the CollectionFS access point (which would require changes/extensions to Leaflet to work), or it would require recreating the functionality of the file access point within the HTTP publish handler for the request (which seems undesirable).

It seems to me that CollectionFS should provide a complete enough interface that users can build what they need for most common cases directly on top of it. Which was the gist of my question above... Is this possible for the map tile case given the parts that now exist? Or is there some missing piece to enable an HTTP GET to create a FileFS record query that returns the file itself?

To use HTTP publish to accomplish this using a single HTTP request, it seems like a connection between HTTP-publish and cfs-access-point is necessary. Does that connection exist?

aldeed commented 10 years ago

I think you'd actually need to use http-methods, not http-publish, basically the same way the access-point package does it, but with your own logic for lookup. Or two separate requests, as you mentioned.

raix commented 10 years ago

So metadata is not directly related to the actual file data. http.publish would allow you to publish the file list - depending on your publish function, it could query the files. The file list could be the result of your query; the result contains the IDs of the files you want to load.

HTTP.publish(function(data) {
  if (this.userId) {
    // myImages is FS.Collection - This will return the search result/file list
    return myImages.find({ 'metadata.name': this.query.name });
  }
});
raix commented 10 years ago

Not sure why you want the actual data - if the result matches 100 files, do you want all the data in one request? Or just one - and which store should return the data?

aldeed commented 10 years ago

I think @vsivsi is saying he wants to download a single file, but do it by matching on three metadata properties rather than by matching on _id. It's true that this can be done by doing the record query first and then the files?download query on the returned _id, but he's wondering if we should provide a built-in way to avoid the double trip.

raix commented 10 years ago

Hmm, I'll think about it some more. Generally I'm against queries due to security risks, but I'm thinking we could have the selector be replaceable by the user - the selector being the function that is handed the http.method scope and returns the file? Selector: https://github.com/CollectionFS/Meteor-cfs-access-point/blob/master/access-point-handlers.js#L31 The selector could be mounted on FS.Collection and be overwritten by the user.
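The replaceable-selector idea could look roughly like this. All names here are hypothetical sketches; the real hook would live on FS.Collection and receive the http.method scope:

```javascript
// Default: look the file up by _id, as the access point does today.
function defaultSelector(params) {
  return { _id: params.id };
}

// User override for the map-tile case: match on metadata instead of _id.
function tileSelector(params) {
  var q = params.query;
  return {
    'metadata.z': Number(q.z),
    'metadata.x': Number(q.x),
    'metadata.y': Number(q.y)
  };
}
```

The access point would then call `collection.selector(params)` (or similar) to build the Mongo query, so a collection can opt in to metadata lookup without the core growing a generic query API.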

vsivsi commented 10 years ago

Hi, no, there's no need to trigger a "Save As..." download.

The issue is the file store may literally have 100,000 - 1 Million (or more) pre-rendered image tiles, a tiny subset of which need to be quickly retrieved automatically depending on what part of the map is being viewed (x and y) and the zoom level (z).

Take a look at this demo to see this in action (zoom and pan around... it doesn't seem to work correctly in Firefox). The three coordinates displayed in the corner of each "tile" are the calculated z,x,y coordinates for the corresponding image file that needs to be retrieved to show that part of the map.

So the "double round trip" will lead to unacceptable latency (this is interactive), and pre-loading the FileFS records for all files so the x,y,z -> id lookup can happen in-browser is also a non-starter, because there can be a million or more files in the CollectionFS, of which the browser only needs fewer than a hundred at a time. Also, either of these solutions would require modification of Leaflet to work (which is highly undesirable).

So the way Leaflet and all other tile based mapping libraries that I've encountered work is they take a URL template like: http://tiles.meteor.com/base/place/tile.png?query=true&z={z}&x={x}&y={y} or http://tiles.meteor.com/base/place/{z}/{x}/{y}/tile.png or http://tiles.meteor.com/base/place/{z}_{x}_{y}.png or whatever.

Then it uses that template to construct however many HTTP GETs are needed to grab all of the tiles currently in the viewport (sometimes also preloading likely tiles to be needed next) and then handling the rendering.
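The template mechanism described here is easy to mirror in a few lines. This is a sketch; `tileUrl` is a hypothetical name, not Leaflet API:

```javascript
// Expand a Leaflet-style URL template by substituting {z}, {x}, {y}.
function tileUrl(template, coords) {
  return template.replace(/\{(\w+)\}/g, function (match, key) {
    return String(coords[key]);
  });
}

// tileUrl('http://tiles.meteor.com/base/place/{z}/{x}/{y}.png', { z: 3, x: 1, y: 2 })
//   -> 'http://tiles.meteor.com/base/place/3/1/2.png'
```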

So, with this context, how would you guys serve map tiles using CollectionFS (without modifying Leaflet)?

aldeed commented 10 years ago

Sorry, I didn't mean "save as" download, just that you want the data as opposed to the metadata.

Without core changes, you can do it by writing your own http.method for GET, copying the one in access-point pkg and modifying to do lookup by stuff other than id. You can obviously remove the DELETE handling and attachment/range handling, too.

vsivsi commented 10 years ago

@aldeed, Yes, that's what I originally was thinking in my original question:

One obvious way to do this would be to create a custom alternate HTTP access point for serving map tile data using such a scheme. Another would be for CollectionFS to provide a more general "metadata query" extension to its default id based HTTP access point to enable generic metadata queries to return a file when a query results in a unique match in a CollectionFS.

I think the only issue with this solution is it requires the CollectionFS developer/user to "look under the hood" of CollectionFS and essentially re-implement (by copying) a key piece of functionality (HTTP GET of file data). This is a completely fine solution if map tile retrieval is an isolated case: Just write a cfs-map-server package that does exactly what you suggest...

But I have the sense that there are many such use cases that differ in the specifics, but will need a similar type of access pattern based on meta-data queries. And so the broader question is: Should CollectionFS provide a uniform (and hardened) mechanism for these applications to get at the file data they need, or should they all need to roll a custom HTTP.Method that hard-codes their specific query and duplicates the cfs-access-point HTTP file-getter functionality?

aldeed commented 10 years ago

Right. I'm generally OK with support for metadata lookup in core, but maybe as an opt-in feature per FS.Collection. Which is what it sounds like @raix is thinking, too, as far as supporting a custom selector on the FS.Collection.

vsivsi commented 10 years ago

Agreed, opt-in via a custom selector makes a lot of sense.

raix commented 10 years ago

I've got a pattern that could actually solve more issues; I'll post later or write the code. @vsivsi cool usage btw.

raix commented 10 years ago

Just committed a change to allow this kind of use - let me know what you think of it; it could still be improved.

raix commented 10 years ago

@vsivsi did you succeed using the more open rest point API / selector function? I'm closing this since it's settled - correct me if I'm wrong.

aldeed commented 10 years ago

We still need to add some more tests to make sure all the access points work properly, but that's really a separate issue.