Open wohali opened 6 years ago
Got to thinking more about this one in the shower. Here's a sample Mango-like document structure for a combined VDU/update handler that would capture all of the above:
{
"queries": [
{
"name": "sensor_name",
"selector": {
"type": "sensor",
"ip_address": "$doc.ip_address"
},
"fields": ["name"],
...
}
],
"fields": {
"whitelist": ["type", "datetime", "ip_address", "station_name", "temperature", "RH%", "rain_gauge", "wind.speed", "wind.direction", "light", "battery.voltage", "battery.current", "line.voltage", "line.current", "solar.voltage", "solar.current", "register_total"],
"blacklist": ["battery.wattage", "rain_gauge.*"],
"formats": {
"type": ["sensor_reading"],
"datetime": {
"$or": {
"$format": "$iso8601",
"$format": "$unixepoch",
"$regex": "(\\d{2})-(\\d{2})-(\\d{2}) (\\d{2}):(\\d{2}):(\\d{2})"
}
},
"battery.voltage": "$float",
"battery.current": "$float",
"register_total": { "$regex": "^\\$\\d{1,9}\\.\\d{2}$" },
"wind.direction": {"$or": ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]}
...
}
},
"patch": [
{ "op": "add", "path": "/", "value": { "name": "$query:sensor_name.name" } },
{ "op": "move", "from": "/ip-address", "path": "/ip_address" },
{ "op": "add", "path": "/", "value": { "server_datetime": "$server_time.iso8601" } },
...
]
}
Comments:
"type": "sensor"
that match the IP address of the incoming request. (Assumes that the $req
object has that value, not sure we pass that thru to VDUs right now...)fields.whitelist
and fields.blacklist
."type": "sensor_reading"
or any of the other format checks would fall through to the next whatever-we-call-this-thing. If all failed, the document would be rejected."$format": "$datetime"
that includes all the common datetime formats?Just some food for discussion.
IMO the patch
list better be named transform
and look like
transform:[
{
selector:{ // Mango selector for list of actions
type: { $neq: "whatever"}
},
actions: [ // List of actions
["stop"]
]
},
// Next item, used if we didn‘t stop processing
{selector:{ stamp: { $format:'Int', $gt: 1234567890123}}},
actions: [
["set", ".ip", ":peer" ],
["swap", ".a.0", ["a", "1"]],
["del", ".x.y.z" ]
]}
//...
]
Actions are pretty self-explanatory, so no need to use obj keys. Objects as .actions
members may be used later for syntax extensions or nesting.
This also replaces .fields.format
typecheck making it more flexible. Typechecks just move to actions list member .selector
prop.
Also syntax "battery.voltage": "$float"
in fields seems somewhat ambiguous. Using proposed syntax it should be "battery.voltage": {"$format": "Float"}
. $ marker before formats/types seems to be excessive, omitting it in favor of uppercasing type tag reads better.
It all seems unexpectedly reasonable. I’ll try to write down some of our _updates in this way, to compare readability and to understand other json-induced benefits/penalties.
And I couldn‘t comprehend what .queries
field supposed to do. Can you please give a hint?
@ermouth The patch stuff is coming from JSON Patch, which is an RFC and has widespread support. We're looking at it for partial update support, see #1498. If we build it for that, then we can reuse that support here rather than inventing our own crazy DSL. ;)
Sorry, I hit submit too quickly. The $
before operators is mimicing the Mango syntax, to distinguish operators from operands. You're right that in some places, where the value is expected to be a format (like the float check) we can probably drop the $
. As for the format, it's already inside the list for the top-level formats
key, so it wouldn't need an additional {"$format":"float"}
in my approach, since my approach is intentionally keeping the patch
section JSON Patch compatible.
As for queries
, the idea is you could run a Mango query before applying the VDU/update handler, then use that information later on. See the first line in the patch
entry for use of the lookup to convert an incoming sensor's IP address to a station name. The assumption is that the query will only return the first document, or a reduced set of values (see the reference to Mango aggregation functions being a potential prerequisite for this functionality). Optionally you could cram in there a .0.
to reference just the first doc in the result set, I guess. The returned result could also be used in the validator section, for instance, to look up the current list of allowed wind direction
values or something.
The pre-update-Mango-query stuff is speculative; there may well be pushback from the db core team on making update functions a little too clever and slow since they're now performing a read before a write. We'll see. I think the proposal is valuable even without that specific functionality.
I really like where this is going, great first draft!
Quick initial notes up top:
queries
in this example, this would open up having to handle cluster network error on the doc write path. We can revisit this at a later point, the current design here allows for an easy extension later on. I think it’ll simplify the already complex discussions required for a baseline feature (which I’m going to extend shortly).One further aspect that we should discuss the other use of VDUs. To recap, VDUs are used to enforce document schema (this is handled in this draft) and authorisation, where check doc contents against the context of ctx
.
http://docs.couchdb.org/en/stable/ddocs/ddocs.html#vdufun
For Example:
function (newDoc, oldDoc, userCtx) {
// in an authenticated request, userCtx.user is the authenticated username
// so we can:
if (oldDoc.author != userCtx.name) {
throw({ forbidden: 'you can’t update other user’s docs.' })
}
}
In addition, there is a fourth parameter, the secObj
, which is the database’s _security
object
function (newDoc, oldDoc, userCtx, secObj) {
if (secObj.members.names.indexOf(userCtx.name) === -1) {
throw({ forbidden: 'you can’t update if you’re not a member.' }) // contrived example, you wouldn’t get to this part in the first place, because the _security check comes first, but you should get what is meant
}
}
We could argue that this could co-incide with the security overhaul, which would take care of any authentication, and I’m happy to discuss this separately if needed, I just wanted to bring this up.
Maybe the draft can be amended, so fields
becomes schema
, and we introduce authorisation
as a new top level field:
{
"fields": {
"whitelist": ["type", "datetime", "ip_address", "station_name", "temperature", "RH%", "rain_gauge", "wind.speed", "wind.direction", "light", "battery.voltage", "battery.current", "line.voltage", "line.current", "solar.voltage", "solar.current", "register_total"],
"blacklist": ["battery.wattage", "rain_gauge.*"],
"formats": {
"type": ["sensor_reading"],
"datetime": {
"$or": {
"$format": "$iso8601",
"$format": "$unixepoch",
"$regex": "(\\d{2})-(\\d{2})-(\\d{2}) (\\d{2}):(\\d{2}):(\\d{2})"
}
},
"battery.voltage": "$float",
"battery.current": "$float",
"register_total": { "$regex": "^\\$\\d{1,9}\\.\\d{2}$" },
"wind.direction": {"$or": ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]}
...
}
},
"authorisation": [
{
"author": { "$eq" : "$userCtx.name" },
"throw": "you can’t update other user’s docs."
]}
}
where the line "throw": "you can’t update other user’s docs."
is equivalent to the throw({ forbidden: 'you can’t update other user’s docs.' })
-line from the first example above. $secObj
would work similarly.
I like this idea, and especially the ability to use multiple docs of this format to enforce rules such as "only users with role sensor
are able to add/modify documents of type sensor
. It's not fully-fledged per-doc auth since this only is traversed on the write path, but it might be a way forward for the read-path too (if people are willing to accept the massive performance penalty it'll invoke).
The queries
idea was blue sky, happy to break it into a separate issue so it doesn't get lost when we implement this one.
I'm flexible on what lives where in the actual doc, would like to see some other opinions as well.
I would say that for authorisation
, since you didn't break things out into separate selector
and action
sections, we use $throw
to ensure we don't overlap any tests against a real document key named throw
. We'll need to work out escaping just in case someone DOES have a key named $throw
, same for the other sections as well.
The alternative is to prefix all document fields with $doc
but I think this makes things needlessly verbose, don't you?
yeah not a fan of $doc
prefix, but $throw
works fine.
That might dovetail into my next proposal (that I’ll open a new ticket for), to add string manipulation funs to Mango, those then would follow the same lead, say $strlen(field.sub)
, but let’s keep that definitely separate from this proposal ;D
@janl should your JSON snippet be (i..e, with the schema
and authorization
top level fields):
{
"schema": {
"whitelist": [
"type",
"datetime",
"others....."
],
...
},
"authorization": [
{
"author": { "$eq": "$userCtx.name" },
"throw": "you can’t update other user’s docs."
}
]
}
Also, I used American spelling for "authorization".
Suggestion:
Questions:
formats
part of schema
)? Be lenient (ignore bits not understood) or strict (fail validation when can't process the whole set of instructions)?throw
clause which would allow the user to override default error to prevent this leakage.@mikerhodes good catch, that's a typo!
@mikerhodes asks:
- What would happen when replicating to older db versions that don't understand some part of the validation (e.g., didn't understand formats part of schema)? Be lenient (ignore bits not understood) or strict (fail validation when can't process the whole set of instructions)?
Idea 1: Maybe we need a version field, regardless of what we do with this, if not specified we assume it's the version of whatever the current runtime is
Idea 2: Maybe we need a "fail": "closed|open"
setting per-VDU
Idea 3: Maybe we need a server setting in local.ini
to toggle the same thing
The latter 2 come from electronic door locks that generally come with a user-configurable setting for this.
- There's an information leak possible: write-only users can discern document shape by observing error messages for schema. Perhaps an optional throw clause which would allow the user to override default error to prevent this leakage.
Well, in Apache CouchDB, there are no such things as write-only users to my knowledge. Are you talking about a Cloudant-specific thing, or something that might come in with #1504 ?
I was imagining that if any of the tests failed (pre-@janl's change) a generic response would be provided, along the lines of document failed validation
, maybe including the name of the whatever-we-call-this to point a finger, but not explictly including the precise reason for failure. I like the idea of a $throw.default
to override that message if you want, but I'm not sure it does much?
The other option would be to provide a "debug": true/false
boolean toggle that would include precise info about the reason for a document's failure. We've done a poor job of helping users debug their JS functions in the past. Putting the control over debugging a Mango-VDU in the hands of the developer would be a big step up in my opinion.
Thanks for the discussion!
Cloudant has write-only, but I did think that it'd likely come in with any access rewrite as it's definitely a useful thing.
I‘ve tried to re-write several real simple _update fns with proposed syntax. My observations:
making update functions a little too clever and slow
AFAIK more risk is to make them dumb and cryptic as old-style rewrites )
Here is a set of existing tools of the sort, hope it can be a source for further inspiration https://stackoverflow.com/questions/1618038/xslt-equivalent-for-json/49011455#49011455
schema, whatever implementation is, better have conditionals/guards
@ermouth do you have an example?
@janl latest json schema has if/then/else
http://json-schema.org/draft-07/json-schema-release-notes.html
I'll suggest the obvious: that the guards be declarative and mango selectors.
I think "condition" may be the right term for the authorization's actual business logic (might be a better term from the authorization literature, but it escapes me right now).
Combining the suggestion to split out the authz section into three fields:
{
"authorization": [
{
"guard": {"type": { "$eq": "post" }},
"condition": {"author": { "$eq": "$userCtx.name" }},
"throw": "you can’t update other user’s docs."
}
]
}
@mikerhodes I already suggested exactly same approach (using Mango selectors as guards/conditions), see above, however it was assumed too clever. Nice to see this obvious and practical way came not only in my head.
What is the difference between guard
and condition
in you example? And why special authorization
field needed; I mean why special field if a list of guard/actionList pairs may be used without pinpointing each pair semantic meaning?
@ermouth I think I misunderstood what you were proposing...probably a language barrier. It's clearer to me now.
I think the guard is checked first, to make sure that the document would even have the field in question, and then the condition is checked. This would translate to JS in such a way:
if (doc.type === "post") {
if (doc.author != userCtx.name) {
throw("you can't update other users' docs.")
}
}
Am I right @mikerhodes ? And if so, perhaps your condition
has the wrong logic sense? Not sure...
@wohali Yes, that's the basic logic. I think my condition logic is the wrong way around, because I would say that the throw
happens if condition
is false rather than if condition is true.
if (doc.type === "post") { // i.e., "if <guard condition>"
if (! (doc.author == userCtx.name)) { // i.e., "if not <condition>"
throw("you can't update other users' docs.")
}
}
@ermouth I can see that you could combine the guard and the condition into a single clause quite easily given they are both Mango selectors. I just find that the logical distinction of "the types of thing this authorisation decision applies to" (guard) and "the authorisation condition itself" (condition) makes things more clear.
As to whether schema
and authorization
are worthy of separate top-level concepts when I could see that you're right again that they could be combined in a kind of guard-condition-action type framework, I'm unsure. Again, it perhaps helps to separate them from a readability point of view, "here's my schema check, here's my authorisation check". Unsure.
I wrote up a gist describing a possible /{db}/_update
endpoint for facilitating Mango-based VDUs, based on the discussion in this thread. I based the transform
syntax on this secret update syntax in the Mango source. The transform
syntax could also be dropped from the proposal entirely, as it adds a great deal of extra work over simply validating an update.
What do y'all think?
After digging into Mango's source and discussing this feature with folks, I think only one new method+endpoint is necessary to facilitate Mango VDUs: POST /{db}/_update
, which acts much like POST /{db}
but it applies the new doc to the first matching VDU. The ?use_index=...
query parameter can be used to force the use of a particular VDU, much like in POST /{db}/_find
.
VDUs in this case are Mango indexes with the type
field set to vdu
. Their form differs from a "type": "json"
index in the following ways:
partial_filter_selector
since applying something at "index time" doesn't make sense for this context.index.fields
, since there is no indexing involved in a VDU.index.guard
contains a selector which checks if the VDU applies to a given document. Documents which do not pass this selector are not affected by this VDU.index.condition
contains a selector which validates the document against the VDU. The VDU will reject documents which pass its guard but not its condition.index.action
or index.transform
field could be used to apply them to documents which pass both the guard and the condition.I suggest this approach because mango_idx.erl
already contains facilities for supporting new types of indexes, making it much easier to develop a new type of index than to reimplement all the /{db}/_index
handlers for a new endpoint.
VDU selectors in this formulation can use string substitution to access $userCtx
and $secObj
. Accessing $oldDoc
requires a fabric call, which extends the window for conflicts to appear, so I would be inclined to permit only index.condition
to access it so as to minimize which updates can create lengthy conflict windows.
For example:
{
"index": {
"guard": { "type": "article" },
"condition": {
"name": { "$in": "$secObj.members.names" },
"$or": {
"$not": { "$exists": "$oldDoc.author" },
"author": { "$eq": "$oldDoc.author" }
}
}
},
"type": "vdu"
}
Does that make sense? Seem reasonable?
Good to see some movement here, and nice write up.
It feels a little odd to overload the term index
in this way -- in particular because, as you say, "there is no indexing involved" 😄 . I'm not sure this UX is going to work out, it feels like putting a VDU as a sibling to a map
function. In many ways, this is unrelated to Mango querying so I'd say that, like /_update
is new, a new /_vdus
or whatever endpoint that's a parallel to /_index
is more appropriate.
Another reason to avoid overloading the index term is that the VDU index section has a very different schema to a normal Mango index section too, which makes writing client libraries pretty tough (especially those libraries that map JSON to concrete types, which is a primary utility of a client library over raw HTTP calls IMO).
That makes sense, that overloading /_index
with a process that doesn't involve indexing does indeed not make sense. Thanks for that reality check :)
I still like the guard-condition syntax for the VDU, where the guard and condition are selectors with different expectations and abilities. I'll go for another few dives into the source and see if I can't figure out a PR.
Diana, this is really cool. If you need some help understanding the code, let me know. I’m happy to help.
From: Diana Thayer notifications@github.com Sent: Wednesday, January 23, 2019 6:11 PM To: apache/couchdb Cc: Subscribed Subject: Re: [apache/couchdb] Additional Mango-based update handler / VDU functionality (#1554)
That makes sense, that overloading /_index with a process that doesn't involve indexing does indeed not make sense. Thanks for that reality check :)
I still like the guard-condition syntax for the VDU, where the guard and condition are selectors with different expectations and abilities. I'll go for another few dives into the source and see if I can't figure out a PR.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/apache/couchdb/issues/1554#issuecomment-456862512, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAK9AsNusxhLYYBUvmFubc5WpW1MulBUks5vGIm_gaJpZM4V5aai.
@garrensmith That's very much appreciated. I'd love for some pointers on which files will contain pertinent code, functions I might have to edit or copy, etc. Otherwise I'm just stumbling through the source and hitting my local couch with requests, hoping the responses change in response to my actions.
I'm going to try and leave some ideas on how to implement this. The one thing I've noticed is that this is quite a large amount of work, so I think try and break it up into smaller usable PR's. I also think it is worth creating a RFC for this with all the information on this and add in the gist so that its in 1 central place.
I would start on the transform section first. That section could possibly be used in the mango query section. We could take a document we want to index and then transform it before adding it to the index. This has been a requested feature.
For $transform
, $schema
and $authorization
section
All of the mango syntax manipulation lives here https://github.com/apache/couchdb/blob/master/src/mango/src/mango_selector.erl
I think it might be best to create a new mango_vdu_selector or something file that uses parts of mango_selector and does all the validation and transformation of a document based on the syntax.
Saving/creating/getching of VDU's:
The CRUD for a mango index is here https://github.com/apache/couchdb/blob/master/src/mango/src/mango_idx_view.erl
Again I would probably create another file e.g mango_vdu and leverage mango_idx_view
to do all the retrieving of the vdu's.
I hope that helps a little.
I swear I thought I had mentioned this already, but apparently not on this thread.
I’d leave the transform bits out of this proposal and focus on the VDU part: validate that a doc looks like what you want it to be and comes from someone you want it to be.
I’d leave mango-like syntax for doc transformation to #1559.
+1 to reducing the scope of this work.
From: Jan Lehnardt notifications@github.com Sent: Wednesday, February 20, 2019 12:57 PM To: apache/couchdb Cc: garren smith; Mention Subject: Re: [apache/couchdb] Additional Mango-based update handler / VDU functionality (#1554)
I swear I thought I had mentioned this already, but apparently not on this thread.
I’d leave the transform bits out of this proposals on focus on the VDU part: validate that a doc looks like what you want it to be and comes from someone you want it to be.
I’d leave mango-like syntax for doc transformation to #1559https://github.com/apache/couchdb/issues/1559.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/apache/couchdb/issues/1554#issuecomment-465526805, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAK9Alq3mMGDv4fIXGZ8FmeiK12tIDM3ks5vPSo3gaJpZM4V5aai.
This ticket is intended to gather ideas for what other sorts of update handler / VDU functionality we could add to Mango to provide a declarative replacement for JS-based update handlers and VDUs. This is intended as a "blue-sky" ticket to gather ideas that can be implemented simply in a declarative fashion.
From #1534, I wrote:
Extracting:
req
objectreq
objectreq
field value to a key:value pair (or JSON Patch) to insert?type
is string, fielddate
is ISO datetime|UNIX epoch, etc.)"$###.##"
or an integer must be within a certain range)source
and extracting the fieldip_address
, then only allowing documents to come through that have a matchingreq.ip_address
(for example).