Closed 256dpi closed 4 years ago
I like the direction of the idea. However, if possible I would like to hide the complexity from the accessor functions (Get, Put, ...). While the ForEach
wrapper does not pollute the functions too much, specifying arrayFilters
on most calls as nil
except in the update operators is quite awful.
What about we resolve the dynamic keys which include positional operators and rewrite the update document to only include static keys? That way we would not need to change the accessor functions and can conveniently pre-process the update document in Apply
if arrayFilters
has been specified.
// Resolve will process the specified update document and return an update
// document with all positional operators replaced with absolute array indices.
func Resolve(doc Doc, update Doc, arrayFilters List) (Doc, error) {
// ...
}
func TestResolve(t *testing.T) {
res, err := Resolve(Convert(bson.M{
"array": bson.A{1, 2, 3, 4},
}), Convert(bson.M{
"$mul": bson.M{
"array.$[lte2]": 2,
},
}), List{
Convert(bson.M{
"lte2": bson.M{
"$lte": 2,
},
}),
})
assert.NoError(t, err)
assert.Equal(t, Convert(bson.M{
"$mul": bson.M{
"array.0": 2,
"array.1": 2,
},
}), res)
}
I think the second solution can become expensive in terms of memory. I don't think the first solution show a complexity. For example applyMul can become:
func applyMul(ctx Context, actualPath string, value *bsonkit.Doc, v interface{}) error {
res, err := multiply(*value, v)
if err != nil {
return err
}
// record change
ctx.Value.(*Changes).Updated[actualPath] = res
return nil
}
applyMul(now)
func applyMul(ctx Context, doc bsonkit.Doc, _, path string, v interface{}, arrayFilters map[string]interface{}) error {
// multiply value
res, err := bsonkit.Multiply(doc, path, v)
if err != nil {
return err
}
// record change
ctx.Value.(*Changes).Updated[path] = res
return nil
}
I don't think the complexity is changed. The Get() can be reimplemented passing a nil to the ForEach ArrayFilters parameter. ForEach should take care that arrayFilters can be nil! ForEach can be called only by ProcessExpressions.
In either case we need to think about the following things:
mongokit
since it is a MongoDB specific functionality.Match
, Project
, Apply
and Extract
.Based on that, I see the following approaches:
The functionality is integrated in methods that access and transform a document. This basically means, that we reimplement or extend the bsonkit
accessors to handle positional operators and thus array filters transparently. This would be something in the direction of the ForEach
idea.
The functionality is external and is applied when necessary as an additional step in processing the query/update. This would look like the Resolve
proposal that resolves positional operators before passing the document on to other functions.
Somewhere in the middle of the two: An external Analyze
method would analyze a query/update document and provide some data structure that describes the means of accessing the document. Instead of Get(doc bsonkit.Doc, path string)
we would use Get(doc bsonkit.Doc, resolver Resolver)
where Resolver
would be a function that resolves the path statically, or by processing positional operators, or using a user-specific logic.
In terms of separation of concerns and extendability I favor approach 2 and 3. But in terms of simplicity I favor 2, since it could be implemented quickly which would allow us to focus on the edge cases and writing tests. Then later, we could easily experiment and implement using approach 3. I don't see any benefits in approach 1 except for being maybe more performant. However I'm generally against decisions lead by premature optimization.
I think the approach 2 is the best. But how the resolver interface should look like?
type Resolver func (func(*interface{}) (bool, error)) error
type Resolver func() ([]*interface{}, error)
In mongokit we can define a custom resolver that can be injected in the bsonkit utils.
I started some research into this and came up with the following proposal:
// Resolve will resolve all positional operators in the provided path using the
// query, document and array filters. For each match it will call the callback
// with the generated absolute path.
func Resolve(path string, query, doc bsonkit.Doc, arrayFilters bsonkit.List, callback func(path string) error) error {
// check if path includes positional operators
if !strings.ContainsRune(path, '$') {
return callback(path)
}
// TODO: Resolve all positional operators and call callback for each
// resolved absolute path.
return nil
}
This function can then be used in an operator to transparently resolve the paths:
func applySet(ctx Context, doc bsonkit.Doc, _, path string, v interface{}) error {
// get context
acx := ctx.Value.(*ApplyContext)
// TODO: We probably should apply the changes to a clone of the document
// we're using to resolve the queries. Otherwise we might process some
// elements multiple times?
return Resolve(path, acx.Query, doc, acx.ArrayFilters, func(path string) error {
// set new value
_, err := bsonkit.Put(doc, path, v, false)
if err != nil {
return err
}
// record change
ctx.Value.(*Changes).Updated[path] = v
return nil
})
}
To simplify things even further, we can also resolve the paths in the Process
function once the applyAll
logic has also been moved to the process framework.
This proposal is very similar to your ForEach
proposal except it targets the layers above and does not pollute the basic accessors in bsonkit
with this logic.
We can optimize the approach quite a bit if we construct the new path in a byte slice and cast it to a string using some unsafe.Pointer
magic before calling the callback.
In the future we might add some features to the bsonkit
accessors to dynamically lookup things. However, I'd like to tackle that in a separate step.
FYI: I looked into the other positional operators$
and $[]
and learned that they are actually dependent on the query (I never used these MongoDB features before). This means that we also need to analyze and extract portions of the query to match arrays with these operators.
Is the path+arrayFilters a mongodb concept or they can be a general concept?
It's a MongoDB concept since array filters use the MongoDB query syntax.
Ok. I agree with you. I will implement the resolver. Where should I put the Resolve() function? mongokit/resolve.go?
Yes mongokit/resolve.go
sounds good.
Are you sure that the type of arrayFilters
(bsonkit.List
) is the best. I think it's better that it's a map[string]interface{}.
See this function
While map[string]interface{}
would be slightly easier to access than the bsonkit.List
I would like to stick with the later. Mainly because MongoDB chose to represent array filter as a list of objects. I don't know why, but maybe for extendability in the future. This way we use the same data structures for the same things which makes things easier.
Ok
Do you know that Match() and Process() should be modified to support array filters? I need to run a Match() on a array element that may neeed to Resolve() a path. for example
ResolveTest(t, "foo.$[valueGt15].bar.$[]", bsonkit.Convert(bson.M{}), bsonkit.Convert(bson.M{
"foo": bson.A{
bson.M{
"value": 10,
"bar": bson.A{
"foobar",
"barfoo",
},
},
bson.M{
"value": 20,
"bar": bson.A{
"foobar",
},
},
},
}), bsonkit.List{
"valueGt15.value": {
"$gt": 15,
},
}, []string{
"foo.1.bar.0",
})
@amreo Thanks for working on this!
I have a idea. Why don't define every function using a ForEach function?
The idea is that when
ForEach
find recursively a value that match the path, it calls the function process passing the reference to the value. If theprocess
function return true, the computation stop, otherwise continue.Example
Example code (Get)
Example code (Inc)
Example code (Unset)
Actual apply's or process's functions aren't correct because they makes assumptions that the path reference a single value instead multiple values that should be updated separately.