Positional Operators Support (Epic)

amreo commented 4 years ago

I have a idea. Why don't define every function using a ForEach function?

func ForEach(doc interface{}, path string, arrayFilters map[string]interface{}, process func(*interface{}) (bool, error)) error

The idea is that when ForEach find recursively a value that match the path, it calls the function process passing the reference to the value. If the process function return true, the computation stop, otherwise continue.

Example

value = {
    "foo": [
        {"bar": 20, "baz": [100, -300, 400]},
        {"bar": 43, "baz": [-100, 300, -400]}
    ]
}

update = {
    "$inc": {
          "foo.$[ok].baz.$[positive]": 1,
     }
}

arrayFilter = {
    "ok": { "$gt": 30 },
    "positive": { "$gt": 0 }
}

Inc(value, path, arrayFilter) -->
      ForEach(value, path, arrayFilter, processor) -->
             ForEach([{"bar": 20, "baz": [100, -300, 400]},  {"bar": 43, "baz": [-100, 300, -400]}], "$[ok].baz.4[positive]", arrayFilter, processor) -->
                 //Don't match so ForEach({"bar": 20, "baz": [100, -300, 400]}, "baz.4[positive]", arrayFilter, processor) is not called-->
                 ForEach({"bar": 43, "baz": [-100, 300, -400]}, "baz.4[positive]", arrayFilter, processor) -->
                         ForEach([-100, 300, -400], "$[positive]", arrayFilter, processor) -->
                                 // -100 don't match with arrayFilter condition
                                 Process(&300) --> false
                                 // -400 don't match with arrayFilter condition

Example code (Get)

func Get(doc interface{}, path string, arrayFilters map[string]interface{}) interface{} {
    var foundValue interface{}
    ForEach(doc, path, arrayFilters, func(value *interface{}) (bool, error) {
        foundValue = *value
        return true, nil
    })

    return foundValue
}

Example code (Inc)

func Inc(doc interface{}, path string, arrayFilters map[string]interface{}, increment interface{}) interface{} {
    ForEach(doc, path, arrayFilters, func(value *interface{}) (bool, error) {
        var oldValue interface{} = *value
        //Switch algorithm...

        //Set the value
        *value = newValue
        return false, nil
    })
}

Example code (Unset)

func Unset(doc interface{}, path string, arrayFilters map[string]interface{}, increment interface{}) interface{} {
    ForEach(doc, partBeforeLastDot(path), arrayFilters, func(value *interface{}) (bool, error) {
        field := partAfterLastDot(path)
        //Handle case if field == '' OR isn't a field name (eg. $[], $[<field name>], <index>...)
        //Handle case if *value isn't a bson

        bson := *value.(bson.M) //Or bson.D
        for _, pair := bson {
            if pair.key == field {
                delete(bson, field)
            }
        } 
    })
}

Actual apply's or process's functions aren't correct because they makes assumptions that the path reference a single value instead multiple values that should be updated separately.

256dpi commented 4 years ago

I like the direction of the idea. However, if possible I would like to hide the complexity from the accessor functions (Get, Put, ...). While the ForEach wrapper does not pollute the functions too much, specifying arrayFilters on most calls as nil except in the update operators is quite awful.

What about we resolve the dynamic keys which include positional operators and rewrite the update document to only include static keys? That way we would not need to change the accessor functions and can conveniently pre-process the update document in Apply if arrayFilters has been specified.

// Resolve will process the specified update document and return an update
// document with all positional operators replaced with absolute array indices.
func Resolve(doc Doc, update Doc, arrayFilters List) (Doc, error) {
    // ...
}

func TestResolve(t *testing.T) {
    res, err := Resolve(Convert(bson.M{
        "array": bson.A{1, 2, 3, 4},
    }), Convert(bson.M{
        "$mul": bson.M{
            "array.$[lte2]": 2,
        },
    }), List{
        Convert(bson.M{
            "lte2": bson.M{
                "$lte": 2,
            },
        }),
    })
    assert.NoError(t, err)
    assert.Equal(t, Convert(bson.M{
        "$mul": bson.M{
            "array.0": 2,
            "array.1": 2,
        },
    }), res)
}

amreo commented 4 years ago

I think the second solution can become expensive in terms of memory. I don't think the first solution show a complexity. For example applyMul can become:

func applyMul(ctx Context, actualPath string, value *bsonkit.Doc, v interface{}) error {
    res, err := multiply(*value, v)
    if err != nil {
        return err
    }
    // record change
    ctx.Value.(*Changes).Updated[actualPath] = res

    return nil
}

applyMul(now)

func applyMul(ctx Context, doc bsonkit.Doc, _, path string, v interface{}, arrayFilters map[string]interface{}) error {
    // multiply value
    res, err := bsonkit.Multiply(doc, path, v)
    if err != nil {
        return err
    }

    // record change
    ctx.Value.(*Changes).Updated[path] = res

    return nil
}

I don't think the complexity is changed. The Get() can be reimplemented passing a nil to the ForEach ArrayFilters parameter. ForEach should take care that arrayFilters can be nil! ForEach can be called only by ProcessExpressions.

256dpi commented 4 years ago

In either case we need to think about the following things:

The logic must be implemented in mongokit since it is a MongoDB specific functionality.
It should support customization. Users should be able to inject custom forms of positional operators similar to Match, Project, Apply and Extract.
The implementation should be separated as much as possible from other concerns.

Based on that, I see the following approaches:

The functionality is integrated in methods that access and transform a document. This basically means, that we reimplement or extend the bsonkit accessors to handle positional operators and thus array filters transparently. This would be something in the direction of the ForEach idea.
The functionality is external and is applied when necessary as an additional step in processing the query/update. This would look like the Resolve proposal that resolves positional operators before passing the document on to other functions.
Somewhere in the middle of the two: An external Analyze method would analyze a query/update document and provide some data structure that describes the means of accessing the document. Instead of Get(doc bsonkit.Doc, path string) we would use Get(doc bsonkit.Doc, resolver Resolver) where Resolver would be a function that resolves the path statically, or by processing positional operators, or using a user-specific logic.

In terms of separation of concerns and extendability I favor approach 2 and 3. But in terms of simplicity I favor 2, since it could be implemented quickly which would allow us to focus on the edge cases and writing tests. Then later, we could easily experiment and implement using approach 3. I don't see any benefits in approach 1 except for being maybe more performant. However I'm generally against decisions lead by premature optimization.

amreo commented 4 years ago

I think the approach 2 is the best. But how the resolver interface should look like? type Resolver func (func(*interface{}) (bool, error)) error type Resolver func() ([]*interface{}, error)

In mongokit we can define a custom resolver that can be injected in the bsonkit utils.

256dpi commented 4 years ago

I started some research into this and came up with the following proposal:

// Resolve will resolve all positional operators in the provided path using the
// query, document and array filters. For each match it will call the callback
// with the generated absolute path.
func Resolve(path string, query, doc bsonkit.Doc, arrayFilters bsonkit.List, callback func(path string) error) error {
    // check if path includes positional operators
    if !strings.ContainsRune(path, '$') {
        return callback(path)
    }

    // TODO: Resolve all positional operators and call callback for each
    //  resolved absolute path.

    return nil
}

This function can then be used in an operator to transparently resolve the paths:

func applySet(ctx Context, doc bsonkit.Doc, _, path string, v interface{}) error {
    // get context
    acx := ctx.Value.(*ApplyContext)

    // TODO: We probably should apply the changes to a clone of the document
    //  we're using to resolve the queries. Otherwise we might process some
    //  elements multiple times?

    return Resolve(path, acx.Query, doc, acx.ArrayFilters, func(path string) error {
        // set new value
        _, err := bsonkit.Put(doc, path, v, false)
        if err != nil {
            return err
        }

        // record change
        ctx.Value.(*Changes).Updated[path] = v

        return nil
    })
}

To simplify things even further, we can also resolve the paths in the Process function once the applyAll logic has also been moved to the process framework.

This proposal is very similar to your ForEach proposal except it targets the layers above and does not pollute the basic accessors in bsonkit with this logic.

We can optimize the approach quite a bit if we construct the new path in a byte slice and cast it to a string using some unsafe.Pointer magic before calling the callback.

In the future we might add some features to the bsonkit accessors to dynamically lookup things. However, I'd like to tackle that in a separate step.

FYI: I looked into the other positional operators$ and $[] and learned that they are actually dependent on the query (I never used these MongoDB features before). This means that we also need to analyze and extract portions of the query to match arrays with these operators.

amreo commented 4 years ago

Is the path+arrayFilters a mongodb concept or they can be a general concept?

256dpi commented 4 years ago

It's a MongoDB concept since array filters use the MongoDB query syntax.

amreo commented 4 years ago

Ok. I agree with you. I will implement the resolver. Where should I put the Resolve() function? mongokit/resolve.go?

256dpi commented 4 years ago

Yes mongokit/resolve.go sounds good.

amreo commented 4 years ago

Are you sure that the type of arrayFilters (bsonkit.List) is the best. I think it's better that it's a map[string]interface{}. See this function

256dpi commented 4 years ago

While map[string]interface{} would be slightly easier to access than the bsonkit.List I would like to stick with the later. Mainly because MongoDB chose to represent array filter as a list of objects. I don't know why, but maybe for extendability in the future. This way we use the same data structures for the same things which makes things easier.

amreo commented 4 years ago

Ok

amreo commented 4 years ago

Do you know that Match() and Process() should be modified to support array filters? I need to run a Match() on a array element that may neeed to Resolve() a path. for example

    ResolveTest(t, "foo.$[valueGt15].bar.$[]", bsonkit.Convert(bson.M{}), bsonkit.Convert(bson.M{
        "foo": bson.A{
            bson.M{
                                "value": 10,
                "bar": bson.A{
                    "foobar",
                    "barfoo",
                },
            },
            bson.M{
                                "value": 20,
                "bar": bson.A{
                    "foobar",
                },
            },
        },
    }), bsonkit.List{
               "valueGt15.value": {
                      "$gt": 15,
                },
         }, []string{
        "foo.1.bar.0",
    })

256dpi commented 4 years ago

@amreo Thanks for working on this!

256dpi / lungo

Positional Operators Support (Epic) #5