globalsign / mgo

The MongoDB driver for Go
Other
1.97k stars 230 forks source link

Upsert with a selector of map[string]map returns inconsistent results #202

Closed timcmiller closed 6 years ago

timcmiller commented 6 years ago

I've been working on creating an upsert query that uses a selector which is a map[string]string and when working with a selector that has multiple keys about 20% of the time I run my query it ends up creating a new index instead of updating the one that is already in the collection.

Here is my code handles the requests and prepares the upsert

package products

import (
    "net/http"

    "github.com/globalsign/mgo/bson"

    "github.com/gin-gonic/gin"
    "github.com/globalsign/mgo"
)

var index = mgo.Index{
    Key: []string{
        "objects",
    },
    Unique: true,
}

type saveBody struct {
    ID      bson.ObjectId     `json:"id" bson:"_id"`
    Objects map[string]string `json:"objects" binding:"required" bson:"objects"`
}

// Save will either create or update a product in the database
func Save(c *gin.Context) {
    var db *mgo.Database
    db = c.MustGet("db").(*mgo.Database)
    appID := c.MustGet("appID").(string)
    var body saveBody
    if err := c.ShouldBindJSON(&body); err != nil {
        common.LogError("unable to unmarshal request json", err)
        c.JSON(400, common.ErrInvalidParams)
        return
    }
    productsDB := db.C("products_" + appID)
    err := productsDB.EnsureIndex(index)
    if err != nil {
        common.LogError("Unable to ensure index", err)
        c.JSON(http.StatusInternalServerError, common.ErrInternalError)
        return
    }

    var product = common.Product{}

    product.Objects = body.Objects

    if body.ID != "" {
        product.ID = body.ID
    }

    objects := bson.M{"objects": body.Objects}
    setQ := bson.M{
        "objects": body.Objects,
    }
    upsertQ := bson.M{}

    info, err := productsDB.Upsert(objects, upsertQ)
    if err != nil {
        common.LogError("failed to upsert product in the save function", err)
        c.JSON(http.StatusInternalServerError, common.ErrInternalError)
        return
    }

    if product.ID == "" {
        if info.UpsertedId != nil {
            product.ID = info.UpsertedId.(bson.ObjectId)
        } else {
            err := productsDB.Find(objects).One(&product)
            if err != nil {
                common.LogError("failed to fetch product ID from database", err)
                c.JSON(http.StatusInternalServerError, common.ErrInternalError)
                return
            }
        }
    }

    c.JSON(http.StatusOK, product)
}

And here is my test case I'm running against this code.

    It("should overwrite the product if a product already has the identical objects property", func() {
        body = []byte(`{
            "objects": {
                "event":"1429425848386",
                "shop":"2137096258"
            }
        }`)

        productsDB := common.Session.DB(DB_URI).C("products_" + APP_ID)
        req, _ := http.NewRequest("POST", "/v2/products/", bytes.NewBuffer(body))
        req.Header.Add("Application-ID", APP_ID)
        router.ServeHTTP(w, req)
        Expect(w.Code).To(Equal(200))
        var product *common.Product
        err := json.Unmarshal([]byte(w.Body.String()), &product)
        var productFromDB *common.Product
        err = productsDB.FindId(product.ID).One(&productFromDB)
        Expect(err).To(BeNil())
        w = httptest.NewRecorder()

        productId := hex.EncodeToString([]byte(product.ID))
        overwriteBody := json.RawMessage(`{
            "id":"` + productId + `",
            "objects": {
                "event":"1429425848386",
                "shop":"2137096258"
            }
        }`)

        req, _ = http.NewRequest("POST", "/v2/products/", bytes.NewBuffer(overwriteBody))
        req.Header.Add("Application-ID", APP_ID)
        router.ServeHTTP(w, req)
        Expect(w.Code).To(Equal(200))
        w = httptest.NewRecorder()

        req, _ = http.NewRequest("GET", "/v2/products/?event=1429425848386&shop=2137096258", nil)
        req.Header.Add("Application-ID", APP_ID)
        router.ServeHTTP(w, req)
        Expect(w.Code).To(Equal(200))
        var products []*common.Product
        err = json.Unmarshal([]byte(w.Body.String()), &products)
        if err != nil {
            panic(err)
        }
        Expect(len(products)).To(Equal(1))
    })

Here is the stack trace I get on the failing test.

  Expected
      <int>: 2
  to equal
      <int>: 1

  /Users/timcmiller/go/src/oprm-scheduler/tests/products/save_test.go:396

  Full Stack Trace
    /Users/timcmiller/go/src/github.com/onsi/gomega/internal/assertion/assertion.go:69 +0x1ef
  github.com/onsi/gomega/internal/assertion.(*Assertion).To(0xc42045c280, 0x1af1f00, 0xc4201d5d60, 0x0, 0x0, 0x0, 0xc42045c280)
    /Users/timcmiller/go/src/github.com/onsi/gomega/internal/assertion/assertion.go:35 +0xae
  oprm-scheduler/tests/products_test.glob..func6.11()
    /Users/timcmiller/go/src/oprm-scheduler/tests/products/save_test.go:396 +0xf75
  github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc4203a4780, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /Users/timcmiller/go/src/oprm-scheduler/tests/products/products_suite_test.go:28 +0x64
  testing.tRunner(0xc4203fe0f0, 0x17cd520)
    /usr/local/go/src/testing/testing.go:746 +0xd0
  created by testing.(*T).Run
    /usr/local/go/src/testing/testing.go:789 +0x2de

Again, this only seems to fail about 20% of the time. Not sure what is causing it to sometimes fail.

domodwyer commented 6 years ago

Hi @timcmiller

Would you mind putting together a minimal test case (preferably a real test runnable with "go test") to help us take a look at this?

Thanks! Dom

ckeyes88 commented 6 years ago

Hi @domodwyer I'm one of @timcmiller's colleagues and I've put together a gist here where the only external dependency is mgo and if you re-run go test ./... with a local mongo instance running you'll see what Tim described above will take place. Based on what I see in the database on the failed test case, I've concluded that it's creating a duplicate because the keys of objects are being switched. As expected unless you explicitly index on the keys in an object Mongo does not consider {objects: {foo: "bar", baz: "bah}} to be the same as {objects: {baz: "bah", foo:"bar"}}.

As far as I can tell this order switching is due to the fact that iterating over a map is randomized in Go. I'm pretty sure that we're passing in the objects in the same order so it seems that this change in order might be somewhere in the upsert function (?). If that's the case I would say this is behaving differently than the update with upsert: true in the mongo shell since passing the same object selector in the same order guarantees that's the order of the selector. I may be way off base here so we'd love your input on this issue.

Thanks!

domodwyer commented 6 years ago

Hi @ckeyes88

You're totally right - it is indeed likely the random iteration order of maps that is biting you here. This is something we can't really work around in mgo unfortunately (it's part of the Go language specification) and I bet it catches loads of people out. It is particularly important for compound indexes - using a map will randomly match the index field ordering and use the index, and other times randomly won't resulting in slow and inefficient queries 50% of the time for the compound index covering 2 fields, 66% for 3, etc...

I'd suggest swapping your bson.M (the map type) to a bson.D (an array) which provides deterministic iteration ordering:

objectsS := bson.D{
    {Name: "objects.foo", Value: "bar"},
    {Name: "objects.baz", Value: "bang"},
}

This should solve the issue, though it isn't the easiest thing to work with. It might be best for your use case to replace your Objects: map[string]string with a custom type that implements MarshalJSON() and SetBSON() and has deterministic ordering - the backing data structure is obviously dependent on your access patterns and tradeoff choices, but maybe a hash array mapped trie (HAMT) or something similar will work for you?

Best of luck - I'm interested in what your outcome is, please do feel free to open a PR if you think it'll be useful for others!

Dom

ckeyes88 commented 6 years ago

@domodwyer Thanks for your help! I hadn't thought about using bson.D as I'm still getting familiar with all the mgo features. We'll definitely research a solution that fits our use case and follow up with our solution.

Cheers!