Plugin to detect and/or prevent re-use of an instance/sesssion

ansel1 commented 2 years ago

Describe the feature

A plugin which detects when a db instance is being re-used for multiple finalizer methods. The plugin marks the database instance the first time any finalizer method is called, then warns or returns an error if a second finalizer method is called on that same instance.

If you think this feature is a good idea, I can submit a pull request. I have implemented a prototype, below. For a pull request, I would make it a little more configurable (warn vs error), and remove the merry dependency. I'm using merry in the prototype out of convenience: it handles capturing and printing stack traces.

package gormplugin

import (
    "github.com/ansel1/merry"
    "gorm.io/gorm"
)

const key = "prevent_reuse:consumed"
const stackKey = "prevent_reuse:stack"

type Plugin struct {
    Debug bool
}

func (p *Plugin) Name() string {
    return "noreuse"
}

func (p *Plugin) Initialize(db *gorm.DB) error {
    db.Callback().Raw().Before("*").Register("prevent_reuse", p.check)
    db.Callback().Delete().Before("*").Register("prevent_reuse", p.check)
    db.Callback().Create().Before("*").Register("prevent_reuse", p.check)
    db.Callback().Row().Before("*").Register("prevent_reuse", p.check)
    db.Callback().Query().Before("*").Register("prevent_reuse", p.check)
    db.Callback().Update().Before("*").Register("prevent_reuse", p.check)
    return nil
}

func (p *Plugin) check(db *gorm.DB) {
    if _, isSet := db.InstanceGet(key); isSet {
        err := merry.New("This db session has already been used to execute a command.  DB sessions can" +
            " only be used once.  Create a new session with db.Session(*gorm.Session{}) or db.WithContext(ctx).")
        if p.Debug {
            if stack, ok := db.InstanceGet(stackKey); ok {
                if stackErr, ok := stack.(error); ok {
                    err = err.WithCause(stackErr)
                }
            }
        }
        db.AddError(err)
    } else {
        db.InstanceSet(key, true)
        if p.Debug {
            db.InstanceSet(stackKey, merry.New("here"))
        }
    }
}

Motivation

gorm v2 doesn't clone the db instance as often as v1, and the migration docs suggest that once a finalizer method has been called on an instance, it isn't safe to re-use that instance. v2 suggests you should use Session() or WithContext() to create re-usable instances instead.

But this recommendation isn't enforced or detected by gorm. Code which re-uses an instance may or may not work, and can fail in subtle ways, so it can be difficult to know where it is happening.

I'm not sure whether there are any legitimate cases where one would want to re-use an instance. Count() seems to be one. The implementation of Count() seems to be very careful not to poison the instance. But it seems like, even in the case of Count(), it would be safer to perform the count in a new instance.

Related Issues

a631807682 commented 2 years ago

SGTM, I have dealt with a lot of issues about re-use without Session. If you can monitor the re-used problem, it should be a good plugin. It can be used in the playground to allow users to find problems by themselves.

ansel1 commented 2 years ago

I've discovered at least one place where gorm itself triggers this warning...hard to say if it's legit or not. In Save(), it calls the Update() callbacks, and if that fails, it calls Create(). Interestingly, it also calls Find(), but creates a new session for that.

Now since the re-used Statement is not being modified in any way...no new clauses, same model type...it's probably safe. But it's really tricky to know when its OK to do that, and when its not.

default:
        selectedUpdate := len(tx.Statement.Selects) != 0
        // when updating, use all fields including those zero-value fields
        if !selectedUpdate {
            tx.Statement.Selects = append(tx.Statement.Selects, "*")
        }

        tx = tx.callbacks.Update().Execute(tx)

        if tx.Error == nil && tx.RowsAffected == 0 && !tx.DryRun && !selectedUpdate {
            result := reflect.New(tx.Statement.Schema.ModelType).Interface()
            if result := tx.Session(&Session{}).Limit(1).Find(result); result.RowsAffected == 0 {
                return tx.Create(value)
            }
        }

a631807682 commented 2 years ago

Reuse is safe if Statement.Clauses has not been changed, this does not mean that it is not safe if Statement.Clauses changed. I think the purpose of this plugin is not to help users find bugs, but to standardize users' code.

// bad
tx = .....
tx.Where("age = ?", 10).Find(...)
tx.Where("name = ?","jinzhu").Find(...) // user want to find user who age = 10 and name = jinzhu 

// good
tx = .....
tx = tx.Session(&Session{})
tx.Where("age = ?", 10).Find(...)
tx.Where("age = ? and name = ?",10,"jinzhu").Find(...)

ansel1 commented 2 years ago

It's also not safe if you use two different models, table expressions, etc.

The plugin has evolved a bit. It can be configured to either return an error, or just log a warning. And I had to put in a bunch of code to allow that one case of re-use inside gorm, when calling Save(). It tries to detect that very specific use case by examining the stack and looking for a particular pattern. It's pretty ugly, and probably won't scale if there are other valid re-use scenarios it needs to detect, so I'm not sure whether this plugin is really tenable without native support in gorm.

//go:build gormv2
// +build gormv2

// Package sessionreusedetector is a gorm plugin which detects when a gorm Statement
// is re-used for more than one finalizer method.  In gormv2, when you start a chain
// of methods, like Order(), Sort(), Scopes(), Where(), etc, a new Statement is cloned
// for that chain.  The chain ends in a "finalizer" method, which is any of the gorm
// methods which actually perform the database operation, like Create(), Delete(), Find()
// etc.
//
// The db instance returned by those finarlizer methods generally shouldn't be used again
// for any other finalizer method, according to gorm's docs.  For example, if you
// use a db instance to create one struct:
//
//  tx := db.Create(key)
//
// ...then use the returned db instance to create a different struct:
//
//  tx.Create(user)
//
// ...that will probably lead to an error, because the Statement in `tx` has already been
// used to do something to the `keys` table, and can't be re-used on a different able.
//
// The suggested solution is to use Session() or WithContext() first:
//
//  tx := db.WithContext(ctx)
//  tx.Create(key)
//  tx.Create(user)
//
// That is allowed because WithContext() returns a db which will lazily create a new Statement
// each time a chain method or finalizer (like Create) is called on it.  So the instance
// returned by db.WithContext() is re-usable, where the instance returned by Create() (or
// any chain or finalizer method) isn't.
//
// Honestly, this whole thing is confusing and difficult to grok without really closely looking
// at gorm's source.  And if you break this rule, it may or may not actually lead to a bug, so you
// may not realize you broke it.
//
// This plugin *enforces* this rule.  If you try and re-use an instance, the finalizer method will
// fail.  If Debug is true, the plugin will record a stack trace when the instance was first used,
// and include that location in the error message.  There is some performance overhead to using
// Debug mode.  It may be best to use this only in unit tests.
//
// This package is experimental, subject to change or removal.
package sessionreusedetector

import (
    "github.com/ansel1/merry/v2"
    "github.com/gemalto/flume"
    "gorm.io/gorm"
    "os"
    "runtime"
    "strconv"
)

var log = flume.New("sessionreusedetector")

const key = "prevent_reuse:consumed"
const stackKey = "prevent_reuse:stack"
const saveUpdateKey = "prevent_reuse:save_update"

// Plugin detects illegal db instance re-use.  When a db instance is first used in a finalizer,
// it is marked as consumed.  An attempt to use a consumed db instance fails with an error.
type Plugin struct {
    // Debug adds information to the error noting where the db instance was first consumed.
    Debug bool
    // ReturnError will prevent the reuse by causing the consumed Statement to return error.  If
    // false, only a warning is logged, but the operation will proceed.
    ReturnError bool
    // Disable makes the plugin a no-op.  Must be set before the plugin is installed.
    Disable bool
}

// New creates a new plugin, configured from environment variables.
func New() *Plugin {
    disable, _ := strconv.ParseBool(os.Getenv("YUGODB_SESSION_REUSE_DETECTOR_DISABLE"))
    debug, _ := strconv.ParseBool(os.Getenv("YUGODB_SESSION_REUSE_DETECTOR_DEBUG"))
    returnError, _ := strconv.ParseBool(os.Getenv("YUGODB_SESSION_REUSE_DETECTOR_RETURN_ERROR"))
    return &Plugin{
        Disable:     disable,
        Debug:       debug,
        ReturnError: returnError,
    }
}

// Name implements gorm.Plugin
func (p *Plugin) Name() string {
    return "noreuse"
}

// Initialize implements gorm.Plugin
func (p *Plugin) Initialize(db *gorm.DB) error {
    if p.Disable {
        return nil
    }

    if err := db.Callback().Raw().Before("*").Register("prevent_reuse", p.check); err != nil {
        return merry.Wrap(err)
    }
    if err := db.Callback().Delete().Before("*").Register("prevent_reuse", p.check); err != nil {
        return merry.Wrap(err)
    }
    if err := db.Callback().Create().Before("*").Register("prevent_reuse", p.checkCreate); err != nil {
        return merry.Wrap(err)
    }
    if err := db.Callback().Row().Before("*").Register("prevent_reuse", p.check); err != nil {
        return merry.Wrap(err)
    }
    if err := db.Callback().Query().Before("*").Register("prevent_reuse", p.check); err != nil {
        return merry.Wrap(err)
    }
    if err := db.Callback().Update().Before("*").Register("prevent_reuse", p.checkUpdate); err != nil {
        return merry.Wrap(err)
    }
    return nil
}

func (p *Plugin) checkUpdate(db *gorm.DB) {
    p.checkInternal(db, true, false)
}

func (p *Plugin) checkCreate(db *gorm.DB) {
    p.checkInternal(db, false, true)
}

func (p *Plugin) check(db *gorm.DB) {
    p.checkInternal(db, false, false)
}

func (p *Plugin) checkInternal(db *gorm.DB, isUpdate, isCreate bool) {
    if _, isSet := db.InstanceGet(key); !isSet {
        db.InstanceSet(key, true)

        if p.Debug || isUpdate {
            // since we have to incur the cost of capturing a stack
            // anyway for the save/update/create logic, might as well
            // include it even if it isn't debug mode
            err := merry.New("Statement was originally consumed here.")
            db.InstanceSet(stackKey, err)

            if isUpdate {
                stack := merry.Stack(err)[:10]
                if len(stack) > 0 {
                    frames := runtime.CallersFrames(stack)
                    for {
                        frame, more := frames.Next()
                        if frame.Function == "gorm.io/gorm.(*DB).Save" {
                            db.InstanceSet(saveUpdateKey, true)
                            break
                        }
                        if !more {
                            break
                        }
                    }
                }
            }
        }
        return
    }

    // session has been re-used
    err := merry.New("This db session has already been used to execute a command.  DB sessions can" +
        " only be used once.  Create a new session with db.Session(*gorm.Session{}) or db.WithContext(ctx).")

    // first check whether this is a save/update/create sequence, which is a case
    // of re-use inside gorm itself, which we need to allow
    if saveUpdate, ok := db.InstanceGet(saveUpdateKey); ok {
        // first, remove this key now.  Any case of re-use should reset this flag
        db.InstanceSet(saveUpdateKey, false)
        if saveUpdateBool, _ := saveUpdate.(bool); saveUpdateBool && isCreate {
            stack := merry.Stack(err)[:10]
            if len(stack) > 0 {
                frames := runtime.CallersFrames(stack)

                for {
                    frame, more := frames.Next()
                    if frame.Function == "gorm.io/gorm.(*DB).Save" {
                        // the only other time this session has been re-used was in an update
                        // called by gorm.Save().  gorm.Save() is known to sometimes then make
                        // a Create() call on the same session.  We detect this, and consider
                        // it a valid re-use
                        if log.IsDebug() {
                            if stack, ok := db.InstanceGet(stackKey); ok {
                                if stackErr, ok := stack.(error); ok {
                                    err = merry.Wrap(err, merry.WithCause(stackErr))
                                }
                                log.Debug("session was re-used, but only inside a call to gorm.Save() which is allowed.", err)
                            }
                        }
                        return
                    }
                    if !more {
                        break
                    }
                }
            }
        }
    }

    if p.Debug {
        if stack, ok := db.InstanceGet(stackKey); ok {
            if stackErr, ok := stack.(error); ok {
                err = merry.Wrap(err, merry.WithCause(stackErr))
            }
        }
    }
    if p.ReturnError {
        _ = db.AddError(err)
    } else {
        log.Error("gorm db instance re-used", err)
    }
}

kainosnoema commented 2 years ago

We've run into db instance re-use issues regularly. At times (such as when a re-used db creates a subquery that's then passed in as an argument), it can even create an infinite loop during query generation that locks up the goroutine.

Imo, protecting against this and warning or raising should be built in to gorm directly. It's just way too easy to miss and causes big issues.

ansel1 commented 11 months ago

We're continuing to run into similar issues, even with our plugin. I haven't yet figured out why our plugin is no longer detecting some cases of re-use. But this seems like a fundamental flaw in this library. The errors that happen if you re-use a session are usually very difficult to trace back to the root cause. And throwing an extra "Session()" in here and there magically fixes them.

go-gorm / gorm