golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.05k stars 17.55k forks source link

x/tools/go/analysis: request some form of static thread safety analysis for Go #46788

Open g-talbot opened 3 years ago

g-talbot commented 3 years ago

In issue #24889 there's a short paragraph about static thread safety analysis that clang does for C++ programs:

clang supports thread safety analysis for C++ (https://clang.llvm.org/docs/ThreadSafetyAnalysis.html) based on annotations in the source code. This would clearly be applicable to Go code.

I'm pretty agnostic as to how some form of static thread safety analysis enters the Go ecosystem, but I would like to have it. I've used this at a previous employer (Google) to great effect, and IMO it would be of great benefit to Go programs as well.

Here's a strawman proposal, more to spur discussion than be a "real" solution. What if a struct is defined with a lock in it like so:

type MyStruct struct {
  sync.Mutex

  field1 int
  field2 string
  field3 []string
}

Let's annotate MyStruct to require locking on all of its members:

//go: guarded
type MyStruct struct {
  sync.RWMutex

  // by having //go: guarded above this implies that MyStruct must be locked
  // to access the members.  Reads of any of the fields would require RLock() and
  // writes would require Lock().
  field1 int
  field2 string
  field3 []string
}

So if I defined a function that accessed the members and didn't lock, compilation would fail:

func (m *MyStruct) GetField1() int {
  return m.field1 // does not compile
}

But this would work:

func (m *MyStruct) GetField1() int {
  m.RLock()
  defer m.RUnlock()
  return m.field1
}

And this too:

func (m *MyStruct) GetField1() int {
  m.RLock()
  defer m.RUnlock()
  return m.GetField1WithLock()
}

//go: with-rlock: m
func (m *MyStruct) GetField1WithRLock() int {
  return m.field1
}

I know folks don't love the comment pragmas, but I didn't have a better idea. Perhaps the basic gist of tying the lock to a specific struct that there could be some way that's "neater" and "more go like" than the complex clang annotations? With anonymous struct fields making for clean composition?

prattmic commented 3 years ago

The gVisor project has recently created checklocks, a go vet style analysis tool which fits in the vein of what you are suggesting here.

IIRC, the tool is fairly immature, so I don't believe it is full-featured and simplified in places for typical conventions used in gVisor, but I think it provides a good starting point, or potentially could be directly used by other projects.

cc @timothy-king @amscanne @hbhasker

g-talbot commented 3 years ago

The gVisor project has recently created checklocks, a go vet style analysis tool which fits in the vein of what you are suggesting here.

IIRC, the tool is fairly immature, so I don't believe it is full-featured and simplified in places for typical conventions used in gVisor, but I think it provides a good starting point, or potentially could be directly used by other projects.

cc @timothy-king @amscanne @hbhasker

This is pretty cool, though two things might be showstoppers for us using it right now.

  1. Doesn't distinguish between sync.Mutex and sync.RWMutex.
  2. Defer isn't supported. i.e.defer x.Unlock() which is pretty much required if you want panic to work in presence of locking.

Are there plans to address these? Also, one suggestion--could there be " // +checklocks: all-guarded-by: mu" that specifies that every field in a struct is guarded by the mutex?

timothy-king commented 3 years ago

My plan at the moment is to have an internship project look into lock annotations starting at the beginning of September. It will likely start from checklocks. We would also add support for more code patterns (defer, RWMutex, sync.Locker?). The goal of the project would be to evaluate the ergonomics of the checker in real code. I am also curious if we can mix struct tags and comments. If all goes well, this could turn into a proposal or an experimental tool.

No promises, but giving you full transparency that this is on my radar.

So if I defined a function that accessed the members and didn't lock, compilation would fail:

This would be a breaking changing. This would require a Go2 proposal. A static analyzer is a much smaller lift.

  type MyStruct struct {
  sync.RWMutex

Embeddings are definitely something that needs more thought. No concrete ideas from me at the moment though.

Are there plans to address these? Also, one suggestion--could there be " // +checklocks: all-guarded-by: mu" that specifies that every field in a struct is guarded by the mutex?

I don't really see why not. That would just be syntactic sugar. There is also a possibility of trying to capture the "mutex hat" idiom in an annotation. That is more flexible, but it is not clear if users would find this too brittle/confusing. Real code examples would really help here.

bcmills commented 3 years ago

Note that you can get a lot of the benefits of static annotation by switching from sync.Mutex to a 1-buffered channel:

type MyStruct struct {
    fields chan *myStructFields // 1-buffered
}

func NewMyStruct() *MyStruct {
    fields := make(chan *myStructFields, 1)
    fields <- &myStructFields{}
    return &MyStruct{fields: fields}
}

type myStructFields struct {
    field1 int
    field2 string
    field3 []string
}

Then the scope of the fields is more closely tied to the synchronization operations for those fields:

func (m *MyStruct) GetField1() int {
    return m.field1 // Does not compile — field1 is a member of myStructFields, not MyStruct.
}
func (m *MyStruct) GetField1() int {
    fields := <-m.fields                   // acquire
    defer func() { m.fields <- fields }()  // release
    return fields.field1
}
cagedmantis commented 3 years ago

I added a package for this issue. Please feel free to change it from x/tools/go/analysis if it belongs somewhere else.

scott-cotton commented 3 years ago

There is also https://github.com/amit-davidson/Chronos

g-talbot commented 1 year ago

@timothy-king how did this go? Did you have a chance to get your intern to look into this?

My plan at the moment is to have an internship project look into lock annotations starting at the beginning of September. It will likely start from checklocks. We would also add support for more code patterns (defer, RWMutex, sync.Locker?). The goal of the project would be to evaluate the ergonomics of the checker in real code. I am also curious if we can mix struct tags and comments. If all goes well, this could turn into a proposal or an experimental tool.

No promises, but giving you full transparency that this is on my radar.

So if I defined a function that accessed the members and didn't lock, compilation would fail:

This would be a breaking changing. This would require a Go2 proposal. A static analyzer is a much smaller lift.

  type MyStruct struct {
  sync.RWMutex

Embeddings are definitely something that needs more thought. No concrete ideas from me at the moment though.

Are there plans to address these? Also, one suggestion--could there be " // +checklocks: all-guarded-by: mu" that specifies that every field in a struct is guarded by the mutex?

I don't really see why not. That would just be syntactic sugar. There is also a possibility of trying to capture the "mutex hat" idiom in an annotation. That is more flexible, but it is not clear if users would find this too brittle/confusing. Real code examples would really help here.

adonovan commented 1 year ago

Such a tool has been a recurring request, but it's a substantial project as it requires interprocedural analysis to do well. It also needs to degrade gracefully at the boundary between annotated and unannotated portions of the program.

I made some initial experiments in this direction many years ago, and one feature I would like to see in the solution is the use of Go itself--not comments--to express the annotations, so that they are subject to type checking, operated on by renaming tools, reported by cross-reference queries, and so on; otherwise they will go as stale and unread as ordinary doc comments.

For example, the code below shows a hypothetical annotation.Guards function whose body is empty, but whose presence in the source tells the analysis tool the relationship between the mutex (S.mu) and the fields it guards (S.x, etc). The *S parameter is in effect is a universal ("for all") quantifier.

import "annotation"

type S struct {
     mu sync.Mutex
     x, y, z int
}

var mu sync.Mutex
var g int

func _(s *S) {
    annotation.Guards(mu, g) // a simple relation
    annotation.Guards(s.mu, s.x, s.y, s.z) // a relation universally quantified over all S
}
g-talbot commented 1 year ago

Hey Alan how are you?!?

Personally I have no skin-in-the-game for how the annotations work, I just want them. As to your specific proposal here, this moves the annotations from where the mutex is declared to somewhere else? I probably have at least a mild preference for seeing the constraints in the definition of the data type & specific fields, rather than elsewhere in the program, but maybe I'm misunderstanding?

On Tue, Sep 27, 2022 at 9:31 AM Alan Donovan @.***> wrote:

Such a tool has been a recurring request, but it's a substantial project as it requires interprocedural analysis to do well. It also needs to degrade gracefully at the boundary between annotated and unannotated portions of the program.

I made some initial experiments in this direction many years ago, and one feature I would like to see in the solution is the use of Go itself--not comments--to express the annotations, so that they are subject to type checking, operated on by renaming tools, reported by cross-reference queries, and so on; otherwise they will go as stale and unread as ordinary doc comments.

For example, the code below shows a hypothetical annotation.Guards function whose body is empty, but whose presence in the source tells the analysis tool the relationship between the mutex (S.mu) and the fields it guards (S.x, etc). The *S parameter in in effect is a universal ("for all") quantifier.

import "annotation"

type S struct { mu sync.Mutex x, y, z int }

var mu sync.Mutex var g int

func _(s *S) { annotation.Guards(mu, g) // a simple relation annotation.Guards(s.mu, s.x, s.y, s.z) // a relation universally quantified over all S }

— Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/46788#issuecomment-1259513586, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQ2F6767HTXABYFZOLFHDHTWALZJZANCNFSM46Z76WQQ . You are receiving this because you authored the thread.Message ID: @.***>

timothy-king commented 1 year ago

We did the internship, and we managed to learn quite a bit. If you are interested in this topic, I strongly encourage you to try out checklocks. We based the internship on modifying that code base.

Here are a few of my big take-aways and opinions after doing this: 1) the value of lock annotations seems consistent with other languages the experiments we tried. 1) the lock annotation algorithms for other languages (like LLVM's) are not yet a perfect match for Go due to defer. The challenge is the pattern if ... { x.mu.Lock(); defer x.mu.Unlock(); _ = x.guarded ; } /* code that does not touch mu or guarded */. This pattern is too common to ignore and should be supported. We worked on getting to a polynomial time algorithm and had some success with some kinda complex state representation. But I do not think we yet know a satisfying polynomial time algorithm for this that will not confuse tool users. Going out on a limb, I suspect the problems we saw is fixable by recording both the pre and post lock states of blocks so we can faithfully know when the result of a meet operations is not an equality, doing 'reversish abstract interpretation' from the returns to execute the defers, and only reporting a mismatch when the meet on a reverse post state has a disequal result to its inputs. I am speculating though. I have not tried this. 1) a mutex-hat opt in annotations that mean "the field mu guards the immediately following fields in the struct" seems largely sufficient after our experiments. This annotation struck a good balance of feeling minimal while giving an opt-in signal. It can also be a struct tag. 1) function lock annotations still felt a bit too verbose to feel "Go-like". The challenge we are setting for ourselves is whether we get it down to 1 annotation in the vast majority of cases. It would be great if this could just be "mu guards the following fields". This seems to require some inter-pocedural inference (but not inter-package). The algorithm for this inference is not yet settled. I suspect it would also need to be a bit opinionated: exported methods default to the annotation 'unlocked->unlocked' unless they are named Lock or Unlock() [or need an annotation otherwise? or unsupported?]. So there would be safe 'guarded by' patterns it would not support.

So some progress happened but big-ish, algorithm questions still remain for me. I am hoping to look into this more at some point. Maybe another internship?

g-talbot commented 1 year ago

On Tue, Sep 27, 2022 at 2:17 PM Tim King @.***> wrote:

We did the internship, and we managed to learn quite a bit. If you are interested in this topic, I strongly encourage you to try out checklocks https://github.com/google/gvisor/tree/master/tools/checklocks. We based the internship on modifying that code base.

I definitely will! Thanks for letting me know.

Here are a few of my big take-aways and opinions after doing this:

  1. the value of lock annotations seems consistent with other languages the experiments we tried.
  2. the lock annotation algorithms for other languages (like LLVM's) are not yet a perfect match for Go due to defer. The challenge is the pattern if ... { x.mu.Lock(); defer x.mu.Unlock(); _ = x.guarded ; } / code that does not touch mu or guarded /. This pattern is too common to ignore and should be supported. We worked on getting to a polynomial time algorithm and had some success with some kinda complex state representation. But I do not think we yet know a satisfying polynomial time algorithm for this that will not confuse tool users. Going out on a limb, I suspect the problems we saw is fixable by recording both the pre and post lock states of blocks so we can faithfully know when the result of a meet operations is not an equality, doing 'reversish abstract interpretation' from the returns to execute the defers, and only reporting a mismatch when the meet on a reverse post state has a disequal result to its inputs. I am speculating though. I have not tried this.

Yeah this is a tough one. I'd be tempted to consider this a potential bug in the program. This is somebody using defer like it was a C++ destructor, but they're missing a scope:

if ... { func(){ x.mu.Lock(); defer x.mu.Unlock(); _ = x.guarded; }() }

In some ways, this should be a Go-specific warning--"you used defer to unlock your mutex, but the unlock can happen much later than you intend, maybe wrap in func() { ... }()?". The issue here is that in the original scenario if ... { x.mu.Lock(); defer x.mu.Unlock(); _ = x.guarded ; } / code that does not touch mu or guarded / there could be some computationally intensive work in "code that does not touch mu or guarded", or an RPC, or some other blocking operation that maybe the author did not intend to be coupled to x.mu.

  1. a mutex-hat opt in annotations that mean "the field mu guards the immediately following fields in the struct" seems largely sufficient after our experiments. This annotation struck a good balance of feeling minimal while giving an opt-in signal. It can also be a struct tag.

Yeah this is pretty interesting.

  1. function lock annotations still felt a bit too verbose to feel "Go-like". The challenge we are setting for ourselves is whether we get it down to 1 annotation in the vast majority of cases. It would be great if this could just be "mu guards the following fields". This seems to require some inter-pocedural inference (but not inter-package). The algorithm for this inference is not yet settled. I suspect it would also need to be a bit opinionated: exported methods default to the annotation 'unlocked->unlocked' unless they are named Lock or Unlock() [or need an annotation otherwise? or unsupported?]. So there would be safe 'guarded by' patterns it would not support.

So some progress happened but big-ish, algorithm questions still remain for me. I am hoping to look into this more at some point. Maybe another internship?

Wow, thank you for sharing your thoughts! It's fascinating to hear this level of detail.

Take care,

--George

— Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/46788#issuecomment-1259882356, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQ2F672JHEXOBGPBMVZZP5DWAM24BANCNFSM46Z76WQQ . You are receiving this because you authored the thread.Message ID: @.***>