AlexCouch / jewl-go

jewl-go is a Go-friendly performance analysis library

Recorder does not work with goroutines #9

Open AlexCouch opened 8 months ago

AlexCouch commented 8 months ago

Issue

Commit 6715e3a adds a new test that runs the bubble sort as a set of goroutines instead of a sequence of function calls. It produces errors that do not occur in non-concurrent code. The errors vary from run to run, seemingly at random, which suggests a race condition, possibly in more than one place.

Solution

The solution may be to add a lock to the cache file and project file.

err := syscall.Flock(int(file.Fd()), syscall.LOCK_EX)
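To make the locking idea concrete, here is a minimal sketch, assuming a Unix-like system where syscall.Flock is available. The helper names (withFileLock, incrementCounter, runCounterDemo) and the counter file are hypothetical and are not part of jewl-go. Each goroutine opens the file itself, so each one holds its own open file description and the exclusive locks exclude one another.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"sync"
	"syscall"
)

// withFileLock (hypothetical helper) opens path, takes an exclusive advisory
// lock on it, runs fn, then unlocks and closes the file.
func withFileLock(path string, fn func(f *os.File) error) error {
	f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	fd := int(f.Fd())
	for {
		err = syscall.Flock(fd, syscall.LOCK_EX)
		if err != syscall.EINTR { // retry if a signal interrupted the wait
			break
		}
	}
	if err != nil {
		return err
	}
	defer syscall.Flock(fd, syscall.LOCK_UN)
	return fn(f)
}

// incrementCounter does a read-modify-write on the file while holding the lock.
func incrementCounter(path string) error {
	return withFileLock(path, func(f *os.File) error {
		data, err := io.ReadAll(f)
		if err != nil {
			return err
		}
		n := 0
		if s := strings.TrimSpace(string(data)); s != "" {
			if n, err = strconv.Atoi(s); err != nil {
				return err
			}
		}
		if err := f.Truncate(0); err != nil {
			return err
		}
		_, err = f.WriteAt([]byte(strconv.Itoa(n+1)), 0)
		return err
	})
}

// runCounterDemo increments one shared file from several goroutines and
// returns the final value stored in it.
func runCounterDemo(workers int) (int, error) {
	dir, err := os.MkdirTemp("", "lock-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "counter.txt")

	var wg sync.WaitGroup
	errs := make(chan error, workers)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			errs <- incrementCounter(path)
		}()
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			return 0, err
		}
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(data)))
}

func main() {
	n, err := runCounterDemo(20)
	fmt.Println(n, err)
}
```

If every read-modify-write of the cache and project files goes through a helper like this, concurrent goroutines should no longer interleave their writes. A sync.Mutex would likely be enough when only one process touches the files; the file lock also covers other processes.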

AlexCouch commented 8 months ago

While debugging the issues with the current commit, I realized that the goroutines are causing a race condition with the cache. We need a way to distinguish between goroutines and non-goroutines. I'm thinking we could keep a map from goroutine ID to stack and cache that map. I found a Stack Overflow answer which gives us a way to get the goroutine ID (gid).
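A rough sketch of that idea follows. It assumes the gid is obtained by parsing the first line of runtime.Stack output ("goroutine 18 [running]:"), which is a commonly used workaround and may not be exactly what the linked answer does. The Go runtime deliberately exposes no public API for goroutine IDs, so this relies on an output format that is not guaranteed. The stackTable type and its methods are hypothetical names, not jewl-go's.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"strconv"
	"sync"
)

// goroutineID extracts the current goroutine's ID from the header line of
// runtime.Stack, which looks like "goroutine 18 [running]:".
func goroutineID() uint64 {
	buf := make([]byte, 64)
	buf = buf[:runtime.Stack(buf, false)]
	buf = bytes.TrimPrefix(buf, []byte("goroutine "))
	if i := bytes.IndexByte(buf, ' '); i >= 0 {
		buf = buf[:i]
	}
	id, err := strconv.ParseUint(string(buf), 10, 64)
	if err != nil {
		panic(fmt.Sprintf("cannot parse goroutine id: %v", err))
	}
	return id
}

// stackTable (hypothetical) keeps one frame stack per goroutine.
type stackTable struct {
	mu     sync.Mutex
	stacks map[uint64][]string
}

func newStackTable() *stackTable {
	return &stackTable{stacks: make(map[uint64][]string)}
}

// Push adds a frame name to the calling goroutine's stack.
func (t *stackTable) Push(frame string) {
	id := goroutineID()
	t.mu.Lock()
	defer t.mu.Unlock()
	t.stacks[id] = append(t.stacks[id], frame)
}

// Pop removes and returns the top frame of the calling goroutine's stack.
func (t *stackTable) Pop() (string, bool) {
	id := goroutineID()
	t.mu.Lock()
	defer t.mu.Unlock()
	s := t.stacks[id]
	if len(s) == 0 {
		return "", false
	}
	top := s[len(s)-1]
	if len(s) == 1 {
		delete(t.stacks, id) // drop empty stacks so the map does not grow forever
	} else {
		t.stacks[id] = s[:len(s)-1]
	}
	return top, true
}

// countMismatches runs workers goroutines that each push two frames and pop
// them again, and counts how many saw frames in the wrong order or from
// another goroutine.
func countMismatches(workers int) int {
	t := newStackTable()
	var wg sync.WaitGroup
	var mu sync.Mutex
	mismatches := 0
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			outer := fmt.Sprintf("outer-%d", i)
			inner := fmt.Sprintf("inner-%d", i)
			t.Push(outer)
			t.Push(inner)
			runtime.Gosched() // encourage interleaving with other goroutines
			a, okA := t.Pop()
			b, okB := t.Pop()
			if !okA || !okB || a != inner || b != outer {
				mu.Lock()
				mismatches++
				mu.Unlock()
			}
		}(i)
	}
	wg.Wait()
	return mismatches
}

func main() {
	fmt.Println("mismatches:", countMismatches(50))
}
```

Because each goroutine only ever touches the slice stored under its own ID, frames from concurrent calls cannot end up on each other's stacks; the mutex only protects the map itself.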

AlexCouch commented 8 months ago

Please ignore the sudden closing. That was a mistake.

AlexCouch commented 8 months ago

Right now, the stack is all bugged out. Stop seems to be dysfunctional at the moment. Adding new frames does not seem to be the problem, but the new stack is not working right. An example of how bugged the new stack is:

julia> display(proj)
Jewl.Project(Dict{String, Array{Int64}}("github.com/alexcouch/jewl-go.TestFrame" => [0], "github.com/alexcouch/jewl-go.bubbleSort" => [0, 1, 2, 0, 0, 0, 0, 0, 0, 0]),...

When Jewl loads the proj, it shows the bubbleSort function mapped to indices 0, 1, 2 and then 0 for the remaining seven entries. It's called 10 times, so it has 10 indices. However, looking at the full proj output in Julia, none of the frames have been stopped correctly. TestFrame's frame has a negative duration, and all the others have a 0 end and 0 duration. Also, the data has been added to the wrong frames. Some data which should be in the innerFrame or outerFrame has been added to TestFrame.

This means that there is a bug in the way we get the top of the stack. I wonder if we are having issues with caching it. I changed the cache to JSON to make it easier to read. Also, Stop is definitely a place to look as well.
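For comparison while debugging, here is a small single-goroutine sketch of the behavior the stack is expected to have: Stop closes whatever frame is on top, sets its end and duration, and data recorded in between lands on that same top frame. The frame and recorder types are hypothetical stand-ins, not the actual jewl-go types, and no caching is involved.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// frame is a hypothetical stand-in for a recorded frame.
type frame struct {
	Name     string
	Start    time.Time
	End      time.Time
	Duration time.Duration
	Data     []string
}

// recorder keeps open frames on a stack and finished frames in a list.
type recorder struct {
	open   []*frame
	closed []*frame
}

// Start pushes a new open frame.
func (r *recorder) Start(name string) {
	r.open = append(r.open, &frame{Name: name, Start: time.Now()})
}

// Record attaches data to the frame currently on top of the stack.
func (r *recorder) Record(item string) error {
	if len(r.open) == 0 {
		return errors.New("record: no open frame")
	}
	top := r.open[len(r.open)-1]
	top.Data = append(top.Data, item)
	return nil
}

// Stop pops the top frame and fills in its end time and duration.
func (r *recorder) Stop() (*frame, error) {
	if len(r.open) == 0 {
		return nil, errors.New("stop: no open frame")
	}
	top := r.open[len(r.open)-1]
	r.open = r.open[:len(r.open)-1]
	top.End = time.Now()
	top.Duration = top.End.Sub(top.Start)
	r.closed = append(r.closed, top)
	return top, nil
}

func main() {
	r := &recorder{}
	r.Start("outerFrame")
	r.Start("innerFrame")
	_ = r.Record("sample")
	inner, _ := r.Stop()
	outer, _ := r.Stop()
	// prints "innerFrame 1 outerFrame 0"
	fmt.Println(inner.Name, len(inner.Data), outer.Name, len(outer.Data))
}
```

If the real recorder passes checks like these in sequential code but fails them under goroutines, that would point at the shared stack or its cache rather than at Stop's own logic.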