Open changkun opened 3 years ago
The problem seems more serious in all general cases.
If a user is manipulating a widget property in a separate goroutine, then nothing prevents the data race at the moment.
A more general solution should be considered:
Many of these relate to work already considered in #1028.
There was earlier work to expose a safe function for reading and writing properties but it was down-voted as not a good or useful API. I think we should revisit as you suggest and educate devs about the need (you can see some of this work in internal Widget.updateAndRefresh()
).
educate devs about the need
I am not sure educating devs can solve the problem entirely. Considering this example:
package main
import (
"image/color"
"log"
"time"
"fyne.io/fyne/v2"
"fyne.io/fyne/v2/app"
"fyne.io/fyne/v2/canvas"
"fyne.io/fyne/v2/container"
"fyne.io/fyne/v2/layout"
)
func main() {
win := app.New().NewWindow("Some Window")
w, h := 90, 25
cells := make([]fyne.CanvasObject, 0, w*h)
for i := 0; i < w*h; i++ {
cells = append(cells, canvas.NewRectangle(color.RGBA{R: 255, G: 255, B: 255, A: 128}))
}
container := container.New(layout.NewGridLayout(h), cells...)
colors := [3]color.RGBA{
{R: 20, G: 30, B: 100, A: 128},
{R: 100, G: 20, B: 30, A: 128},
{R: 30, G: 100, B: 20, A: 128},
}
go func() {
c := 0
tk := time.NewTicker(100 * time.Millisecond)
for range tk.C {
for i := 0; i < w; i++ {
for j := 0; j < h; j++ {
obj := container.Objects[i*h+j].(*canvas.Rectangle)
obj.FillColor = colors[c] // Data Race!
obj.Refresh()
}
}
log.Println("still refreshing")
c = (c + 1) % 3
}
}()
win.SetContent(container)
win.ShowAndRun()
}
If a user updates widget property in a separate goroutine, nothing really can prevent the data race. Event the user puts a lock to the object, the internal renderer will not corporate with the outsider locks. Hence, we cannot prevent the data race no matter what we do, can we?
I feel I was slightly misquoted there. I was commenting that it would require good documentation to educate developers about the need for proposed additional API. I never intended to claim that these changes were not needed.
From PR #2564, here is a summary of current observed problems:
To resolve all races, we need:
About that part of educating users of the API, we could add a tutorial over on developer.fyne.io with some well structured examples and useful information.
Historically we have wanted to allow setting of fields directly (for example a canvas.Rectangle
may want to update FillColor, StrokeColor and StrokeWidth all before refreshing).
Is there something we can do to support this still, or is it an unnecessary optimisation and we should tell everyone to use just setters (that we need to add, each of which locks, updates, unlocks, requests refresh).
I know that we have a few .SetFieldsAndRefresh()
functions in some internal parts of the code base. How about a .SetFields(func())
function then, that can lock (and defer an unlock) the mutex before calling the function to set the fields?
That is an approach I proposed in a previous PR, but it was down-voted. I just felt it needed to be part of the discussion as we approach this area again.
I have not yet obtained a global overview regarding all APIs and the potential of possible configurations. Each widget may have more than existing configurations, which I think have not yet been introduced. We might be more sensitive about this when trying to resolve the data races. Clearly, we don't want to end up with 100 customizable attributes using 200 setters and getters in total.
Inspired by Rob Pike's post. A possible approach might introduce an Options()
design:
package widget
type ButtonOpt func(b *Button)
func ButtonSize(w, h int) ButtonOpt { ... }
func ButtonColor(c color.Color) ButtonOpt { ... }
func (b *Button) Options(opts ...ButtonOpt) { ... }
On the user side, to update the configuration of a button, they might do:
b := widget.NewButton()
b.Options( // Update arbitrary number of configs
widget.ButtonSize(4, 2),
widget.ButtonColor(color.Black),
)
b.Refresh()
The previous example would be:
package main
import (
"image/color"
"log"
"time"
"fyne.io/fyne/v2"
"fyne.io/fyne/v2/app"
"fyne.io/fyne/v2/canvas"
"fyne.io/fyne/v2/container"
"fyne.io/fyne/v2/layout"
)
func main() {
win := app.New().NewWindow("Some Window")
w, h := 90, 25
cells := make([]fyne.CanvasObject, 0, w*h)
for i := 0; i < w*h; i++ {
cells = append(cells, canvas.NewRectangle(color.RGBA{R: 255, G: 255, B: 255, A: 128}))
}
container := container.New(layout.NewGridLayout(h), cells...)
colors := [3]color.RGBA{
{R: 20, G: 30, B: 100, A: 128},
{R: 100, G: 20, B: 30, A: 128},
{R: 30, G: 100, B: 20, A: 128},
}
go func() {
c := 0
tk := time.NewTicker(100 * time.Millisecond)
for range tk.C {
for i := 0; i < w; i++ {
for j := 0; j < h; j++ {
obj := container.Objects[i*h+j].(*canvas.Rectangle)
obj.Options(canvas.FillColor(colors[c])) // guarantee update safely
obj.Refresh()
}
}
log.Println("still refreshing")
c = (c + 1) % 3
}
}()
win.SetContent(container)
win.ShowAndRun()
}
That feels like quite a heavy API to replace what is quite simple field access. And it would mean a complete shift in API design so probably something more like 3.0 if required.
That feels like quite a heavy API to replace what is quite simple field access. And it would mean a complete shift in API design so probably something more like 3.0 if required.
Yes, I think that should be a replacement for all existing property APIs. More importantly, either way, to resolve the issue entirely we might have to break the existing usage because of the dilemma (https://github.com/fyne-io/fyne/issues/2509#issuecomment-939771233).
any updates on this issue ?
any updates on this issue ?
That seems like a strange question considering the last messages were 4 days ago and there are 2 open draft PRs also under discussion. Is there something specific you are looking for information on?
From PR #2564, here is a summary of current observed problems:
- Manipulating exported widgets struct can race with the internal renderer (if a user manipulates those fields in a separate goroutine, it is impossible to resolve this, we need a better way to offer the field exportation)
- Internal fields also suffer from the same issue.
To resolve all races, we need:
- carefully review all exported widgets and make every concurrent safe for all methods and properties by adding a Mutex (exported Lock and Unlock, but this will need to educate user or design the APIs in a way that handle the concurrency internally)
- Review rendering facilities and minimize the access of properties (if possible, do a copy).
@andydotxyz i am referring to @changkun 's reply, to be specific
@c1ngular I think we are still in the very early discussion stage. The problem is not as easy as JavaScript whereas Go has built-in concurrent nature. To converge to an optimal design we will take our time to practice more and think more about the whole picture.
Luckily, we have not yet observed any serious bug caused by the data race, the problem is rather theoretical rather than practical. Therefore I think if you are facing any serious bugs because of this, feel free to let us know and we could work together to find a solution that mitigates the problem in your case.
I took several further looks into the existing tests. Here is a summarized list of problematic tests that arise race error under race mode:
It turns out there is not much problem with the existing widgets and tests, considering #2510 is open for quite some time, I'd argue that we try to first fix those tests, and get race test enabled so that we can prevent future problematic code into the codebase. Then based on the practice, thinking and make a concrete future direction? Thoughts?
If my understanding is correct, I assume these data races will be serious problem when we need to implement a video player in version 3.x ? @changkun I always long for multimedia support . ^-^
Data races are always a problem, https://en.m.wikipedia.org/wiki/Race_condition and not specific to multimedia.
If my understanding is correct, I assume these data races will be serious problem when we need to implement a video player in version 3.x ? @changkun I always long for multimedia support . ^-^
I don't fully understand the argument. Can you show us a specific code example to further explain the argument?
frame drops/stutters/glitches while decoding/playing videos due to massive data races or too many locks ? just a general idea @changkun
Maybe, but imagination is not a strong argument.
We do need to resolve the data races. However I don't think that multimedia makes it any worse - nor do I think that you can see these in the GUI most of the time. Of course every one of them has the potential to have a bad ramification.
If you want to see multimedia working you could check out https://github.com/fyshos/movies.
A slightly ugly method that might be worth considering is what Goey does, where you can queue things to run on the main thread for after rendering.
A small and simple idea I got earlier today that would solve things for 3.0. We remove all API to change the state of a widget from the API (Text, Enable/Disable, ...) and only leave one function, Refresh. When Refresh is called, the next time they are about to be rendered a callback on the object is called and that callback should return the data that describe that widget. This remove all potential race condition and provide screen synchronization at the same time.
A possible example for Label:
type LabelState struct {
Text string
Alignment fyne.TextAlign // The alignment of the Text
Wrapping fyne.TextWrap // The wrapping of the Text
TextStyle fyne.TextStyle // The style of the label text
}
type Label struct {
Generator func() LabelState // Same name as Raster, but could be different obviously
}
func NewLabelStatic(state LabelState) *Label {
return &Label{
Generator: func() LabelState {
return state
},
}
}
func NewLabelDynamic(generator func() LabelState) *Label {
return &Label{
Generator: generator,
}
}
func (*Label) Refresh()
The generator callback should be extremely fast as it would otherwise impact the rendering of the frame and could potentially lock the rendering. For application that care about such problem, we could pass a context which would have been setup with a timeout and execute a certain number of those generator call in parallel using a goroutine pool.
I see a few advantages for this solution. We can add new property to the structure that describe the widget without any future compatibility issue. The callback is triggered only when it is about to render things on screen and Refresh can notify the rendering thread once, this would reduce unnecessary update request/processing. It match quite well the data binding API.
The main drawback is that for migration from 1.x/2.x to 3.x, the automated process we could write will generate ugly code that won't be perfectly thread safe as the current API is not (Basically all the current property will be set on a struct shared with Generator callback to set the returned structure). But I do not think that there could be an automated process that would generate a code that is compatible with 2.x and still thread safe, so this might be true for all possible solution.
Just realized we could most likely implement v2 on top of v3 and maybe make it possible to mix v3 and v2 code for easing migration.
A small and simple idea I got earlier today that would solve things for 3.0. We remove all API to change the state of a widget from the API (Text, Enable/Disable, ...) and only leave one function, Refresh. When Refresh is called, the next time they are about to be rendered a callback on the object is called and that callback should return the data that describe that widget. This remove all potential race condition and provide screen synchronization at the same time.
Sadly this removes all ability to do a bulk update and reduce refreshes. For example if I have 5 widgets with 3 property changes to be made I can do this and then call Refresh on the parent - all update in one invalidation. For performance we should continue to support this somehow.
Also i think a Generator
on every object could be confusing to users, especially as the object has already been instantiated.
Sadly this removes all ability to do a bulk update and reduce refreshes. For example if I have 5 widgets with 3 property changes to be made I can do this and then call Refresh on the parent - all update in one invalidation. For performance we should continue to support this somehow.
That could still work. The change you propose is that a Refresh called on the parent could/would trigger a Refresh on the children. This will only be the case for containers and widget. Should they always trigger a Refresh on all children? Or should the Layout function for containers or the Refresh of the widget be responsible to trigger Refresh on the children?
In any case, cascade update is actually possible and could be made more efficient than they are actually. If an object is already marked by a refresh flag that hasn't been cleared by the Rendering thread when it get the state, it doesn't need to trigger Refresh again. This mean not calling children Refresh more than once per window refresh.
I don't think this would be the source of any performance problem. Having a clear state definition, means the rendering thread can easily compare the value between the previous rendering state and the new rendering state. This should lead to more efficient and fine grained caching decision. Also we should have less costly Refresh overhead here, as the Refresh call doesn't need to queue anything in the Rendering thread, instead it need to mark things once per window refresh anything faster than what the system need/can keep up with will just be ignored. This should be better on lower end devices.
Also i think a
Generator
on every object could be confusing to users, especially as the object has already been instantiated.
The name could be different. I am bad at naming, but I can think of State(), Render() or Update() for example.
Throwing another idea of API, would be to have a different Update function with a callback that pass a pointer to the next state to render. Doing the same as the previous example:
type LabelState struct {
Text string
Alignment fyne.TextAlign // The alignment of the Text
Wrapping fyne.TextWrap // The wrapping of the Text
TextStyle fyne.TextStyle // The style of the label text
}
type Label struct {
rendered LabelState
rendering LabelState
toBeRendered LabelState
lock sync.Mutex
}
func NewLabel(state LabelState) *Label {
return &Label{toBeRendered: state }
}
func (l *Label) Update(update func (next *LabelState) error) error {
l.lock.Lock()
defer l.lock.Unlock()
err = update(l.toBeRendered)
if err != nil {
return err
}
Refresh(l) // That's where we schedule the widget for next update
}
This does not solve the synchronization problem, but the logic can be easily use for a new layout interface for canvas object like so:
type LayoutState struct {
size fyne.Size
position fyne.Position
}
func (l *Label) UpdateLayout(update func (layout *LayoutState) error) error
This would be the way for layout function to change object size/position without any race condition regarding the rest of the object data. It would also make it clearer that you are not supposed to modify the layout of the object outside of a layout function.
I am starting to think that an Update family function would actually be the right way forward. To address the synchronization problem, we could introduce:
func (l *Label) UpdateNext(func (toRender *LabelState) error) error
func (l *Label) UpdateNextOnce(func (toRender *LabelState) error) error
UpdateNext would queue call to be executed in the render goroutine when preparing the rendering of a frame (This could be used to build Animation on top of it). UpdateNextOnce would queue, but also cancel any other UpdateNextOnce that hasn't reach rendering too. This could be used to reduce over computation and synchronization problem.
Update would still be useful for the simpler inefficient/unsynchronized case and would be a way to implement Refresh actually making v2->v3 API compatible with an easy migration path.
It should be possible to resolve this without requiring that all widgets add new API. In part I am concerned that this pattern will conflict with the ability to extend, as we cannot replace a method with another using the same name but different parameters (labelState). Also however I don't think that whatever solution we find should add complexity (or API?) to the user facing elements, it would break the design of behaviour/intent based api.
I think it is impossible to resolve this without adding new API as there is nothing in the current API that express synchronization intent which is why we have race condition. The known pattern could be in the form of Lock/Unlock, Begin/End, sending state on a Channel or a callback as proposed with Update. There isn't that many pattern possible.
I had forgotten the problem with extending widget which add another layer of synchronization problem. Will have to think about how that pattern is going to work.
Extending widget and having an Update function with callback that provide different structure, seems to either call for interface{} and cast or a generic in later go version. Both are doable and fit the go ecosystem pattern, but they don't feel as easy to use.
Additional constraint I have forgotten about is that we might want to have a way to say that all this update will happen during the same frame. Not to sure how to do that bit, maybe it is enough to control the parent updating call, but I am not certain.
To solve the constraint of synchronizing update of widget during the same frame, I guess there is only really one way. The windows has to generate an event when the widget should start to change there state and all change that happen during that event would be taken into account.
Let's solve one at a time, at least from a design point of view. One of the reasons this is taking so long is because there are many facets I think.
Remember that we don't have to make it impossible to race, just that we should be able to eliminate the races and provide developers the same for their code... Lock/Unlock is indeed one way but we explored get/set and safe method wrappers in the past that seemed promising.
The problem with Get/Set is that there only boundary is the property that it manipulate. This means that you can not modify multiple property without racing with the renderer which could lead to non synchronized rendering. This is why I didn't list Get/Set as a solution, as it just hide race condition from tools, but do not solve visual artifact due to race condition.
Indeed, and for those cases we worked on the safe methods I indicated. such as SetFieldsAndRefresh(func())
which would allow you to pass in code that would execute under the lock, and finish by refreshing the widget.
Probably worth reading the history of PR #955 where I last tried to implement these helpers.
Update is very close to SetFieldsAndRefresh, except that the field of the structure are always visible in the case of SetFieldsAndRefresh and only visible within the scope of Update. The later improve safety, but increase complexity to handle extended widget. From reading the discussion, I wonder if opinion of the people involved have evolved as project in the go community are more heavily relying on function callback to do this things (case for all the DB interface).
I do have a big preference for the callback function as it could be used to implement Refresh today and offer a migration path which will be harder for Begin/End or Lock/Unlock, I think.
Also SetFieldsAndRefresh only cover one of the synchronization problem, it doesn't allow for synchronization with the render pace and we would still need something like UpdateNext/UpdateNextOnce to reduce over computing and synchronize things like animation.
Update is very close to SetFieldsAndRefresh, except that the field of the structure are always visible in the case of SetFieldsAndRefresh and only visible within the scope of Update.
This is quite a big difference - and as you say is indeed the main change. Taking this approach is basically a complete re-working of how widget state works so can't be done incrementally by fixing access in some cases. Your illustration also makes the constructor more complex which is undesirable. I don't want any APIs to be more complex in the solution that we progress with. And ideally people can opt in to it, or the migration is trivial...
I disagree on the complete reworking of how widget state works, as it is entirely possible to keep the current API and implement Refresh using Update. This would instantly give for existing application some level of thread safety without any change on their side (this will be only in the case that there is only one location and that the trigger to that call can only happen after Refresh call is done) and it gives an incremental upgrade path that could even be automated to some level (by locating all call to Refresh and looking for nearby change to Field in the public structure).
I am not sure what you mean the constructor being more complex.
If the root source of all data race problems is because of the switch to multithread rendering, is it a solution if we go back to drawing on the main thread? At least: 1) quickly solve the data race problems in today's API (the scope of this issue); 2) leave room for future API redesign (somewhere in the future but a different issue).
With GLFW if we go back to 1 thread then on many OS the windows will stop painting during resize :(
Is it necessary to paint during resizing?
I am not sure this solve any race condition as you would still be able to modify field in any canvas object from another goroutine that will race with the render thread even if that thread is the main thread and even if we force Refresh to be executed on main.
@Bluebugs If I parse your concern correctly, you are talking about this problem: https://github.com/fyne-io/fyne/issues/2509#issuecomment-939771233 where a user misuses the current APIs (non-concurrent safe)
If my memory works properly, we had a discussion somewhere in the mid of the year and concluded that resolving the data race includes two phases:
resolve internal data races (the scope of the original issue): this can be observed by building the cmd/fyne_demo
using go build -race
. Objective: we want our official demo application to run without any data race.
educate users to use API properly. Or this objective: design a new major version that allows APIs are able to call concurrently and avoid users misusing our APIs (somewhere in fyne/v3).
Does this clarify the scope of the problems we are facing?
I think that comment is close to my point, but I don't believe that calling a Refresh from a goroutine to be a misuses of the API. There is no other location you could change a property on an object if it comes from an event outside of Fyne. If we force everything from Fyne on the main thread, input, animation and object callbacks will be triggered on the main thread, we still have every interaction coming from outside of Fyne (like network request or timers for example) that will not be on the main thread. This can only be solved by having a different copy of the data for Fyne internal than what is exposed to the user (basically copy on Refresh).
Let me word this differently: Is it possible to avoid calling a Refresh (and rendering friends) from a goroutine ourselves to avoid producing internal rendering data races and the official demo?
Is it necessary to paint during resizing?
Well, I'd say yes - we implemented it following complaints that it didn't. Animations stopping during resize is not ideal.
Let me word this differently: Is it possible to avoid calling a Refresh (and rendering friends) from a goroutine ourselves to avoid producing internal rendering data races and the official demo?
This is not possible as far as I can tell... any widget setters will call refresh and if we insist that all of those are called on main we are no better than other toolkits and would even need to build more infrastructure to support it.
Oh unless you meant WidgetRenderer.Refresh
in which case we could do something about that.
I'd considered calling it internally on the render end of refreshing rather than the caller side... I had a local proof of concept for that and it did improve some potential loop and race issues in Refresh code.
Describe the bug:
Run fyne_demo with
-race
flag:go run -race main.go
OutputWe should enable data race detection in the tests.
Device (please complete the following information):