facebook / yoga

Yoga is an embeddable layout engine targeting web standards.
https://yogalayout.dev/
MIT License
17.25k stars 1.42k forks source link

0 layoutWidth and 0 layoutHeight after repeated layout calculation #1678

Closed michaeltroger closed 1 week ago

michaeltroger commented 3 months ago

Report

Issues and Steps to Reproduce

General setup:

Actual Yoga API usage making it reproducible:

  1. Injecting a YogaConfig (usually an optional argument) when creating the YogaNode
  2. Caching the created YogaNode instances
  3. Resetting the existing instances before re-applying the properties like width/height
  4. Calculating the layout
  5. Adding some artificial delay before repeating from 3

The issue happens already with a single YogaNode and setting only width/height (see also sample app), but the issue is not limited to this scenario. I have reasons to believe that it happens more often the bigger the tree is, but I thought a single YogaNode might make debugging the issue easier. If the tree is bigger then individual Nodes are affected but not necessarily all Nodes of the tree.

Expected Behavior

I would expect that consecutive calls with the same input always result in the same output.

Actual Behavior

After some iterations (repeating the resetting of the YogaNode, the applying of properties and the calculation of the layout), layoutWidth and layoutHeight deliver 0 values. It doesn't happen after a fixed amount of repetitions. It's mostly influenced through the delay between the calls and also the context in which the code is executed matters. E.g. I had the issue in the provided sample app after:

One needs to play around with these delays though. In a company internal Demo app it needs only 5 iterations with a delay of 5 seconds in between, while I see no issue with such delay in the provided sample app at all.

Coming to the actual production impact: I originally found the issue when rendering a layout with a root and 2 children boxes, each of them having another child box. After re-calculating the layout ~5 times, some boxes started to disappear and the debugging led me to the 0 values.

We were using Yoga 1.19.0 until now and the same code in Yoga 1 returns approximately after the same iterations, floating point numbers that are not entirely correct, but if rounded you wouldn't see any issue. For a size of 200x200, it returns 199.99997x199.99997 after 350 iterations in the sample app.

The issue with 0 values happens from Yoga 2.0.0 and still in 3.1.0.

Link to Code

A minimal reproducible Android app can be found here: https://github.com/michaeltroger/yogabug The app purposely crashes when the calculation failed. The crash can then be checked in the logs e.g.: java.lang.IllegalStateException: unexpected size: 0.0x0.0 in iteration 316

The most relevant code snippet (Kotlin language):

suspend fun calculate() {
   var i = 0
   val root: YogaNode = YogaNodeFactory.create(YogaConfigFactory.create())
   while(true) {
      root.reset()

      root.setWidth(200f)
      root.setHeight(200f)

      root.calculateLayout(1000f, 1000f)

      if (root.layoutWidth != 200f || root.layoutHeight != 200f) {
         error("unexpected size: ${root.layoutWidth}x${root.layoutHeight} in iteration $i")
      }
      delay(100.milliseconds)
      i++
   }   
}

see also: https://github.com/michaeltroger/yogabug/blob/main/app/src/main/java/com/michaeltroger/yogabug/MainActivity.kt

michaeltroger commented 2 months ago

Update: We dived a bit deeper and debugged the issue using the Yoga source code. We noticed that the issue happens inside PixelGrid.cpp -> roundLayoutResultsToPixelGrid. So in the final iteration (see example code above) - before roundLayoutResultsToPixelGrid, layoutWidth would be correct, but afterwards it would be 0. It turns out that in this specific iteration getPointScaleFactor() is not returning 1 anymore but 0.0f or -0.0f. Also getErrata() would suddenly return -1672651488 (previously 0 = Errata.None) and getVersion() 1 (instead of 0).

On first sight this didn't make sense since these values aren't set anywhere. Then we noticed that only a couple of iterations before this happens the YogaConfigJNIFinalizer::finalize is called from the JVM to garbage collect the Java part of the config.

So the issue can be avoided by making sure the Java config object is not garbage collected:

YogaNodeFactory.create(YogaConfigFactory.create()) // YogaConfig getting garbage collected after a while, leading to issue

val config = YogaConfigFactory.create() // defined in a scope where you can guarantee that the variable survives
YogaNodeFactory.create(config) // no issue in this case

YogaNodeFactory.create() // no issue since YogaConfigJNIFinalizer is not even instantiated in this case

Or in other words: The caller of the library must take care of keeping the Java config alive, since the Yoga library is not taking care of it. See also: https://github.com/facebook/yoga/blob/main/java/com/facebook/yoga/YogaNodeFactory.java#L16 https://github.com/facebook/yoga/blob/main/java/com/facebook/yoga/YogaNodeJNIFinalizer.java#L16 https://github.com/facebook/yoga/blob/main/java/com/facebook/yoga/YogaNodeJNIBase.java#L59 -> no reference to the Java Config object is kept anywhere.

Kudos to @rtPag for helping to find the issue!

rtPag commented 2 months ago

One option to make it safer for everyone using the public APIs would be to keep a reference to the config as a private member variable in the YogaNodeJNIBase class. This would guarantee that the config stays alive as long as the node is in use.

NickGerleman commented 2 months ago

Huge thank you @michaeltroger @rtPag 🎉

One option to make it safer for everyone using the public APIs would be to keep a reference to the config as a private member variable in the YogaNodeJNIBase class. This would guarantee that the config stays alive as long as the node is in use.

This seems like the correct solution to me, and is consistent with how the JNI bindings handle Yoga node garbage collection.

I think the production usages we have of JNI bindings all happened to use global duration configs, which is why we hadn’t seen it.

michaeltroger commented 1 week ago

Should be fixed by the attached commit