Quillraven / Fleks

Fast, lightweight, multi-platform entity component system in Kotlin
MIT License

Version 2.4 #106

Closed · Quillraven closed this 1 year ago

Quillraven commented 1 year ago

This version contains some breaking changes which add more flexibility to Fleks and also keep it up to date with other technologies. The most important change is the requirement of Java 11 instead of Java 8. This seems to be safe nowadays, even for Android. LibGDX users, please use Gdx-Liftoff as your setup tool; it already creates proper Java 11 projects.

Changelog:

For more information about the new functionality, please check out the updated wiki.


@jobe-m: as always, please let me know if everything is working in your KorGE example :)

@geist-2501: Shall we rename onAddComponent and onRemoveComponent to onAdd and onRemove? Since they are functions of a component, I think the Component part of the name is redundant. What do you think? I can add the change to this PR.
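For illustration, a rough sketch of how the renamed hooks could look on a component. The World receiver, the Entity parameter, and the Health component here are assumptions for this example, not confirmed 2.4 API; check the updated wiki for the authoritative signatures.

import com.github.quillraven.fleks.Component
import com.github.quillraven.fleks.ComponentType
import com.github.quillraven.fleks.Entity
import com.github.quillraven.fleks.World

// Hypothetical component using the proposed shorter hook names.
// The exact signatures are assumed here and may differ from the final 2.4 API.
data class Health(var amount: Int = 100) : Component<Health> {
    override fun type() = Health

    companion object : ComponentType<Health>()

    // runs when the component gets added to an entity (previously onAddComponent)
    override fun World.onAdd(entity: Entity) {
        println("Health added to $entity")
    }

    // runs when the component gets removed from an entity (previously onRemoveComponent)
    override fun World.onRemove(entity: Entity) {
        println("Health removed from $entity")
    }
}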

And btw, the wiki is not yet updated ;) Will do that when the version goes live.

Quillraven commented 1 year ago

Btw, the publish workflow seems to get triggered even for PRs. I need to have a look at that, but I think I already found a solution for it.

geist-2501 commented 1 year ago

@Quillraven yeah that sounds like a sensible rename!

jobe-m commented 1 year ago

@Quillraven Yes, I will. :)

Quillraven commented 1 year ago

@aSemy: any idea why the "Publish" workflow gets triggered for PRs? I thought the if condition in the job should prevent that? Obviously, I am wrong :D

aSemy commented 1 year ago

> @aSemy: any idea why the "Publish" workflow gets triggered for PRs? I thought the if condition in the job should prevent that? Obviously, I am wrong :D

Ooh that's not very good! I don't know why that might be, I thought the condition would work too! I'll take a look...

EDIT: see #108

jobe-m commented 1 year ago

OK from my side: the new onAdd and onRemove component functions work for me :)

Currently in KorGE-Fleks I cannot use data objects from Kotlin 1.9 since kproject sets up the KorGE project with 1.8.x. But that should not be a problem for Fleks. We just need to update KorGE's kproject to also support 1.9.

Quillraven commented 1 year ago

> OK from my side: the new onAdd and onRemove component functions work for me :)
>
> Currently in KorGE-Fleks I cannot use data objects from Kotlin 1.9 since kproject sets up the KorGE project with 1.8.x. But that should not be a problem for Fleks. We just need to update KorGE's kproject to also support 1.9.

Great, thanks for checking! It is interesting to me that you have problems with data objects. I will check that myself on the weekend in one of my projects. If this is not backwards compatible, shall we stick with the 1.8.x version for now?

jobe-m commented 1 year ago

data object was recently introduced with Kotlin 1.9. It shouldn't be a big deal to update KorGE to 1.9; Soywiz usually adopts new Kotlin versions quickly in KorGE.
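For context, a minimal standalone illustration of the feature in question (plain Kotlin, not Fleks-specific code):

// Kotlin 1.9+: a data object gets a readable toString() ("Paused") plus
// generated equals()/hashCode(), which is handy for tag-like singletons.
data object Paused

// On Kotlin 1.8.x the data modifier is rejected by the compiler; the closest
// equivalent is a plain object, whose default toString() prints something
// like "PausedLegacy@1b6d3586".
object PausedLegacy

fun main() {
    println(Paused)        // Paused
    println(PausedLegacy)  // PausedLegacy@<hash>
}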

But it would be nice to stay backwards compatible with Kotlin 1.8.x for some time. Maybe you can create a branch 2.4-kotlin1.8.xx after 2.4 is released which does not use data object. You don't need to change anything else, since in KorGE-Fleks we use a local Gradle config to build Fleks.

Quillraven commented 1 year ago

@jobe-m: What kind of error are you actually getting? I just tried it in a project with Kotlin 1.8.22 and I can run it just fine.

jobe-m commented 1 year ago

I am getting the error "The feature 'data objects' is only available since language version 1.9". Are you sure that you also built Fleks with Kotlin 1.8, or did you only build the project with 1.8 while Fleks was built with 1.9?

Quillraven commented 1 year ago

> I am getting the error "The feature 'data objects' is only available since language version 1.9". Are you sure that you also built Fleks with Kotlin 1.8, or did you only build the project with 1.8 while Fleks was built with 1.9?

I don't build Fleks at all. I just add it as a dependency to my projects. I use 2.4-SNAPSHOT from Maven.

Building Fleks itself of course won't work with 1.8, but why do you have to build it?

jobe-m commented 1 year ago

With Soywiz's kproject we can add external modules as sources to a KorGE project. The benefit is that Fleks can be used on all the fancy (homebrew) platforms that KorGE is available for. Thus Fleks is built with the build configuration of KorGE.

jobe-m commented 1 year ago

Please don't wait with 2.4 for me. Unfortunately I am currently heavily loaded with private family stuff while also trying to survive at work. I won't have much time in the next months, so my progress will be rather slow.

Quillraven commented 1 year ago

Ah, understood - I didn't know about the KorGE setup, but then it makes sense. In this case, as you suggested, I will make a 2.4_8 branch with Kotlin 1.8 after the release.

And I totally understand the family and work stuff - I have similar problems at the moment ;)

Anyway, thanks for the info and updates! I will then release 2.4 this weekend.

tommyettinger commented 1 year ago

Thanks for recommending gdx-liftoff! Would you like Fleks added to the third-party extensions in Liftoff? I don't think you need to do anything.

I decided to run the benchmark with lower entity and update counts (each 1/10 of what they are now in the repo), and I got somewhat different results. Not having "fractionally complete" results (less than 1.0 throughput) for the complex benchmark helps both artemis and, at least a little, ashley.

I ran this with Java 20 on a slightly newer laptop processor than the one used for the listed benchmarks, and Fleks' add and remove performance here is vastly better than any competitor (225% more throughput than artemis-odb); I don't know why exactly. It's a tiny bit slower on both simple and complex with the smaller entity count, which suggests there might be an algorithmic complexity difference that makes fleks relatively faster on larger entity collections. Artemis is quite a bit faster on the complex benchmark with the smaller entity count (68% more throughput than fleks).

$ C:\d\jvm\jdk20-hotspot\bin\java -jar Fleks-jvmBenchmarks-jmh-2.5-SNAPSHOT-JMH.jar -w 10 -r 10 -wi 10 -i 10 -t 1 -f 1 -gc true
...
Benchmark                    Mode  Cnt      Score     Error  Units
ArtemisBenchmark.addRemove  thrpt   10   4071.356 ± 337.725  ops/s
ArtemisBenchmark.complex    thrpt   10    178.795 ±   3.415  ops/s
ArtemisBenchmark.simple     thrpt   10   9047.583 ±  62.516  ops/s
AshleyBenchmark.addRemove   thrpt   10   2269.431 ±  42.238  ops/s
AshleyBenchmark.complex     thrpt   10     16.279 ±   0.199  ops/s
AshleyBenchmark.simple      thrpt   10   2086.183 ±  68.274  ops/s
FleksBenchmark.addRemove    thrpt   10  13264.218 ±  34.791  ops/s
FleksBenchmark.complex      thrpt   10    106.155 ±   2.879  ops/s
FleksBenchmark.simple       thrpt   10   7433.124 ±  64.739  ops/s

This only shows that Artemis is faster than fleks for the simple and complex cases with 100 entities; it isn't ever faster on addRemove that I can see. Whatever you're doing for addRemove is great, and if there's a way to do something similar with what simple and complex use, they would probably outpace Artemis in more cases.

I initially ran the benchmark because I wanted to make sure the methodology was sound, but the benchmark code looks solid. Benchmarking is such a tricky thing to get right because there are so many factors involved...

EDIT: I also ran the benchmarks with Shenandoah GC enabled, to see if garbage collection might affect things.

$ C:\d\jvm\jdk20-hotspot\bin\java -XX:+UseShenandoahGC -jar Fleks-jvmBenchmarks-jmh-2.5-SNAPSHOT-JMH.jar -w 10 -r 10 -wi 10 -i 10 -t 1 -f 1 -gc true
...
Benchmark                    Mode  Cnt      Score     Error  Units
ArtemisBenchmark.addRemove  thrpt   10   3660.630 ± 596.098  ops/s
ArtemisBenchmark.complex    thrpt   10    124.183 ±   1.215  ops/s
ArtemisBenchmark.simple     thrpt   10   8220.601 ± 240.047  ops/s
AshleyBenchmark.addRemove   thrpt   10   1770.910 ±   8.321  ops/s
AshleyBenchmark.complex     thrpt   10     14.689 ±   0.122  ops/s
AshleyBenchmark.simple      thrpt   10   2154.914 ±  61.261  ops/s
FleksBenchmark.addRemove    thrpt   10  11708.724 ± 107.808  ops/s
FleksBenchmark.complex      thrpt   10    101.074 ±   0.912  ops/s
FleksBenchmark.simple       thrpt   10   6607.135 ±  48.420  ops/s

Shenandoah GC seems to hurt performance here, which surprised me.

Quillraven commented 1 year ago

@tommyettinger: funny coincidence, because I wanted to open an issue for gdx-liftoff soon and ask if you could add this library. To answer your first question: yes please, it would be great if it can be added directly :)

Regarding performance: I am personally confused by the performance measurements of JVM languages. On my notebook (AMD processor) Fleks is always faster in the addRemove and simple benchmarks, and in the complex one it is close. On my PC (old Intel CPU), which was used for the benchmarks in the README.md, Fleks is still faster with addRemove, but in the simple benchmark it is random: sometimes faster, sometimes slower.

I don't understand it 100%, but in the end I didn't care enough to investigate it further. What was important for me is that it is faster than Ashley and relatively close to Artemis.

To give a little bit of insight:

tommyettinger commented 1 year ago

I can run some more benchmarks later tonight, including ones that take hours if needed. I think the seemingly random aspect of the benchmarks has a lot to do with the OS, any existing processes running on the machine, how much time the JIT compiler has had to get things ready, etc. It's best to reduce the amount of error by running more iterations and for longer (I used 10 seconds; ideally I would try a minute per iteration when using many entities). I suspect GC may be a factor here because switching to Shenandoah had a (negative) performance effect, though it seemed to hurt all the benchmarks.

BitArrays are amazing data structures and I love them. 😄 If you aren't currently using bitwise operations to do steps in bulk, that would be something to consider, though I imagine you are probably doing things like masking heavily already. I suspect the conversion to a bag is the main slow point there, or any GC needed to clean up temporary bags or bit arrays. It might be worth considering iterating while still in a BitArray -- if each block of bits is stored as a Long, at least on the JVM there are some very fast intrinsic candidates for working with that Long and its bits. If you want to get the lowest set bit in num, that doesn't even need a method call; it's just num and -num (I'm not 100% sure of the Kotlin syntax here; and is the bitwise AND operator).
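For reference, the Kotlin spelling of those tricks as a small standalone snippet (not Fleks code):

fun main() {
    var word = 0b1011_0100L

    // `and` is Kotlin's bitwise AND operator, so the two's-complement trick
    // mentioned above is literally `word and -word`: it isolates the lowest set bit.
    val lowestBit = word and -word                  // 0b100 = 4
    val lowestIndex = word.countTrailingZeroBits()  // 2

    // Visit only the bits that are actually set (in ascending order) instead of
    // checking all 64 positions: clear the lowest set bit on every iteration.
    while (word != 0L) {
        println("set bit at index ${word.countTrailingZeroBits()}")
        word = word and (word - 1)
    }
    println("lowest bit value=$lowestBit, index=$lowestIndex")
}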

Quillraven commented 1 year ago

It would be interesting to find out which code segments take up the most time in the simple/complex benchmarks and go from there. Not sure if your way of benchmarking can do that. The latest IntelliJ version contains benchmark inlay infos, but I don't have experience with them yet. Might be useful as well ;)

My assumption is that the following methods are the bottlenecks (family.kt and bitArray.kt):

    /**
     * Returns true if the specified [compMask] matches the family's component configuration.
     *
     * @param compMask the component configuration of an [entity][Entity].
     */
    internal operator fun contains(compMask: BitArray): Boolean {
        return (allOf == null || compMask.contains(allOf))
            && (noneOf == null || !compMask.intersects(noneOf))
            && (anyOf == null || compMask.intersects(anyOf))
    }
    /**
     * Updates this family if needed and runs the given [action] for all [entities][Entity].
     *
     * **Important note**: There is a potential risk when iterating over entities and one of those entities
     * gets removed. Removing the entity immediately and cleaning up its components could
     * cause problems because if you access a component which is mandatory for the family, you will get
     * a FleksNoSuchComponentException. To avoid that you could check if an entity really has the component
     * before accessing it but that is redundant in context of a family.
     *
     * To avoid these kinds of issues, entity removals are delayed until the end of the iteration. This also means
     * that a removed entity of this family will still be part of the [action] for the current iteration.
     */
    inline fun forEach(crossinline action: Family.(Entity) -> Unit) {
        // Access entities before 'forEach' call to properly update them.
        // Check mutableEntities getter for more details.
        val entitiesForIteration = mutableEntities

        if (!entityService.delayRemoval) {
            entityService.delayRemoval = true
            isIterating = true
            entitiesForIteration.forEach { action(it) }
            isIterating = false
            entityService.cleanupDelays()
        } else {
            isIterating = true
            entitiesForIteration.forEach { this.action(it) }
            isIterating = false
        }
    }

// --> which also includes a call to

    // This bag is added in addition to the BitArray for better iteration performance.
    @PublishedApi
    internal val mutableEntities = MutableEntityBag()
        get() {
            if (isDirty && !isIterating) {
                // no iteration in process -> update entities if necessary
                isDirty = false
                entityBits.toEntityBag(field)
            }
            return field
        }
// bitArray.kt

    fun intersects(other: BitArray): Boolean {
        val otherBits = other.bits
        val start = min(bits.size, otherBits.size) - 1
        for (i in start downTo 0) {
            if ((bits[i] and otherBits[i]) != 0L) {
                return true
            }
        }
        return false
    }

    fun contains(other: BitArray): Boolean {
        val otherBits = other.bits

        // check if other BitArray is larger and if there is any of those bits set
        for (i in bits.size until otherBits.size) {
            if (otherBits[i] != 0L) {
                return false
            }
        }

        // check overlapping bits
        val start = min(bits.size, otherBits.size) - 1
        for (i in start downTo 0) {
            if ((bits[i] and otherBits[i]) != otherBits[i]) {
                return false
            }
        }

        return true
    }

    /**
     * Returns the logical size of the [BitArray] which is equal to the highest index of the
     * bit that is set.
     *
     * Returns zero if the [BitArray] is empty.
     */
    fun length(): Int {
        for (word in bits.size - 1 downTo 0) {
            val bitsAtWord = bits[word]
            if (bitsAtWord != 0L) {
                val w = word shl 6

                for (bit in 63 downTo 0) {
                    if ((bitsAtWord and (1L shl bit)) != 0L) {
                        return w + bit + 1
                    }
                }
            }
        }
        return 0
    }

    /**
     * Returns number of bits that are set in the [BitArray].
     */
    fun numBits(): Int {
        var sum = 0
        for (word in bits.size - 1 downTo 0) {
            val bitsAtWord = bits[word]
            if (bitsAtWord != 0L) {
                for (bit in 63 downTo 0) {
                    if ((bitsAtWord and (1L shl bit)) != 0L) {
                        sum++
                    }
                }
            }
        }
        return sum
    }

    inline fun forEachSetBit(action: (Int) -> Unit) {
        // it is important that we go from right to left because
        // otherwise some code in toEntityBag will fail with ensureCapacity.
        for (word in bits.size - 1 downTo 0) {
            val bitsAtWord = bits[word]
            if (bitsAtWord != 0L) {
                val w = word shl 6

                for (bit in 63 downTo 0) {
                    if ((bitsAtWord and (1L shl bit)) != 0L) {
                        action(w + bit)
                    }
                }
            }
        }
    }

    fun toEntityBag(bag: MutableEntityBag) {
        var checkSize = true
        bag.clear()
        forEachSetBit { idx ->
            if (checkSize) {
                checkSize = false
                // this is working because forEachSetBit goes
                // from right to left, so idx is the highest index here
                bag.ensureCapacity(idx)
            }
            bag += Entity(idx)
        }
    }

I had the idea of caching the result of the contains call inside a family, but that requires a fast hashCode function for the BitArray. The current version is pretty slow, and the cache then slows down performance :D At least in a quick & dirty try-out that I did.
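As one possible direction, purely a sketch against the LongArray shown above (not the current Fleks code): a word-based hashCode that ignores trailing zero words, so logically equal BitArrays with different backing-array capacities produce the same hash:

// Sketch only: hash the significant words of the backing LongArray.
// Trailing zero words are skipped so that capacity differences do not
// change the hash, mirroring how equals() would have to behave.
fun hashCodeOf(bits: LongArray): Int {
    var lastWord = bits.size - 1
    while (lastWord >= 0 && bits[lastWord] == 0L) {
        lastWord--
    }

    var hash = 1
    for (i in 0..lastWord) {
        val word = bits[i]
        hash = 31 * hash + (word xor (word ushr 32)).toInt()
    }
    return hash
}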

tommyettinger commented 1 year ago

There are probably a few places in BitArray that could be sped up. The bit counting method should probably use https://kotlinlang.org/api/latest/jvm/stdlib/kotlin/count-one-bits.html, since that is more likely to use the JVM intrinsic when available or the popcnt instruction in some way from native code. Most of the per-bit loops currently do 64 iterations regardless of how many bits are set, so for sparse BitArrays they may check their inner conditional far more times than necessary. I'm not especially familiar with idiomatic Kotlin style, but I'd suggest looking for existing high-performance BitSet/BitArray implementations in Kotlin and seeing how they do it in the source code. The hashCode() question is an interesting one; contains() seems to be implemented quite well already, so optimizing it could be a challenge (as you found).
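To make that concrete, a rough standalone sketch against a plain LongArray (the helper names are made up; note the set-bit loop visits bits in ascending order, so toEntityBag's ensureCapacity trick would need a prior length() call or a similar adjustment):

// Sketch: popcount-based bit counting and a set-bit loop that only touches
// bits that are actually set, instead of checking all 64 positions per word.
fun numBitsOf(bits: LongArray): Int {
    var sum = 0
    for (word in bits) {
        // countOneBits() maps to Long.bitCount / popcnt on the JVM
        sum += word.countOneBits()
    }
    return sum
}

inline fun forEachSetBitOf(bits: LongArray, action: (Int) -> Unit) {
    for (wordIdx in bits.indices) {
        var word = bits[wordIdx]
        val offset = wordIdx shl 6
        while (word != 0L) {
            action(offset + word.countTrailingZeroBits())
            word = word and (word - 1) // clear the lowest set bit
        }
    }
}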

I think I'll tinker with bitArray.kt some, and submit a PR if I see any performance gains.