Gamua / Starling-Framework

The Cross Platform Game Engine
http://www.starling-framework.org

Fast Memory VertexData implementation #1028

Closed: aram-ahak closed this issue 11 months ago

aram-ahak commented 6 years ago

We have implemented fast-memory handling of vertex data based on domain memory. The results are VERY promising, and we would love the whole community to push these changes forward. In our project we observe a 20–30% rendering speed increase, and the app is visibly smoother to the user.
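For readers unfamiliar with the technique: the core idea (sketched here with hypothetical helper names, not the actual PR code) is to back vertex data with a ByteArray assigned to `ApplicationDomain.currentDomain.domainMemory` and then access it through the `avm2.intrinsics.memory` opcodes, which compile to raw loads/stores instead of ByteArray's slower method calls:

```actionscript
// Hypothetical sketch of the domain-memory access pattern.
import flash.system.ApplicationDomain;
import flash.utils.ByteArray;
import flash.utils.Endian;
import avm2.intrinsics.memory.sf32; // store float32 at a byte address
import avm2.intrinsics.memory.lf32; // load float32 from a byte address

var memory:ByteArray = new ByteArray();
memory.endian = Endian.LITTLE_ENDIAN; // domain memory is little-endian
memory.length = ApplicationDomain.MIN_DOMAIN_MEMORY_LENGTH;
ApplicationDomain.currentDomain.domainMemory = memory;

// Write a vertex position (x, y) at a given byte offset:
function setPosition(offset:int, x:Number, y:Number):void
{
    sf32(x, offset);
    sf32(y, offset + 4);
}

setPosition(0, 1.5, 2.5);
trace(lf32(0)); // reads back the x value stored at byte 0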

@PrimaryFeather, we hope you will have a look soon and benchmark these changes against the current master branch.

JohnBlackburne commented 6 years ago

What issues have you encountered in implementing this? I ask because I have looked into this a couple of times myself and could see a couple of problems. First, how do you decide how much domain memory to allocate? For any particular app it might be possible to settle on an allocation, but it would be hard to come up with a general approach. Second, how do you deal with memory fragmentation? Again, this might not be an issue for an individual app, but it must be handled in a general solution.

gsario commented 6 years ago

@JohnBlackburne, we allocate more memory if it gets filled. We have defragmentation logic integrated with the allocation logic, which allows us to allocate the nearest vacant memory chunk. The solution works fine for our quite large app, and we get a really good performance boost from it. Now we need everyone to jump on this train, try and test it, and work collaboratively on making it better.
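The "nearest vacant chunk" strategy described above can be thought of as a first-fit search over a free list kept sorted by offset. This is a hypothetical sketch of that idea, not the PR's actual code:

```actionscript
// Hypothetical first-fit allocator over a domain-memory region.
class FreeBlock
{
    public var offset:int;
    public var size:int;

    public function FreeBlock(offset:int, size:int)
    {
        this.offset = offset;
        this.size = size;
    }
}

class FirstFitAllocator
{
    // Free blocks, kept sorted by offset.
    private var _free:Vector.<FreeBlock> = new <FreeBlock>[];

    public function FirstFitAllocator(totalSize:int)
    {
        _free.push(new FreeBlock(0, totalSize));
    }

    /** Returns the byte offset of an allocated chunk, or -1 if full. */
    public function allocate(size:int):int
    {
        for (var i:int = 0; i < _free.length; ++i)
        {
            var block:FreeBlock = _free[i];
            if (block.size >= size)
            {
                var result:int = block.offset;
                block.offset += size;        // shrink the free block
                block.size   -= size;
                if (block.size == 0) _free.removeAt(i);
                return result;
            }
        }
        return -1; // caller would grow domain memory here
    }
}
```

Scanning from the start of a sorted free list naturally returns the lowest-addressed (nearest) block that fits.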

JohnBlackburne commented 6 years ago

Looking more closely at the code, it appears to use a fixed size of 1 MB, with no way to grow if you run out of space. It also seems to do nothing to counter fragmentation from chunks that become too small to use: a chunk is freed, then reused for something a little smaller, and the leftover slivers accumulate over time. These may be constraints you can live with, but they are far less general than what is in Starling right now.
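For context, one common remedy for that kind of fragmentation is to coalesce adjacent free blocks whenever a chunk is released, so slivers merge back into usable regions. A hypothetical sketch (not from the PR):

```actionscript
// Hypothetical sketch: merge a freed chunk with touching free neighbors.
// 'free' is an Array of {offset, size} objects, sorted by offset.
function release(free:Array, offset:int, size:int):void
{
    var i:int = 0;
    while (i < free.length && free[i].offset < offset) ++i;
    free.splice(i, 0, {offset: offset, size: size});

    // Merge with the following block if they touch.
    if (i + 1 < free.length &&
        free[i].offset + free[i].size == free[i + 1].offset)
    {
        free[i].size += free[i + 1].size;
        free.splice(i + 1, 1);
    }

    // Merge with the preceding block if they touch.
    if (i > 0 && free[i - 1].offset + free[i - 1].size == free[i].offset)
    {
        free[i - 1].size += free[i].size;
        free.splice(i, 1);
    }
}
```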

gsario commented 6 years ago

@JohnBlackburne yes, let's work on making this more general. A 50% rendering performance increase is a great motivator. We can agree on one thing: we can't expect such a performance boost from Adobe.

PrimaryFeather commented 6 years ago

Hey Gevorg,

sorry for the late reply; I was on vacation when you sent the pull request, and afterwards there were some other pressing matters to attend to. But today, I finally had a look at your code.

First of all, I think it's a very lightweight and elegant approach, given the limitations of the domain memory API. It's definitely a good start to see how much performance can be gained. Thanks a lot for your efforts and for sharing the code!

As John already commented, the difficulty will be getting this into a state where it runs reliably for all kinds of apps out there. For example, I've heard of Starling powering museum installations and even ATM terminals, often running for months at a time. Depending on the number and type of allocations, and how long such an app has been running, we will sooner or later run into problems.

This could be fixed with additional logic, but I'm not a big fan of adding that much extra code to Starling just to work around things that ought to be fixed in the runtime (even though I'm aware it's unlikely that Adobe will ever improve ByteArray performance). I simply don't see memory management as something that ought to be part of Starling.

I also find it strange that the avm2.intrinsics.memory package behaves so oddly when used via an IDE. The memory access methods (si32, li32, etc.) still produce warnings when I use them from IntelliJ IDEA. That makes me wonder how widely this API is used and how stable it is across different environments. It's sure to cause a lot of support questions when people see those errors after updating and recompiling Starling. The Travis build doesn't work either, which is very strange.

For all those reasons, I'd rather see this feature in a fork or at least an experimental branch people can opt into using. I'd be willing to make that branch part of the official repository if you rework the code slightly so that it follows Starling's standard source code conventions regarding whitespace and newlines. I'd also recommend moving the methods you added to the Pool class somewhere else to make the changes as separate from the standard code as possible (that simplifies merging upstream changes into that branch later).

On the other hand, keeping it in a separate fork for now would have the advantage that you'd keep full control over the source (e.g. you could accept pull requests yourself). You can still merge all upstream Starling changes into your fork easily (that's no different from what I'd have to do with a branch inside the main repository). I could link to that fork from some Gamua pages so that it's easy for people to find; and as soon as we think the code is ready for prime time and there is enough interest, I can still create a public branch for it here.

What's your opinion on this, Gevorg? (And anyone else reading this!)

gsario commented 6 years ago

@PrimaryFeather thanks for your reply!

We understand the concerns you have about moving these changes into the main branch. Having it as a fork is fine. Our goal is to get the community behind this approach and make it even better.

Regarding memory: we are working on making the allocated memory size effectively unlimited (as far as the OS can allocate). We are writing lazy defragmentation logic for that. Currently we allow up to 10 MB of memory allocation; with this change it will be as much as necessary.
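Growing the backing store past a fixed limit could look roughly like this; a sketch under my own assumptions (hypothetical function, not the fork's code), relying on ByteArray preserving its contents when its length is increased:

```actionscript
// Hypothetical sketch: grow the domain-memory backing store on demand.
import flash.system.ApplicationDomain;
import flash.utils.ByteArray;

function ensureCapacity(memory:ByteArray, required:int):void
{
    if (memory.length >= required) return;

    var newLength:int = memory.length;
    while (newLength < required) newLength *= 2; // double until it fits

    memory.length = newLength; // existing bytes are preserved
    // Re-assign so the AVM picks up the resized buffer:
    ApplicationDomain.currentDomain.domainMemory = memory;
}
```

Since the existing bytes keep their positions, previously handed-out offsets remain valid after a grow, which is what makes this compatible with a lazy defragmentation pass.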

Changing the code to match your standards is no problem either; we will work on that too. Meanwhile, we will be happy to see more people behind this fork and its future improvements. I suggest opening a new thread in the forum to discuss this as well.

gsario commented 6 years ago

Hi! We just pushed changes with the following fix:

We now allow allocation of basically any available memory size, without wasting memory, thanks to the defragmentation logic now in place.

PrimaryFeather commented 6 years ago

Awesome! Thanks for your continuing work on this, and also for creating the thread in the forum!