godotengine / godot-proposals

Godot Improvement Proposals (GIPs)
MIT License
1.17k stars 98 forks source link

Facilitate implementing core data structure such as linked list via modules or plugins #1522

Open Xrayez opened 4 years ago

Xrayez commented 4 years ago

Describe the project you are working on:

Goost - Godot Engine extension.

Continuing from godotengine/godot#7194.

Describe the problem or limitation you are having in your project:**

Simply put, List != Array. Linked lists perform faster when it comes to insertion and deletion operations, having O(1) time complexity, not to mention the benefit of passing around list nodes throughout code while still maintaining the order of elements.

Note that this proposal also applies to any other core data structure which could be implemented via modules or plugins, but the linked list is one of the most commonly used data structures out there.

Describe the feature / enhancement and how it helps to overcome the problem or limitation:**

For instance, I've previously ported bucket fill algorithms from GDScript to C++ because they were too slow to execute via script, and bucket fill typically requires a stack-based data structure (for which the List<T> is used throughout the engine internals). There was certainly a factor of going through GDScript calls for each pixel there as well which contributed to the slowdown, of course.

Other data structures could be more efficiently implemented/derived by using the List in a particular way, such as stack or queue. Any recursive calls could be more efficiently implemented using a list as a stack without reverting to using the GDScript call stack (which is also limited to 1024 by default). There are ways to use an Array as a stack as suggested in https://github.com/godotengine/godot-proposals/issues/1522#issuecomment-693697224, but the usage is advanced and requires more steps to setup and maintain.

Binary trees can also be implemented using nested linked lists.

Describe how your proposal will work, with code, pseudocode, mockups, and/or diagrams:**

As a Reference

See goostengine/goost#12 where this is already implemented using C++ modules capabilities.

goost_linked_list_and_node

A list is implemented as a Reference and the nodes/elements inherit Object.

While working on such an implementation, I've stumbled upon various core issues:

As a Variant::LIST and/or Variant::LIST_NODE

I really don't know what has to be done to make it work, but I guess that would be the same as implementing Variant::ARRAY or Variant::DICTIONARY, so we'll have a third type of container type in Godot: Variant::LIST. But obviously, it would solve all of the above limitations because it would be treated as an actual core data structure.

If this enhancement will not be used often, can it be worked around with a few lines of script?:**

It is perfectly possible to implement such a structure via GDScript, certainly not a few lines of code but possible, see https://github.com/willnationsdev/godot-next/commit/72c0f7f72d097cbb27d313fbebf398d80d28e936 for instance. Yet there are existing issues which prevent implementing such a structure to work reliably via script (not mentioning the above limitations when trying to implement the same data structure in C++):

Is there a reason why this should be core and not an add-on in the asset library?:**

I have to be honest, with the particular Reference-based list implementation, there's no reason for this to be in core implemented like that. The engine does have List implemented in C++ for internal engine development throughout the codebase but despite this, it's really not trivial to expose the same data structure to scripting. It would be best if the list is implemented as a core Variant::LIST type to begin with, I've raised this proposal for discussion purposes, to share my research on this topic, and to indicate what has to be done in order to facilitate the development for modules and plugins.

I feel like goostengine/goost#12 could already solve a lot of problems, so feel free to suggest what could be done there as well, because it's not only about Godot Engine development, it's about solving more or less common needs as requested by people. πŸ™‚

In any case, I think there are enough of bullet points to solve in this proposal to make it easier for C++ modules developers to implement such a structure more reliably for everyone.

Zylann commented 4 years ago

bucket fill typically requires a stack-based data structure (for which the List is used throughout the engine internals)

Note, an array can be just as fast for this goal if capacity is properly implemented (and I would use one when I need a stack). What's less true however, is when you need a queue. I've been needing queues several times in GDScript and it's quite annoying to make one that has to constantly shift array elements or be fixed-size.

Xrayez commented 4 years ago

I've done some profiling with push back/front on both Array and LinkedList as implemented in goostengine/goost#12.

With 10000 elements:

goost-list-array-push-back-front

So yeah Array.push_front would be quite slow when used like that.

Also see comment made by @zwparchman https://github.com/goostengine/goost/pull/12#issuecomment-693662940 and useful article on vector/list performance.

Zylann commented 4 years ago

@Xrayez if you want to see my point with array, you should not recreate it from scratch every time (and test in release_debug). The point of capacity is to act as re-usable, contiguous memory pool so that the next push_back calls (after the first resize allocation) are equivalent to a size++, which is actually faster than allocating a list node. Besides, Godot's capacity handling is probably not as good as std::vector. Also, don't use push_front, really, that's another reason why your test is slow xD use push_back and index from the end with [-i]. Not saying List is a bad choice for stacks, but I'd still use arrays for stacks, essentially because when I have one, it is almost always re-used and iterated, so array wins.

Xrayez commented 4 years ago

I won't go deep into profiling because that's not my cup of tea (at the moment!), but if I understand you correctly, here are results (still recreating the types, but I haven't seen much difference):

20000 elements, debug:

test_array_push_back: 19 msec
test_array_push_front: 17880 msec
test_array_resize_set: 9 msec
test_list_push_back: 121 msec
test_list_push_front: 133 msec

20000 elements, release_debug:

test_array_push_back: 6 msec
test_array_push_front: 6465 msec
test_array_resize_set: 3 msec
test_list_push_back: 54 msec
test_list_push_front: 55 msec
func test_array_push_back():
    var array = Array()
    for i in ELEMENT_COUNT:
        array.push_back("Godot")

func test_array_push_front():
    var array = Array()
    for i in ELEMENT_COUNT:
        array.push_front("Godot")

func test_array_resize_set():
    var array = Array()
    array.resize(ELEMENT_COUNT)
    for i in ELEMENT_COUNT:
        array[i] = "Godot"

func test_list_push_back():
    var list = LinkedList.new()
    for i in ELEMENT_COUNT:
        list.push_back("Goost")

func test_list_push_front():
    var list = LinkedList.new()
    for i in ELEMENT_COUNT:
        list.push_front("Goost")

Also, don't use push_front, really, that's another reason why your test is slow

Yeah, but that's just to make some weight to the proposal. πŸ˜„

Regarding list, takes a while to construct a whopping Object for nodes indeed.

I guess I'd prefer to use a List for when I don't know exactly how much elements it shall contain, and probably gives better flexibility for procedural generation, for instance (but perhaps a Graph would be more useful for this task). πŸ˜ƒ

So, I think the implementation could be further adapted to accommodate flexibility over absolute performance.

I've stumbled upon godotengine/godot#41319 today and these kind of issues may justify making the list properties mutable, so you could write your own list implementation as well via script (such as circular list), but yeah again I guess wrap() could be used on arrays for similar functionality.

Xrayez commented 4 years ago

I have renamed the proposal to something which reflects my initial idea more accurately, I do not necessarily propose implementing an equivalent data structure in core, but there are quite a lot of issues I've stumbled upon while implementing this via C++ module which deserved a proposal, because most reported issues can only be resolved on the core level. Fixing most core issues can resolve this proposal (some of which I can do myself, and some already did), so I hope those issues are objective enough which make this proposal actionable.

A lot of those issues are agnostic to the proposal, and could help implement other data structures in C++ more easily, the most prominent and important out of all is probably godotengine/godot#42060, because it took me quite some time to figure this out on my own, and could help a lot of use cases, opening the door to implementing other more or less efficient data structures (performance is not priority for Godot development out of the box, so makes sense to satisfy performance needs for other parties).

Xrayez commented 2 years ago

There's another proposal for graphs #3848 which could greatly benefit from solving bullet-points presented in this proposal.

I'd say a graph could also be used as a linked list which could as well remove the need to implement LinkedList as a data structure in Godot, if implementing linked list is seen as a performance concern over Array (even if both data structures have their merit). However, see godotengine/godot#45455.

otoomey commented 2 years ago

If you really, really want to use a linked list, you can simple use a contiguous array and just store the index of the next node with the current item. So each element in your list is something like:

class Player:
    var name: String
    var next: int

All other graph structures can be implemented in similar fashion. The only thing you pay for with each lookup is the offset from the list pointer.

You might even get better performance this way over using a pure linked list because the data is all garuanteed to be in a similar (not as good a vector but still) portion of memory.

awardell commented 1 year ago

@otoomey Your approach is good for certain circumstances, but a lot of middle inserts or deletions will lead to severe fragmentation and cause the size of the array to grow unnecessarily large. Even a lot of front or end insertions and removals can lead to weirdness with determining whether to loop the array around or extend it.

I'm disappointed that this proposal hasn't received more traction lately. The Goost implementation of LinkedList which @Xrayez was working on seems to be completed, though not available for 4.X at least without some porting. Are the core issues enumerated in this proposal still causing problems for that implementation?

otoomey commented 1 year ago

@awardell Hmm I'm not sure if I completely agree. The order in the array does not have to match the order in your linked list. If you have a doubly linked list you can implement it in an array with O(1) insertion and deletion without fragmentation. When inserting, just push the element onto the end of the list and set the indices. When removing, swap the element to remove with the last element in the list and update the indices appropriately.

This way you also benefit from improved cache locality. A true linked list would rely on either godot or the OS memory allocator to keep fragmentation low. Therefore, even if I was writing a linked list in C I would implement it this way unless performance is not of concern.