godotengine / godot-proposals

Godot Improvement Proposals (GIPs)
MIT License
1.15k stars 97 forks source link

Randomize resource IDs in scene files (GUID) when new resources are initially saved #471

Closed ignaloidas closed 3 years ago

ignaloidas commented 4 years ago

Describe the project you are working on:

Several small games with multiple developers.

Describe the problem or limitation you are having in your project:

Quite often several developers add something to a scene, and it is annoying to merge the changes since, for example, if an external resource was added in both patches, it would have the same resource ID, thus requiring to go through every mention of that ID and decide which one it should be.

Describe the feature / enhancement and how it helps to overcome the problem or limitation:

We could randomize resource IDs in the tscn format. This would allow for easier merges, often fully automatic. The only problem could be when resource is deleted in one patch and used in the other one, but this can already happen.

Describe how your proposal will work, with code, pseudocode, mockups, and/or diagrams:

Instead of setting increasing resource IDs when saving, set random ones for new resources.

If this enhancement will not be used often, can it be worked around with a few lines of script?:

This would be used often. Technically one could write a script that would randomize resource IDs before commiting, but that would be hacky and prone to bugs.

Is there a reason why this should be core and not an add-on in the asset library?:

Easier to do in core and affects most developers.

clayjohn commented 4 years ago

This wouldn't eliminate the issue though. Consider the issue where two developers both add an external resource to a scene and the RNG turns up the same number. Sure, it would be more rare, but the problem is still there. And in this case it would be more troublesome as developers would be relying on the randomized values.

ignaloidas commented 4 years ago

Using long enough numbers the chances of it happening would come down to nearly zero. We could additionally incorporate the time something was created, or the MAC of the machine like various UUID versions does.

Calinou commented 4 years ago

See also https://github.com/godotengine/godot/issues/15673 and https://github.com/godotengine/godot/issues/22226.

ignaloidas commented 4 years ago

godotengine/godot#22324 as well.

RabbitB commented 4 years ago

@ignaloidas Using 64-bit GUIDs, it's very likely that no Godot user anywhere would ever encounter a collision. If we wanted to be absolutely sure, we could do like MS and use a 128 bit GUID as is available in .NET.

At 128 bits, MS docs say the GUIDs are intended to be use as public identifiers across open networks and the entire web, and even then, across the entire internet, it's likely we'll see the heat-death of the universe before the same GUID was ever generated twice.

Wavesonics commented 4 years ago

@RabbitB I worked at a AAA studio w\ a very large game using hashes of object names as GUIDs, and they hit collisions now and then :P

That being said, I think it's still be a good solution. It'd make the chance very very small.

RabbitB commented 4 years ago

@Wavesonics Hashes are very different from pure GUIDs, although hashes are often used interchangeably.

Anytime you're taking a larger number of bits and hashing them to fit into a smaller number of bits, you're going to run into the possibility of collisions. GUIDs as generated by .NET aren't hashes, but pure random numbers that have no relation to the objects you want to refer to. They're generated independently from the data and simply used as identifiers.

sekhat commented 4 years ago

@RabbitB @Wavesonics Guid's also do have a chance of collisions. But every once in a blue moon is much better than every time two developers make changes to resources in a scene in different branches.

Wavesonics commented 4 years ago

@sekhat yes, totally agree. I would love to see this proposal implemented.

lucaspcamargo commented 4 years ago

This is a big problem for us at the studio I work in. Every other change, every other merge, becomes a painstaking process of looking into the text files and combing through the mess. It'd be great to see this implemented for 4.0 or even 3.x

Calinou commented 4 years ago

@lucaspcamargo Given how this change will most likely break compatibility, I doubt it can be implemented in 3.2.x.

LikeLakers2 commented 4 years ago

@Calinou I'm curious, how do you see it breaking compatibility? The way I understood IDs in TSCN files, it shouldn't matter what type of ID (or even what the ID is) a resource has while loading, just so long as the ID is the same everywhere that the resource is wanted.

lucaspcamargo commented 4 years ago

Yes, I thought of something along these lines: Imagine if the IDs began to be treated as hex numbers, internally stored in an int64_t. Since the IDs are local to the file, I can't see anything breaking unless I'm missing something. Of course, they would have to be treated as keys to a dictionary instead of indexes of an array.

cgbeutler commented 4 years ago

@Wavesonics for stuff done in parallel, it can help to use a GUID implementation that uses the machine name hashed with the creation time to randomize. This reduces collisions. Maybe that would be good here...?

AnomalRoil commented 4 years ago

This would be really useful for game jams where we work in a team in a very short period of time since conflicts arise all the time otherwise.

Edit: this is providing a use-case to motivate the proposal... Unless I'm missing why such proposals exists, it seems on-topic if you ask me...

Calinou commented 4 years ago

@AnomalRoil Please don't bump issues without contributing significant new information. Use the :+1: reaction button on the first post instead.

cgbeutler commented 3 years ago

After a quick and dirty look at things, this seems to be more of a Resource issue, not really scenes. I think the fix would involve rewriting the Resource serialization.

As far as I can tell, the main issue with doing a GUID is the fact that it wouldn't get persisted in RAM after the resources is loaded. That means if you loaded a resource, then changed something and re-saved it, you'd get all new GUIDs for everything. It would indeed remove issues with collisions, but things that didn't change could no longer match up their ids. To do a guid properly and maintain their values, you'd have to either persist those ids somewhere, which would bloat out all ram for resources, or re-read the file on save (gross.)

Doing something closer to a hash could maybe work better. Hashing sub_resources would almost never have issues. Hashing ext_resources, on the other hand, could pose an issue in large projects, as @Wavesonics pointed out. In those rare ext collisions, you'd be back to the original issue. That said, it would still be a world better than what it is now.

Anyway, food for thought. I could see maybe giving folks a choice of save type in the options. The reader would likely still be able to read any of the save types, as it just has to match up the ids.

chucklepie commented 3 years ago

I've found that even minor changes to a scene, e.g. changing a single property, completely re-arranges the sub resource ID ordering and you end up with all the same items in the tscn file, but in a different order and tracking changing for git is completely impossible.

I'm pretty sure this is not how it should be. As below, almost every line is changed causing havoc, but nothing was actually moved or touched in the scene in Godot, apart from, I think, one export/property of one node. It just rearranged every item to a different part of the file.

image

mrimvo commented 3 years ago

In my opinion, this is one of the major problems godot has. Collaboration with other people, branching (for hotfixes, features, releases), and merging is completely impossible because of this issue. We don't need a highly sophisticated Scene merger. We need unique ids instead of numbered ids. Apparently, this major problem of godot seems easy to fix.

Honestly, is there any reason not to include this in 3.2.4? Maybe we don't have the complete picture here. Is there any substantial reason not to use unique ids?

Calinou commented 3 years ago

Honestly, is there any reason not to include this in 3.2.4? Maybe we don't have the complete picture here. Is there any substantial reason not to use unique ids?

3.2.4 is nearing release. We can't risk making significant changes that can introduce regressions down the line.

Also, nobody opened a pull request to implement unique resource IDs yet :slightly_smiling_face:

Wavesonics commented 3 years ago

@Calinou that's totally understandable as 3.2.4 is in RC, but what is the thinking around 4.x or possibly 3.2.5?

This along with some of the other tickets you linked above, this does feel like a really impactful issue to me.

Calinou commented 3 years ago

that's totally understandable as 3.2.4 is in RC, but what is the thinking around 4.x or possibly 3.2.5?

I doubt this can be considered for 3.2.5 either, but maybe for 4.0. Still, nobody started working on this feature yet. Someone will have to do the work if you want this feature to be included in Godot.

mnn commented 3 years ago

Since I can't post to https://github.com/godotengine/godot/issues/22324 (Calinou closed it without fixing the problem, restricted comments and told me to go here), I'll write it here. That issue - "Editing scenes files (.tscn) in a git-friendly manner" - wasn't resolved at all and this issue doesn't fully cover the old one. Just randomization of ids won't solve the Godot's hostility to git, because also order of resources (subresources?) gets randomly changed (whole chunks of file gets rearranged for no reason), often without making any changes in the editor, just opening the scene and running a game is enough. Oh, there is https://github.com/godotengine/godot-proposals/issues/2342 "Limit re-arranging ordering and numbering of sub-resources in tscn files", wrongly marked as a duplicate of this issue... 😔

YuriSizov commented 3 years ago

Just randomization of ids won't solve the Godot's hostility to git, because also order of resources (subresources?) gets randomly changed (whole chunks of file gets rearranged for no reason)

Actually, randomization of IDs will fix that, because there will be no reason to arrange them in any particular order. Currently it tries to order them in an ascending order, and if for some reason dependencies get reindexed, not only does it change the IDs, it also changes the order. With GUIDs, or some other means for random uniqueness, nothing of this will be the case.

LikeLakers2 commented 3 years ago

@pycbouh But that introduces unnecessary noise, which makes it harder to review actual changes and bloats the download size of the repository.

Imagine if the same happened to the .h files in the godotengine/godot repository, with the various functions and variables getting reordered whenever someone makes a change. Wouldn't that be annoying to sort through?

YuriSizov commented 3 years ago

@LikeLakers2 Yes, current situation creates a lot of noise and unnecessary diffs, I also have to deal with that all the time. But that only happens because IDs are ordered integers, so the serialization process tries to put dependencies in their order. But if IDs are completely random, there will be no reason to ever rearrange them, therefore that problem will also be solved.

Did you perchance assume that IDs are randomized on every save? I don't think that this was ever proposed. IDs are randomized once, when the item is first added. Consequitive saves don't change IDs.

mnn commented 3 years ago

Actually, randomization of IDs will fix that, because there will be no reason to arrange them in any particular order. Currently it tries to order them in an ascending order, and if for some reason dependencies get reindexed, not only does it change the IDs, it also changes the order. With GUIDs, or some other means for random uniqueness, nothing of this will be the case.

I don't see it mentioned anywhere in this issue apart from the quote. It probably should be mentioned in the original post and title that it will solve the reordering problem as well. Thank you for clarification.

LikeLakers2 commented 3 years ago

@pycbouh ...Actually, I misunderstood what was being said. It wasn't that I assumed IDs would be randomly generated on every save, but that I misunderstood thinking they could still be reordered, even after your reply. My bad!

reduz commented 3 years ago

This is something I would like to work on past 4.0, right now internal resource IDs are numbers, but I would prefer they are string based (for which we could add a simple hash upon creation, like "Texture-azy4x"), on imported scenes (binary) they would also use an unique string when possible (as an example, when importing from Blender), as my goal is to be able to open sub-resources from files directly.

ignaloidas commented 3 years ago

Random numbers would help as a stop-gap as well, even if the file is still ordered, since the objects wouldn't get re-ordered.

Wavesonics commented 3 years ago

I spent some time working on this recently. With how things are structured currently the harder problem to solve is the resource IDs.

Calinou commented 3 years ago

I spent some time working on this recently. With how things are structured currently the harder problem to solve is the resource IDs.

@Wavesonics Could you push your code to a branch in your fork (even if you didn't finish it)? This could help someone else finish it :slightly_smiling_face:

Wavesonics commented 3 years ago

@Calinou I'm not sure it'll be much use: https://github.com/Wavesonics/godot/commit/3339abc509e381112ce36bab7fee9cb7f719cf7b

But let me explain what it looked like to me.

As far as I can tell, the concept of IDs only really exists in the text based file format. In memory and in the binary format, everything is just arrays, and either offsets into arrays, or pointers.

For the text based file format, the indices into those various arrays are used as the IDs, which is part of what leads to them being so volatile, since reordering the array will change many IDs.

It looks like the IDs are local to the scene, so that makes it a bit easier. For the nodes, I figured the path was necessarily unique within a given scene, so just for a proof of concept, I hashed that string to an int32. That had the added benefit of being able to generate the ID for a given node out of only that information. No reference to the actual node required.

And that actually did work. All of the Node IDs became stable.

The much larger problem was resource IDs.

They don't have paths or anything else reliable to hash like that, and suffer from the same problems with ID instability due to reordering the array they live in.

So that is where I hit a dead end with the hashing idea.

I thing it's probably not worth continuing down that path, as it has some other problems with it (hash collisions)

In my opinion, the right way to do this is to elevate the idea of an ID, both for nodes and resources, to be a real concept in the engine it's self. Then the ID will of course be in both file formats because it is an integral part of the node/resource classes.

Then we can have some sort of central service that allocates IDs, or what ever. Could be as simple as a project level int, and when you "allocate" a new ID, it just increments by 1 or something. That would have the nice benefit of keeping the IDs small and readable in the text format.

With my hashing idea you ended up with ints between -2 billion and +2 billion and some were quite ugly in text...