Add support for texture streaming

Akira1San commented 3 years ago

Describe the project you are working on

A horror game with a little bit of open world.

Describe the problem or limitation you are having in your project

Not really a problem, but optimization is key here.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Texture streaming is a very common feature in 3D engines today, and for good reasons. Texture streaming can provide some clear advantages for games which have massive amounts of textures to deal with. The two most well-known advantages of streaming are:

A streaming system can automatically keep only the necessary textures in memory to minimize the minimum VRAM requirements of any given scene, while still being able to utilize any left over VRAM as a cache by only unloading textures when necessary.
Reduced load times since the game can be played despite that the textures are still being loaded in.

(text description taken from https://jvm-gaming.org/t/tutorial-stutter-free-texture-streaming-with-lwjgl/47661)

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

I'm guessing that the user would change the texture's properties in its settings to make it a streamed texture and set its size.

If this enhancement will not be used often, can it be worked around with a few lines of script?

It would be used in something like 60-90% of game projects.

Is there a reason why this should be core and not an add-on in the asset library?

It's core!

Calinou commented 3 years ago

The current plan is to add a special kind of texture that can be streamed, but it will not be usable everywhere Texture2D is for technical reasons. This means its use will be constrained to 3D rendering (and may not be usable for texture arrays).

Due to time constraints, texture streaming will have to wait for a future 4.x release and won't be in 4.0.

unfa commented 1 year ago

I am wondering if JPEG XL wouldn't be an excellent format for texture streaming - it supports tiling and progressive streaming, which I think would be ideal for large splatmaps for open world games. Also for progressively streaming higher res textures for models as they come closer to camera etc.

It could also be used to store regular textures and allow loading them at lower resolutions to save GPU memory on underspecced machines. Right now the only way to use smaller textures is by supplying them via a package (argh) or loading full res and using MIPs (aaaarghhh) :D

Existing JPEG images can be losslessly recompressed into JXL, improving compression ratio at no quality loss as well. I am not sure about decompression speed, memory use etc but I assume they are optimized for that as a delivery format that is supposed to work on anything.

Format overview: https://jpeg.org/jpegxl/

Reference implementation (C++, 3-clause BSD license): https://github.com/libjxl/libjxl

There's also JPEG XR that seems even more flexible, worth evaluating I guess. https://jpeg.org/jpegxr/index.html

K0bin commented 1 year ago

@unfa You want to store your textures in a compressed format that GPUs can natively work with. So BCn on PC and ASTC on mobile. I don't think JPEG XL supports those.

unfa commented 1 year ago

Ah, I understand. Thanks!

Koalamana9 commented 1 year ago

More than 2 years have already passed since this proposal and no work has even been undertaken.

I would like to remind everyone that since the Unity shenanigans, donations to Godot have doubled and it is now receiving more than 50k euros per month, not to mention individual donors and additional funding from W4 Games.

Are you okay with the management there? 2 YEARS

[Image: Annotation 2023-11-22 150833]

I understand that they will now write that proposals are accepted for consideration if there is demand for them. But texture streaming is a fundamental system, necessary for managing the most memory-hungry resources: textures. There is not much demand for this proposal because the majority simply do not understand how important it is, and how much it would improve the performance of their games if they finally had the ability to unload large textures dynamically in real time.

AThousandShips commented 1 year ago

Anyone able to work on this is free to do so 🙂, if no one is taking this on how do you propose we do it?

WrobotGames commented 1 year ago

We can't blame the Godot team for not implementing this feature in the past 2 years because they were busy creating Godot 4.0. I, however, also feel like this is an essential feature for scalability in 3D games.

Clay John had a few slides in his presentation ("The future of rendering in Godot") about Asset streaming at Godot Con 2023. So this feature hasn't gone under the radar!

Koalamana9 commented 1 year ago

Anyone able to work on this is free to do so 🙂, if no one is taking this on how do you propose we do it?

Hoping for years that someone will suddenly start working on it for free? I specifically included a screenshot of their monthly donations, they have the resources to hire maintainers to solve the two-year proposals that the engine needs.

AThousandShips commented 1 year ago

Just throwing money at a problem isn't a sustainable solution...

There are many areas that require special focus and resources, and expenses that already exist, and this project relies almost entirely on volunteer work as any open source project like this does

Koalamana9 commented 1 year ago

Just throwing money at a problem isn't a sustainable solution...

the phrase sounds nice, but in this case it would have actually worked

AThousandShips commented 1 year ago

but in this case it would have actually worked

And many other problems as well, like compiled GDScript to obfuscate output, further platform support, physics bugs, etc., etc.

How fast would the margins be spent?

In any case all of that is off topic, let's stay on topic 🙂

Koalamana9 commented 1 year ago

and this project relies almost entirely on volunteer work as any open source project like this does

Now I have a real question: what do they even spend 50k a month on?

AThousandShips commented 1 year ago

I'd suggest looking elsewhere for that, not on topic for this proposal, please stay on topic, you can check the foundation webpage and other sources

This off topic distraction does nothing to help this proposal get implemented

Edit: you phrased it far better than I could, leave it here as a closing remark on the discussion

unfa commented 1 year ago

EDIT: I'm sorry for another off topic comment here, I can delete or move elsewhere if need be.

Note that Godot was in a pretty bad financial state before the Unity situation hit the fan.

Because there's simply not enough hands on board to take care of everything at once. Gathering funds for Godot and starting up W4 games, and releasing Godot 4 and preparing GDC presentations and GodotCon... all of that is a lot to manage in a project.

Sure, maybe it's possible to manage the work better - I don't know how to do that, nor do I have the resources to help with it - maybe you do?

Also - hiring developers is not just a "throw money at it" problem. You need to hire the right people, ensure they will be comfortable working on whatever you have for them, make sure they have things to do within their expertise, and onboard them onto the codebase, teach them the coding style, introduce them to other developers... It's not like clicking on an icon in an RTS to make more people build the thing faster...

At the same time there's a lot of things being worked on, and a lot of community work to manage.

If you ask 100 Godot users, you'll get 100 different answers to the question "what should the Godot team focus on next?". Game engines are among the most complex and multidisciplinary pieces of software out there - there's an insane number of moving parts, and Godot has work being done on pretty much all of them - but that can't happen all at once.

Context switching between 50 tasks every day is gonna have a paralyzing overhead for a developer too, so they need time to focus on a small number of things and finish these up before taking on more.

Be patient, and remember that anybody can contribute!

PS: Reminds me of this joke:

A manager is someone who believes 9 women can deliver a baby in 1 month

SlashScreen commented 1 year ago

other than lack of manpower, what is the obstacle to this being implemented?

clayjohn commented 1 year ago

other than lack of manpower, what is the obstacle to this being implemented?

There is no obstacle other than lack of time. The people who are willing to work on this are busy with other tasks right now.

edit: the author of Wicked Engine provided a great breakdown of how he implemented texture streaming, a lot of the details should work well in Godot too https://wickedengine.net/2024/06/texture-streaming/

SlashScreen commented 7 months ago

Is this being covered by The Forge's improvements?

Calinou commented 7 months ago

Is this being covered by The Forge's improvements?

No, the collaboration is completed. Everything that could be submitted has been submitted already.

Support for texture streaming is still planned, but not for 4.3.

SlashScreen commented 7 months ago

Phooey. Well, I'll be patient then, since I don't know Vulkan well enough to provide any assistance.

jams3223 commented 4 months ago

BC7 Texture Compression with Sparse Virtual Texture aka Megatexture is the way to go and is the AAA route.

jams3223 commented 4 months ago

Sparse Virtual Texture Introduction

In a sprawling, open-world video game, the usual practice involves loading one or more physical textures per game object into the memory, or VRAM, and binding them all before a draw call. This process results in additional overhead because it requires retrieving a large amount of data from memory to the graphics processing unit (GPU), which in turn leads to significant VRAM usage.

Virtual Textures seeks to resolve this challenge by constructing a large virtual texture memory allocation that contains data for the entire world on a disk drive. The use of pagination divides the virtual texture into small chunks called tiles (pages), loading only the essential textures into physical memory and unloading those that are not required. The CPU creates a virtual address for the virtual texture memory allocation and translates it into a physical address for the physical memory. This process involves mapping the virtual address to a corresponding physical address through a page table structure. The number of tiles (pages) allocated to the virtual memory needs to match the number of tiles (pages) required by the physical memory.

[Image: Sparse Volume Texture]

We store the texture in virtual memory and divide it into several sections known as tiles. We organize and identify these tiles, or pages, with white lines. The CPU sends the required tiles divided into blocks, indicated by a red box, along with their virtual addresses, through a page table, then translates these virtual addresses into physical addresses to fill the physical memory, or VRAM, and begins the loading process.

When starting the process, we need to map the number of tiles (pages) linearly to the number of entries in the page table. For example, if we have 64 tiles (pages), we need to map them to 64 entries in the page table for translation. We can have textures with a dimension of 64x64 per tile, which also matches BC7 compression.

CPU mechanism: Consider a scenario where the total memory address is 13680. The CPU divides this memory address into a page-aligned value and a remainder. Subsequently, the CPU divides the aligned value by the page size to obtain the index from the page table. The CPU proceeds to access the page table entry at this index for the updated aligned address, and then combines the remainder with the aligned address to yield the physical address sum.
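To make the arithmetic concrete, here is a minimal sketch of that translation for address 13680. The page size of 4096 bytes and the page table contents are assumptions for illustration only, since neither is stated above.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (not given in the description)

def translate(virtual_address, page_table, page_size=PAGE_SIZE):
    """Translate a virtual address to a physical one through a flat page table."""
    offset = virtual_address % page_size   # remainder inside the page
    aligned = virtual_address - offset     # page-aligned part of the address
    index = aligned // page_size           # index into the page table
    physical_base = page_table[index]      # KeyError means the page is not resident
    return physical_base + offset          # aligned physical address + remainder

# Hypothetical table: virtual page 3 currently lives at physical address 0x20000.
page_table = {3: 0x20000}
print(translate(13680, page_table))  # 13680 = 3 * 4096 + 1392, so 0x20000 + 1392
```

A lookup for a page that is not in the table fails, which is the point where a real system would queue the tile for loading.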

"Acyclic Graph" has the potential to elevate the efficiency of translation.

jams3223 commented 4 months ago

I came across some online examples of people putting it into action, so if that would be useful, I'm here to support anyone who's willing to give it a shot.

octanejohn commented 4 months ago

this looks like the virtual shadow map paper, maybe too demanding for mobile. Related: https://ktstephano.github.io/rendering/stratusgfx/svsm

jams3223 commented 4 months ago

this looks like the virtual shadow map paper, maybe too demanding for mobile. Related: https://ktstephano.github.io/rendering/stratusgfx/svsm

Thanks, I thought I uploaded the link.

clayjohn commented 3 months ago

I realized recently that Juan wrote up a technical proposal last year and it hasn't been shared widely yet. So here is the text of his proposal for reference.

Texture Streaming

Texture streaming is a strong requirement for loading large game production scenes. Opening large game scenes in Godot without this would take forever and risk running out of memory since most of the high quality content nowadays relies on this being available in game engines.

There are several ways to implement texture streaming. Vulkan supports the sparse textures extension, but it is known not to perform well on PC.

Pool

The most common and straightforward way to implement texture streaming is with a persistent pool with various texture array sizes. As an example, a pool could exist as this collection of texture arrays, compressed as either DXT5 or BC7 (depending on settings):

Texture Arrays:

Array length: 16384 - Size: 128² - 256 MB
Array length: 4096 - Size: 256² - 256 MB
Array length: 1024 - Size: 512² - 256 MB
Array length: 256 - Size: 1024² - 256 MB
Array length: 64 - Size: 2048² - 256 MB

This is basically a pool of texture data that is around 1.5 GB in size (meaning that this is compatible with most GPUs and iGPUs nowadays).
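As a sanity check on those numbers, here is a small sketch of the arithmetic. It relies on BC7 and DXT5 both storing 16 bytes per 4x4 block, i.e. 1 byte per texel. Whether each array layer also carries its own mip chain is not stated in the proposal, so the "with mipmaps" figure is an assumption.

```python
BYTES_PER_TEXEL = 1  # BC7 and DXT5 both use 16 bytes per 4x4 block
MIB = 1024 * 1024

# (array length, texture edge size) pairs from the list above
POOL = [(16384, 128), (4096, 256), (1024, 512), (256, 1024), (64, 2048)]

def array_bytes(length, size):
    """Bytes used by one texture array, counting only the top mip level."""
    return length * size * size * BYTES_PER_TEXEL

base_total = sum(array_bytes(n, s) for n, s in POOL)
# Assumption: a full mip chain adds roughly one third on top of the base level.
with_mips_total = base_total * 4 // 3

for n, s in POOL:
    print(f"{n:>6} x {s}^2 -> {array_bytes(n, s) // MIB} MiB")
print("base levels only:", base_total // MIB, "MiB")
print("with full mip chains (assumed):", with_mips_total // MIB, "MiB")
```

Each array comes out at 256 MiB and the base levels sum to 1280 MiB, so the "around 1.5 GB" estimate sits between the two totals.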

Streaming algorithm overview

The general idea with streaming is that in practice, when rendering a large scene, most of the textures are not read at their larger mipmap resolution. In fact, most are only rendered at the smallest ones with only the ones very close to the camera using the full resolution.

As such, the main idea is that when a texture that will be used in streaming is loaded, only its smallest-resolution version will be loaded (as per the example above, 128x128). Then, the idea is to detect every frame which size each texture would need in order to render optimally. If the optimal mipmap size is bigger than the current size loaded, then a larger version may need to be loaded.
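A rough sketch of that per-texture check, assuming power-of-two textures and the five pool sizes from the example above. The function names are hypothetical and only illustrate the comparison between the loaded size and the size the reported mip level asks for.

```python
POOL_SIZES = [128, 256, 512, 1024, 2048]  # sizes from the example pool

def required_pool_index(full_size, mip_level):
    """Pool index whose size matches the mip level actually sampled.

    full_size is the texture's original edge length (assumed power of two),
    mip_level is the smallest mip index seen this frame (0 = full resolution).
    """
    wanted = full_size >> mip_level
    wanted = max(wanted, POOL_SIZES[0])   # never go below the smallest pool
    wanted = min(wanted, POOL_SIZES[-1])  # never go above the largest pool
    return POOL_SIZES.index(wanted)

def needs_upgrade(current_index, full_size, mip_level):
    """True if the texture is loaded at a smaller size than rendering wants."""
    return required_pool_index(full_size, mip_level) > current_index

# A 2048 texture sampled at mip 2 wants the 512 pool (index 2).
print(required_pool_index(2048, 2), needs_upgrade(0, 2048, 2))
```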

To load a larger version, often one must determine if another of the textures used at bigger sizes is a candidate for being downgraded to a smaller size to make room for this new one.

This is done by checking:

A threshold on when this texture was last used. Once a texture has gone unused for the threshold of one second (60 frames), the texture used least recently at that size will be downgraded.
A coverage value. If the texture is currently in use at the given resolution, but its coverage is much smaller than the new one, then the new one takes priority.
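The two checks could be sketched roughly as below. This is only an illustration: the `Resident` record, the `COVERAGE_RATIO` constant, and the reading of "much smaller" as a fixed ratio are assumptions, not part of the proposal.

```python
from dataclasses import dataclass

@dataclass
class Resident:
    texture_id: int
    last_used_frame: int
    coverage: int  # e.g. how many feedback buffers saw this texture

THRESHOLD_FRAMES = 60  # one second at 60 FPS, as in the proposal
COVERAGE_RATIO = 4     # assumed meaning of "much smaller"

def pick_downgrade(residents, current_frame, new_coverage):
    """Pick a texture at this size to downgrade, or None if nothing qualifies."""
    # Check 1: anything unused for at least the threshold; take the least recent.
    stale = [r for r in residents
             if current_frame - r.last_used_frame >= THRESHOLD_FRAMES]
    if stale:
        return min(stale, key=lambda r: r.last_used_frame)
    if not residents:
        return None
    # Check 2: a texture still in use, but with much smaller coverage.
    weakest = min(residents, key=lambda r: r.coverage)
    if weakest.coverage * COVERAGE_RATIO <= new_coverage:
        return weakest
    return None
```

If the function returns None, the requesting texture simply stays at its current size until a slot frees up.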

Special texture type, StreamedTexture2D

Godot, hence, should support a special texture type and shared type called StreamedTexture2D. This texture type is, unfortunately, not compatible with Texture2D. As much as I would like this to happen I don't think there really is any way to reconcile this.

StreamedTexture2D is a special resource type, textures should be imported as this special type and they will always (depending on the pool setting) be compressed to either DXT5/BC7 (desktop) or ETC2A/ASTC (mobile).

The internal file format can be the same as the one in CompressedTexture2D or very similar and, of course, the import process is a simplification of it.

Special shader texture type, sampler2DStreamed

The shader compiler needs to add this new texture type, sampler2DStreamed. Example usage:

uniform sampler2DStreamed albedo_tex : hint_albedo;

void fragment() {
    ALBEDO = texture( albedo_tex, UV );
}

There is, however, a strong underlying difference between how the code is generated here and how the code is generated for regular textures, as this adds extra logic.

Under the hood

Under the hood, this would work somewhat like this:

// On the global GLSL shader scope

uniform texture2DArray texture_streaming_pool[MAX_TEXTURE_SIZES];

buffer TextureSlots {
 ivec2 slots[];
} texture_streaming_slots;

// On the material, instead of storing a texture2D, just a uint index into the slots buffer is stored

// When the texture is actually read:

ALBEDO = texture( albedo_tex, UV );

// becomes

albedo = texture( sampler2DArray( texture_streaming_pool[ texture_streaming_slots.slots[ material.albedo_tex ].x ], sampler ), vec3( UV, float( texture_streaming_slots.slots[ material.albedo_tex ].y ) ) );

This ensures that the texture is actually read from the pool properly, using the right size and index. The fact that an indirection is used via the texture_streaming_slots variable means that this texture can be moved between different sizes on the fly on demand. If the camera gets closer to a required higher mipmap, then the CPU can load it and move the texture without affecting any of the compiled materials.
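A tiny CPU-side model of that indirection, for illustration only. The `SlotTable` class and its methods are hypothetical names; the point is that the material keeps holding the same id while the (pool index, layer) pair behind it changes.

```python
class SlotTable:
    """Model of texture_streaming_slots: texture id -> (pool index, array layer)."""

    def __init__(self):
        self.slots = {}

    def place(self, texture_id, pool_index, layer):
        """Record where the texture currently lives in the pool."""
        self.slots[texture_id] = (pool_index, layer)

    def resolve(self, texture_id):
        """What the shader-side lookup would read for this id."""
        return self.slots[texture_id]

material = {"albedo_tex": 7}  # the material stores only the id
table = SlotTable()

table.place(7, 0, 42)         # initially in the 128x128 array, layer 42
print(table.resolve(material["albedo_tex"]))

table.place(7, 2, 5)          # promoted to the 512x512 array, layer 5
print(table.resolve(material["albedo_tex"]))
```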

Size/Mipmap detection

The streaming logic requires that, each time the texture is read, a buffer with the maximum mipmap the texture used is updated, then this buffer is sent back to the CPU for analysis.

This can be implemented like this:

// On the global scope

// cleared to 0xFFFFFFFF (meaning, unwritten) every frame
buffer StreamingTextureMipmapUsed
{
    uint value[];
} streaming_texture_mipmap_used;

// When the texture is actually read:

ALBEDO = texture( albedo_tex, UV );

// Becomes

// on the statement _before_ reading, this code is added

{ // Inserted statement
    uint mipmap = uint( textureQueryLod( texture_max_size, UV ).y );
    atomicMin( streaming_texture_mipmap_used.value[ material.albedo_tex ], mipmap );
}

// Then regular read
albedo = texture( sampler2DArray( texture_streaming_pool[ texture_streaming_slots.slots[ material.albedo_tex ].x ], sampler ), vec3( UV, float( texture_streaming_slots.slots[ material.albedo_tex ].y ) ) );

This ensures that when read, you can send the streaming_texture_mipmap_used back to the CPU for analysis. Of course, as-is this code would be highly inefficient because it would put an enormous memory pressure on that buffer. To offload this pressure a couple of different things can be done:

Divide the screen in 8x8 regions and walk a pixel of that region per frame. Only run this code (to query the lod and update the buffer) if the screen pixel belongs to the one walked on in that region.
Do not use a single write buffer, but several. If the game uses 64k textures, then your buffer is 256 KB. You can easily have 128 of them (32 MB) and each region alternates the buffer written to. At 1080p, this means only 253 writes per frame for this texture, which is very manageable. Then the buffers can be merged later into a single one using compute before sending back.
Another advantage of using several buffers (such as 128) is that besides merging everything to a single one, we can estimate a coverage value (in how many buffers this texture was used).
Keep in mind Godot RenderingDevice does not have code to send back buffers or textures without blocking. This should be relatively trivial to add (it will arrive 3 frames late, but for this case it's fine).
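The numbers above can be checked with a short sketch, which also shows one way the merge and coverage estimate could look. In the proposal the merge would run in a compute shader, so the Python `merge` function here only stands in for that step, and the 4-byte entry size is inferred from the 64k textures / 256 KB figures.

```python
NUM_TEXTURES = 64 * 1024
BYTES_PER_ENTRY = 4        # one uint per texture
NUM_BUFFERS = 128
UNWRITTEN = 0xFFFFFFFF     # clear value from the proposal

buffer_bytes = NUM_TEXTURES * BYTES_PER_ENTRY      # 256 KiB per buffer
total_bytes = buffer_bytes * NUM_BUFFERS           # 32 MiB for all buffers
sampled_pixels = (1920 * 1080) // (8 * 8)          # one pixel per 8x8 region
writes_per_buffer = sampled_pixels // NUM_BUFFERS  # worst case for one texture

def merge(buffers):
    """Merge feedback buffers into one.

    The smallest mip index wins (largest resolution needed), and coverage
    counts how many buffers wrote a value for each texture.
    """
    count = len(buffers[0])
    merged = [UNWRITTEN] * count
    coverage = [0] * count
    for buf in buffers:
        for i, mip in enumerate(buf):
            if mip != UNWRITTEN:
                merged[i] = min(merged[i], mip)
                coverage[i] += 1
    return merged, coverage

print(buffer_bytes, total_bytes, writes_per_buffer)
```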

Restrictions

Because of the algorithm, several restrictions need to be imposed on the usage of StreamedTexture2D:

It can't be used inside a for/while loop (this affects performance). Or, if it is, some bool value needs to be added to ensure that only the first read determines the size.
It can't be used anywhere other than a spatial shader fragment function.

Workflow and QoL

One of the main problems with this approach is that it requires separating between Texture2D and StreamedTexture2D, which are entirely incompatible (Can't use one in place of the other) for the reasons described before.

Artists would need to reimport their textures as StreamedTexture2D if they intend to use them like this, which is kind of a hassle, but unavoidable. At least with this PR, a large part of the hassle is removed.

These are some quality of life improvements that would be added:

Besides StandardMaterial3D and ORMMaterial3D, a small change in BaseMaterial3D would allow for generating code and requesting texture files in StreamedTexture2D instead of Texture2D, which could be derived as StreamedStandardMaterial3D and StreamedORMMaterial3D.
When importing GLTF2 (or Blender) we can detect ORMMaterial3D and, if a setting is checked on the scene importer, do automatic conversion of these textures to the streamed variants and generate streamed materials instead. This setting should be possible to change globally.
Keep in mind that both ShaderLanguage and ShaderCompiler need to be changed to support streamed textures, and the visual shader editors have to support them as well.

SlashScreen commented 3 months ago

Why are the vertex shaders barred from using these?

atirut-w commented 3 months ago

One of the main problems with this approach is that it requires separating between Texture2D and StreamedTexture2D, which are entirely incompatible (Can't use one in place of the other) for the reasons described before.

Has anyone looked into how other engines do this, UX-wise? Unity has a simple per-texture toggle that doesn't require you to change a texture's import type. Just enable it along with the global settings for mipmap streaming and bam, it just works (supposedly). I think this might be a better workflow.

equalent commented 3 months ago

@atirut-w I think this is because the Texture2D API is designed with the assumption that the texture is just an atomic preloaded resource, not a complex streamable object. It affects a big part of its API

clayjohn commented 3 months ago

Why are the vertex shaders barred from using these?

You can't automatically calculate an LOD from a vertex shader.

Has anyone looked into how other engines do this, UX-wise? Unity has a simple per-texture toggle that doesn't require you to change a texture's import type. Just enable it along with the global settings for mipmap streaming and bam, it just works (supposedly). I think this might be a better workflow.

In the doc you linked it says that for custom shaders you have to request the mip level manually. Plus you have to enable it in the import settings, just like what is suggested here.

So, just for clarity, the workflow in Unity is:

Enable streaming in the project settings
Enable streaming on the texture
Re-import the texture
Streaming works automatically in render component texture, but only with UV0
In all other cases, you need to specify the mip level manually on the CPU

The proposed workflow in Godot above is:

Enable streaming on the texture
Re-import the texture
Streaming works automatically in the StandardMaterial3D
Streaming works automatically in custom shaders if you use sampler2DStreamed

atirut-w commented 3 months ago

So, just for clarity, the workflow in Unity is:

Enable streaming in the project settings
Enable streaming on the texture
Re-import the texture
Streaming works automatically in render component texture, but only with UV0
In all other cases, you need to specify the mip level manually on the CPU

The proposed workflow in Godot above is:

Enable streaming on the texture
Re-import the texture
Streaming works automatically in the StandardMaterial3D
Streaming works automatically in custom shaders if you use sampler2DStreamed

The proposed Godot workflow specifically called for changing the import type, which is different than a simple toggle. This also introduces some questions:

Are the old import settings preserved when switching to StreamedTexture2D?
How would a custom shader support both types without a separate input field?

To add to the proposal, there should also be a setting to override the streaming memory budget. By default, the engine would fully utilize all VRAM, but you would be able to set a custom limit.

Calinou commented 3 months ago

Are the old import settings preserved when switching to StreamedTexture2D?

Not currently, but this could be implemented in the editor by preserving properties that have the same name and type when switching import types.

godotengine / godot-proposals