alecazam opened 3 years ago
It is compressing an entire level. As with the OpenGL API for uploading textures, what is stored in the mip level of a cubemap is all 6 faces at that level, and what is stored in the mip level of an array texture is the images of all layers at that level. KTX v2 is just like KTX v1 in this regard, except that the level order is reversed and there are no level size fields mixed in with the image data.
Please point me at the confusing language in the spec.
Compressing the mip tail in one go would break the idea of being able to stream the file and display a low-resolution image right away. It could be done by adding a new supercompression scheme but, if we were to add a new scheme, I think we would create one that used a dictionary shared between the mip levels. In conjunction with the zstd API that uses a decompression context, I think the additional overhead for each mip level after the first would be very small. ETC1S/BasisLZ already uses a shared dictionary (a.k.a. codebook).
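To make the context/dictionary point concrete, here is a minimal sketch of what a per-level decode could look like under such a hypothetical shared-dictionary scheme, using the existing zstd dictionary API. The function name and the idea of carrying the dictionary bytes in global data are assumptions, not anything in the current spec.

```cpp
#include <zstd.h>

#include <cstdint>
#include <vector>

// Hypothetical per-level decode for a future shared-dictionary scheme.
// Setup is done once by the caller, e.g.:
//   ZSTD_DCtx*  dctx  = ZSTD_createDCtx();
//   ZSTD_DDict* ddict = ZSTD_createDDict(dictBytes, dictSize);
// The digested dictionary and the context are then reused for every level,
// so the per-level overhead after the first is mostly just the decompress.
std::vector<uint8_t> decodeLevelWithSharedDict(ZSTD_DCtx* dctx,
                                               const ZSTD_DDict* ddict,
                                               const uint8_t* src, size_t srcSize,
                                               size_t uncompressedSize) {
    std::vector<uint8_t> dst(uncompressedSize);
    size_t n = ZSTD_decompress_usingDDict(dctx, dst.data(), dst.size(),
                                          src, srcSize, ddict);
    if (ZSTD_isError(n) || n != dst.size()) dst.clear();
    return dst;
}
```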
Snippets below. I named these chunks in my KTX encoder/decoder to distinguish them from mip levels. An individual element (array layer, slice, or face) then represents a chunk. When I first implemented KTX support I got this wrong on import/export, especially since face size and level size are stored in the same field. Maybe calling these an "aggregate level" in the spec would help.
One other part I couldn't find was a clarification of the formula for mip dimension calculation. All the hardware uses round-down, though it's not ideal for mipgen. There are round-up and round-down conventions, but since DX/GL used round-down the other APIs followed suit.
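For reference, a minimal sketch of the round-down convention those APIs share (helper names are mine, not from the spec):

```cpp
#include <algorithm>
#include <cstdint>

// Round-down (floor) mip chain: each level halves the base dimension,
// clamping at 1, which is what D3D/GL (and the APIs that followed) specify.
uint32_t mipDimension(uint32_t baseDim, uint32_t level) {
    return std::max<uint32_t>(baseDim >> level, 1u);
}

// Full pyramid length under that rule: floor(log2(maxDim)) + 1 levels.
uint32_t mipCount(uint32_t width, uint32_t height, uint32_t depth = 1) {
    uint32_t maxDim = std::max({width, height, depth});
    uint32_t count = 1;
    while (maxDim >>= 1) ++count; // integer log2 avoids float rounding issues
    return count;
}
```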
We have a lot of 2D textures, but I guess we can aggregate them into a 2D array instead of dealing with the packed mip tail. I appreciate that I only need one array of mip sizes, and then the chunks are just offsets once unpacked (see the sketch below). This is something that could be done with a slightly modified KTX file that strips the sizes and stores the zstd-compressed levels, but that's mostly what KTX2 already does.
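As a rough illustration of that offset math (names here are my own, not from the spec), once a level is decompressed every chunk at that level has the same size, so locating one is a single multiply:

```cpp
#include <cstdint>

// After a level is decompressed, all chunks (array layers, faces, or slices)
// at that level have equal size, so a chunk's offset is just index * size.
uint64_t chunkOffsetInLevel(uint64_t levelUncompressedSize,
                            uint32_t chunkCount, uint32_t chunkIndex) {
    uint64_t chunkSize = levelUncompressedSize / chunkCount;
    return chunkSize * chunkIndex;
}
```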
The main idea is to supercompress the mips as KTX2 to get them to the player, decode/transcode a level to memory as a shared MTLBuffer or mmap larger decompressed mips directly as a shared MTLBuffer, and then copy/twiddle that to a private MTLTexture or sparse texture. I have a viewer and encoder I wanted to share at https://github.com/alecazam/kram. It at least starts to open up KTX creation and visualization on macOS. Next I'd like to add KTX2 support.
** Here's one snippet where levelCount could mean the array holds one level per mip level, as in a simple 2D texture, which is where most people start out with these texture formats.
3.7. levelCount
levelCount specifies the number of levels in the Mip Level Array and, by extension, the number of indices in the Level Index array. A KTX file does not need to contain a complete mipmap pyramid. Mip level data is ordered from the level with the smallest size images, level_p, to that with the largest size images, level_base, where p = levelCount − 1 and base = 0. level_p must not be greater than the maximum possible, level_max, where
max = ⌊log2(max(pixelWidth, pixelHeight, pixelDepth))⌋
levelCount = 1 means that a file contains only the base level and the texture isn't meant to have other levels. E.g., this could be a LUT rather than a natural image.
levelCount = 0 is allowed, except for block-compressed formats, and means that a file contains only the base level and consumers, particularly loaders, should generate other levels if needed.
** And this one: is this a level of mip levels or a single mip level?
Should KTX support level sizes > 4GB?
Discussion: Users have reported having base levels > 4GB for 3D textures. For this the imageSize field needs to be 64-bits. Loaders on 32-bit systems will have to ensure correct handling of this and check that imageSize <= 4GB, before loading.
Resolved: Be future proof and make all image-size related fields 64 bits.
** And this made me think the individual mips were compressed, so you could offset into them across the array.
Should the supercompression scheme be applied per-mip-level?
Discussion: Should each mip level be supercompressed independently or should the scheme, zlib, zstd, etc., be applied to all levels as a unit? The latter may result in slightly smaller size though that is unclear. However it would also mean levels could not be streamed or randomly accessed.
Resolved: Yes. The benefits of streaming and random access outweigh what is expected to be a small increase in size.
I finished KTX and KTX2 support in my viewer. It's working well, and converting KTX -> KTX2 and then using ETC/BC/ASTC + zstd supercompression really smashes them down. Also seems good for storing HDR 16f/32f source images in KTX2 files for source control.
I added an "any" path to test out BasisLZ, but see results on my github page since the archive was 10x bigger and 10x slower to generate using UASTC. If further encoding of the UASTC file is needed, then that defeats the purpose of having each mip level available to decompress.
I finished KTX and KTX2 support in my viewer. It's working well, and converting KTX -> KTX2 and then using ETC/BC/ASTC + zstd supercompression really smashes them down. Also seems good for storing HDR 16f/32f source images in KTX2 files for source control.
Sounds great. I want to try out kramv but need to finish integrating the latest Basis Universal code into KTX-Software first. See below.
I added an "any" path to test out BasisLZ, but see results on my github page since the archive was 10x bigger and 10x slower to generate using UASTC.
I am not too surprised by the 10x bigger, after all UASTC is 2x larger c/f ETC1S, but am by the 10x slower. There have been recent encoder improvements in the Basis Universal code that may help. Hence my desire to get that code integrated.
If further encoding of the UASTC file is needed, then that defeats the purpose of having each mip level available to decompress.
For UASTC, Zstd supercompression is needed to begin closing in on BasisLZ, which has built in supercompression. With UASTC + Zstd, each mip level is supercompressed independently so you can still decompress individual levels. I do not understand your comment.
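For anyone following along, a hedged sketch of what that independence looks like in practice: each entry of the KTX2 level index records the compressed and uncompressed byte lengths, so a single level can be inflated on its own. The struct and function names here are illustrative, not libktx API.

```cpp
#include <zstd.h>

#include <cstdint>
#include <vector>

// One KTX2 Level Index entry (all fields are 64-bit in the file).
struct LevelIndexEntry {
    uint64_t byteOffset;              // offset of the level from the start of the file
    uint64_t byteLength;              // compressed size when Zstd supercompression is used
    uint64_t uncompressedByteLength;  // size after decompression
};

// Decompress a single mip level independently of the others, which is what
// per-level Zstd supercompression allows.
std::vector<uint8_t> readLevel(const uint8_t* file, const LevelIndexEntry& e) {
    std::vector<uint8_t> out(static_cast<size_t>(e.uncompressedByteLength));
    size_t n = ZSTD_decompress(out.data(), out.size(),
                               file + e.byteOffset,
                               static_cast<size_t>(e.byteLength));
    if (ZSTD_isError(n) || n != out.size()) out.clear();
    return out;
}
```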
I am not too surprised by the 10x bigger, after all UASTC is 2x larger c/f ETC1S,
I think I was looking at just UASTC and RDO. So with zlib on the KTX2 file, that brought it down. I thought the intent was then the overall KTX2 file needed to be compressed, but I was hoping individual mips could be. Rich is working on the perf, so I'm sure it will improve.
For UASTC, Zstd supercompression is needed to begin closing in on BasisLZ, which has built in supercompression.
From the ktxsc usage, I wasn't sure if zstd and uastc could both be specified together. I thought those might be exclusive of one another due to the supercompression setting being either BasisLZ or Zstandard. I just tried them both, and that worked.
Also I found out 1D and 1DArray textures are pretty limited on Metal. 1D can't have compression, and can't have mips. It feels like this texture type could be replaced with 2D and 2DArray. I adjusted my scripts and kept these types.
Also I have a 4x4 checkerboard texture that's failing with "out of memory" in ktxsc. I'll file another issue on that.
I think I was looking at just UASTC and RDO. So with zlib on the KTX2 file, that brought it down. I thought the intent was then the overall KTX2 file needed to be compressed, but I was hoping individual mips could be. Rich is working on the perf, so I'm sure it will improve.
libktx applies zstd independently to each mip level as intended by the KTX specification. Externally applying zlib, or zstd, to an entire .ktx2 file will give similar compression but breaks access to the individual mip levels and therefore streaming. With .basis files, zstd, or zlib, compression of the whole file is the only option.
From the ktxsc usage, I wasn't sure if zstd and uastc could both be specified together. I thought those might be exclusive of one another due to the supercompression setting being either BasisLZ or Zstandard. I just tried them both, and that worked.
BasisLZ is the supercompressed ETC1S universal format. UASTC is another universal format which can be supercompressed with Zstd. So, as you have discovered, you can use both uastc and zstd in ktxsc.
Seems that vkFormat == 0 and supercompressionScheme == 2 (Zstd) in the above scenario. So now I check for that, and reject the file until I can get UASTC decode. Seems that to get compressed transcodable files, there's another stage and temporary memory involved.
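A minimal sketch of that rejection check, assuming the fixed KTX2 header layout; the struct and helper are mine, not from libktx.

```cpp
#include <cstdint>

// Leading fields of the KTX2 header, as laid out in the spec.
struct Ktx2Header {
    uint8_t  identifier[12];
    uint32_t vkFormat;               // 0 (VK_FORMAT_UNDEFINED) for Basis universal payloads
    uint32_t typeSize;
    uint32_t pixelWidth, pixelHeight, pixelDepth;
    uint32_t layerCount, faceCount, levelCount;
    uint32_t supercompressionScheme; // 0 = none, 1 = BasisLZ, 2 = Zstandard
};

// The scenario described above: UASTC data with Zstd supercompression shows up
// as vkFormat == 0 plus scheme 2, and needs a transcode pass this loader lacks.
bool rejectUntilUastcDecode(const Ktx2Header& h) {
    return h.vkFormat == 0 && h.supercompressionScheme == 2;
}
```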
Also with some formats (BC6 and ASTC HDR and ASTC5x5+), one will still need platform-specific encoding formats. If an app doesn't use those, then I suppose they can all be transcoded from UASTC.
I could also see decode and skip for various array or cube faces that aren't needed. I'm trying to avoid writing out decompressed data to disk for mmap, since it wastes space on the mobile devices. But mmap avoids jetsam, so there are tradeoffs with compressed mips.
The levelImages pseudo code near the start of the spec makes it definite that a mip level includes all layers and faces or slices for that mip level. The levelCount description refers to the "Mip Level Array" which the pseudo code shows to be a loop over levelImages. So I don't think there is any ambiguity here. The only possible improvement I can see is to modify the Mip Level Array and levelImages sections as follows:
=== Mip Level Array
An array of `<<levelImagesDesc, levelImages>>` ordered from the level with the
smallest size images, stem:[level_p], to that with the largest size
images, stem:[level_{base}].
[NOTE]
.Rationale
====
When streaming a KTX file, sending smaller mip levels first can be
used together with, e.g., the `GL_TEXTURE_MAX_LEVEL` and
`GL_TEXTURE_BASE_LEVEL` texture parameters or appropriate region setting
in a `vkCmdCopyBufferToImage`, to display a low resolution image quickly
without waiting for the entire texture data.
====
[[levelImagesDesc]]
==== levelImages
`levelImages` is an array of Bytes holding all the image data for
a level. The data includes all array layers, all z slices, all faces, all rows
(or rows of blocks) and all pixels (or blocks) in each row for the mipmap level.
Images are concatenated in the order layer, face, slice.
The offset of a level's `levelImages` is provided by the
<<_level_index,Level Index>>.
When `<<_supercompressionscheme,supercompressionScheme>> != 0` these
bytes are formatted as specified in the scheme documentation.
How is that?
Those sentences are still not clear to me, but what you have is a little better. Removing the technical references helps me read the definition better. A diagram would probably also help. See what you think of the following:
When streaming a KTX file, the smallest images of the mip level arrays can be decoded as received, transcoded if needed, then uploaded to a buffer or texture to display a low-resolution mip chain while the remaining larger mip level arrays finish streaming. Use of lod clamping and calls to copy mips into larger textures may be needed.
levelImages is a level that holds images at a single mip size. These images are sequenced by array layer, face, then slice. Each image consists of rows of blocks or pixels. The offset of a level's levelImages is provided by the <<_level_index,Level Index>>.
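To make that sequencing concrete, here is a small sketch (parameter names are mine) of locating one image inside an uncompressed level:

```cpp
#include <cstdint>

// Images in a level are laid out layer-major, then face, then slice, so a
// single image's offset is an index computed from those three coordinates
// times the per-image size at this mip level.
uint64_t imageOffsetInLevel(uint32_t layer, uint32_t face, uint32_t slice,
                            uint32_t faceCount, uint32_t sliceCount,
                            uint64_t imageSize) {
    uint64_t index =
        (uint64_t(layer) * faceCount + face) * sliceCount + slice;
    return index * imageSize;
}
```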
Some thoughts about my use of KTX2 in the wild.
Is there documentation about supercompression preventing per image access? Formats like JPEG have restart markers that let you process the compressed stream across multiple threads or skip chunks, but KTX2 doesn't have that. I can see for large 2d array atlases and sparse textures, where individual access to larger mips might be of value.
Also, supercompressed Basis UASTC requires a further transcode requiring additional memory, whereas supercompressed BC/ETC/ASTC can decode directly to a staging buffer to be twiddled to the private texture format. Note that on consoles the twiddling would likely be stored directly into the KTX2 blocks to avoid staging, but that would have to be conveyed in a prop.
I think in general, transmitting the entire KTX2 file/bundle, mmap-ing that as a compressed backing store, decoding mips as needed to a staging buffer, and then blitting/twiddling to private textures is ideal. Maybe glTF 2 can benefit in the browser from progressive download, but it really complicates the texture loader and the memory and GPU resource handling. A loader that progressively loads/drops the larger mips to conserve memory once the full KTX2 or bundle of KTX2s is available is more common for games, and works to avoid jetsam on mobile. We can only supply textures in signed bundles on mobile and console, not as individual textures. One can flush the entire GPU copy, since the KTX2 is the backing store in compressed form, similar to a PNG.
Also just wanted to say thanks for all the great work on KTX and KTX2. These formats are such a joy compared to all the formats I've worked with prior.
The big downside to reversed mips in KTX2 is that I have to seek backwards to write in-place mips to the file and special-case code vs. KTX. With KTX, I could write mips in order for a 2D texture. The single texture streaming isn't applicable to any of my use cases. And KTX2s aren't stored compressed in my archives, only the mips are.
Is there documentation about supercompression preventing per image access?
No. Perhaps I should add a note. Only zstd supercompression prevents per-image access. BasisLZ has an index of the offsets and sizes of the data for each image in the supercompressionGlobalData field. It would be possible to do something similar for zstd along with using a global dictionary, though at this point to do so we'd have to introduce a new supercompression scheme.
I can see for large 2d array atlases and sparse textures, where individual access to larger mips might be of value.
You have individual access to any mip level. Do you mean individual access to the images of a mip level? I'm not sure that is useful. For example you have to have all of a cube map's face images at a particular level size before you can use that mip level.
Yes, I was specific about 2D and 2D array atlases (and sparse), but the spec does have partial cubes, and there are cube arrays which are often locationally dependent. For example, there are many problems with combining atlas entries into a single 2D texture (e.g. mips, alignment, block bleed, and no wrap support) but I see many sparse textures built this way. Also Substance uses charts, which break all hope of mips. So I'm moving more towards storing atlas/flipbook data in 2D array textures. These have a fixed dimension, which makes them easy for artists to build, but they're limited to 2048 elements. I could see load-and-grow array strategies to only load atlas entries that are referenced. These are ES3 level now, so supported by all hardware of import.
The big downside to reversed mips in KTX2 is that I have to seek backwards to write in-place mips to the file and special-case code vs. KTX. With KTX, I could write mips in order for a 2D texture. The single texture streaming isn't applicable to any of my use cases.
Seek backwards in what? What I do in the libktx writer is have a calculation of the offset of a mip level with the data and I write the data for a level to the calculated offset. The only thing that differs between KTX and KTX2 is the calculation.
And KTX2s aren't stored compressed in my archives, only the mips are.
I don't understand what you mean by this.
@alecazam if per-image access in zstd-compressed mip levels is important to you, I suggest you propose a new supercompression scheme that permits it. Basically it would have supercompression global data with an index of the images within the compressed data. If it's going to have global data, it's worth considering having a global dictionary as well.
Seek backwards in what? What I do in the libktx writer is have a calculation of the offset of a mip level with the data and I write the data for a level to the calculated offset. The only thing that differs between KTX and KTX2 is the calculation.
Yes, I do something similar, but originally I tried to minimize memory use by writing mips directly to the file, and then mmap-ing them back in read-only. I should probably decouple the file system from mip encoding. But currently I fseek to the offsets that I have. It just means the file system zeros a bunch of pages, and then, as I seek back, they get filled in with the mip data generated from the largest level.
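For what it's worth, the in-place write boils down to something like this sketch, with the level offsets computed up front (names are illustrative, not from kram or libktx):

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Write one mip level at a precomputed offset. Because KTX2 stores the
// smallest level first, generating mips from the largest image means these
// writes seek backwards through the file; the pages in between get zeroed by
// the filesystem and filled in as later levels arrive.
// (Use fseeko/_fseeki64 instead of fseek for files larger than 2 GB.)
bool writeLevelAt(FILE* fp, long offset, const std::vector<uint8_t>& level) {
    if (fseek(fp, offset, SEEK_SET) != 0)
        return false;
    return fwrite(level.data(), 1, level.size(), fp) == level.size();
}
```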
@alecazam if per-image access in zstd-compressed mip levels is important to you, I suggest you propose a new supercompression scheme that permits it.
Yes, that's reasonable. I'm still building out the atlasing commands, and have some info on my kram page about the idea for using 2d arrays instead of charts. I likely don't have any atlases yet large enough to justify per image decode, but trying to think ahead. Mostly my arrays are small particle textures.
Also just wanted to say thanks for all the great work on KTX and KTX2. These formats are such a joy compared to all the formats I've worked with prior.
Thanks for the kind words @alecazam. Khronos will soon be announcing KTX 2.0 & universal textures support and we kindly ask if we may use this quote in the press materials. If you are okay with that, please tell me the company name and title we should use for attribution. I'm sorry for asking in this forum but I don't have any direct contact info for you and GitHub doesn't seem to have a way to send private messages.
@alecazam I received your private reply to my question about using your quote. I sent several responses from 2 different e-mail addresses asking for some additional info. I have not received any further reply from you. The announcement will be happening r.s.n so please contact me again with the info I requested.
Hey Mark, I sent you a private reply just so you had my email address from that. I didn't get any responses to the message that I sent you on that email address. alecmiller@yahoo.com is my email. Happy to confirm anything you need, and I also confirmed with my company that attributing my name and company is okay.
I sent messages to that address on Mar 6th, 13th and 17th. The last was from a different address than the one you sent your message to. Strange you never got them. In the e-mail you sent me you did not identify your company or position, which we would like for the attribution. That is what I was asking for in my e-mails.
Responded in private email. Company was in the original, but not position so I added that.
Thank you. I got your message. Sorry I missed the company name in your first e-mail. Strange my other messages were never delivered.
KTX and KTX2 store mips at levels (reversed from one another). For arrays, cubes, etc., the spec language of a mip level and a level of mip levels gets a bit conflated.
Supercompression of individual mips seems overkill at the smaller mip levels, and necessary at the larger mip levels. Is there any possibility to have compression of an entire level of mip levels? For a cube or cube array, I'd want to decompress 6 faces at a time, since the texture is useless without all the data. For a 3D volume, I need an entire level before it can be displayed.
For 1D arrays, there are no mips, but I may want to supercompress all levels in one compressor and then copy out the results from a single decompress. I know Basis can also optimize blocks across mip levels, and maybe across a level of mips.
Even for the basic 2D-with-mips case, I'm thinking of wanting to unpack a compressed packed mip tail for sparse textures, but not wanting to hit the decompressor so much. With hardware decompressors, I could see repeatedly sending small mips as performance-prohibitive. I also thought with KTX1, the idea was to upload the entire level in one upload call (or copy to a buffer).
Also if a file indicated that only levels were supercompressed, you'd basically just need the same mip count setup as KTX1 but with the compressed sizes vs. uncompressed. Once uncompressed, the offset into the level is the same as with KTX1. That would save storing the compressed/uncompressed sizes for every mip level.
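As a rough sketch of the idea in this issue (not an existing KTX2 scheme; names are illustrative), the aggregate compression would amount to concatenating everything at or below some level and running one zstd frame over it:

```cpp
#include <zstd.h>

#include <cstdint>
#include <vector>

// One-shot compression of an "aggregate level" (all faces/layers/slices at a
// mip size, or a whole packed mip tail) as a single zstd frame, so the decoder
// is invoked once on load instead of once per level.
std::vector<uint8_t> compressAggregate(const std::vector<uint8_t>& aggregate,
                                       int compressionLevel = 19) {
    std::vector<uint8_t> out(ZSTD_compressBound(aggregate.size()));
    size_t n = ZSTD_compress(out.data(), out.size(),
                             aggregate.data(), aggregate.size(),
                             compressionLevel);
    if (ZSTD_isError(n)) return {};
    out.resize(n);
    return out;
}
```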