BinomialLLC / basis_universal

Basis Universal GPU Texture Codec
Apache License 2.0

Mipmapped array textures stored incorrectly in ktx2 #296

Closed by superdump 2 years ago

superdump commented 2 years ago

The KTX2 specification says that the mip level data is supposed to be stored from highest mip (lowest resolution) to lowest mip (highest resolution): https://github.khronos.org/KTX-Specification/#_mip_level_array The level index entries are supposed to be stored from the lowest mip to the highest mip: https://github.khronos.org/KTX-Specification/#_level_index And within the data for any one mip level, the data is meant to be ordered for each layer, for each cube face, for each z slice of blocks, for each y row of blocks, for each x block: https://github.khronos.org/KTX-Specification/#levelImages
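To make the layout described above concrete, here is a rough sketch of how the byte offset of one image (a given level, layer, and face) could be computed. It assumes 4x4 blocks of 16 bytes each, no supercompression, and no padding between levels; the function names are hypothetical and are not part of basisu or KTX-Software.

```python
BLOCK_DIM = 4     # assumption: 4x4 texel blocks
BLOCK_BYTES = 16  # assumption: 16 bytes per block


def image_bytes(width, height, level):
    """Size in bytes of a single layer/face image at the given mip level."""
    w = max(1, width >> level)
    h = max(1, height >> level)
    blocks_x = (w + BLOCK_DIM - 1) // BLOCK_DIM
    blocks_y = (h + BLOCK_DIM - 1) // BLOCK_DIM
    return blocks_x * blocks_y * BLOCK_BYTES


def build_level_index(width, height, level_count, layer_count, face_count=1):
    """Return [(byte_offset, byte_length)] indexed by level, level 0 = base.

    The index itself is ordered from level 0 upward, but the level data is
    laid out smallest mip first, so offsets are assigned in reverse order.
    Offsets are relative to the start of the level data (padding ignored).
    """
    lengths = [
        image_bytes(width, height, level) * layer_count * face_count
        for level in range(level_count)
    ]
    index = [None] * level_count
    offset = 0
    for level in reversed(range(level_count)):
        index[level] = (offset, lengths[level])
        offset += lengths[level]
    return index


def image_offset(index, width, height, level, layer, face=0, face_count=1):
    """Offset of one image: within a level, layers are outermost, then faces."""
    level_offset, _ = index[level]
    image_number = layer * face_count + face
    return level_offset + image_number * image_bytes(width, height, level)
```

For an 8x8 array texture with 4 mip levels and 2 layers, `build_level_index(8, 8, 4, 2)` places the 1x1 level at offset 0 and the base level last, and every layer of one level sits contiguously inside that level's byte range.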

However, files output from basisu using its -tex_array option, which seems to be for creating array textures (as in those using 2D array samplers where one has to specify a layer when sampling), store all mips for the first layer, then all mips for the second, and so on. As such, one has to iterate over each layer and, within each layer, over each mip level, which seems wrong?

superdump commented 2 years ago

This also seems to be broken in the toktx tool: https://github.com/KhronosGroup/KTX-Software/issues/562

superdump commented 2 years ago

I think maybe I spoke too soon. The data does seem to be laid out as mip n layer 0 ... n, mip n-1 layer 0 ... n, etc, but the GPU wants layer 0 mip 0 ... n, layer 1 mip 0 ... n, etc. So when creating the texture buffer, you have to loop over layers, then mip levels.
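A minimal sketch of that reordering, assuming each mip level has already been read (and transcoded, if needed) into its own bytes object with all layers back to back, as in the file. The function name and the input shape are hypothetical, chosen only for illustration; whether a given graphics API actually wants this order depends on how the upload is done.

```python
def repack_layer_major(levels, layer_count):
    """Reorder level-major data into layer-major data.

    levels[i] holds the bytes of mip level i (0 = largest) with all layers
    stored contiguously. The result is layer 0 mip 0..n, layer 1 mip 0..n,
    and so on.
    """
    out = bytearray()
    for layer in range(layer_count):
        for level_data in levels:
            # Every layer has the same size within one mip level.
            size = len(level_data) // layer_count
            out += level_data[layer * size:(layer + 1) * size]
    return bytes(out)


# Tiny stand-in payloads: two layers, two mip levels.
levels = [b"AAAABBBB", b"ab"]
packed = repack_layer_major(levels, layer_count=2)
```

Here `packed` comes out as `b"AAAAaBBBBb"`: both mips of layer 0 first, then both mips of layer 1.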

richgel999 commented 2 years ago

OK - will investigate. We definitely want to do this correctly.

superdump commented 2 years ago

This could well be an error on my part when uploading data to the GPU. I am currently building a contiguous buffer of the mip and layer data and uploading it in one go. When doing that, this M1 Max seems to need the data in layer 0 mip 0..n, layer 1 mip 0..n order.

richgel999 commented 2 years ago

It's unlikely that toktx, basisu, and the KTX2 validator all do this incorrectly (but not impossible).

alecazam commented 2 years ago

Every API has its own order. Some want mip order, and some want array order. None want the order KTX2 uses, which is smallest mips first. KTX was closer to ideal, but it stuck dword sizes in, which threw off block alignment.

Metal doesn’t even have an API construct to upload more than one face/slice at a time. Just decompress/transcode and copy to a staging buffer with block alignment. Then gpu upload with calls referencing those blocks.
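As a rough illustration of that staging approach, the sketch below only plans where each (level, layer) image would land in a staging buffer and how large the buffer needs to be. The alignment value is an assumption passed in by the caller, the helper names are hypothetical, and the actual upload calls are left out because they differ per API.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def plan_staging_copies(image_sizes, layer_count, alignment=16):
    """Plan one copy region per (level, layer) image.

    image_sizes[level] is the byte size of a single layer's image at that
    mip level. Returns (copies, total_bytes), where each copy is a tuple
    (level, layer, staging_offset, size).
    """
    copies = []
    offset = 0
    for level, size in enumerate(image_sizes):
        for layer in range(layer_count):
            offset = align_up(offset, alignment)
            copies.append((level, layer, offset, size))
            offset += size
    return copies, offset


copies, total = plan_staging_copies([64, 16, 16], layer_count=2, alignment=256)
```

Each tuple in `copies` would then correspond to one upload call that references its region of the staging buffer, so the order of the data in the file no longer matters once it has been copied.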

The point is that you should compress and RDO the texture data shipped to customers, so you can’t just mmap it. Plus, hardware can only use texture memory in a hardware-specific twiddle order, which is applied by the upload.

So if you take a decompress hit, and a further transcode hit for basis, then you are juggling two blobs of memory already. So order the data to suit the API you use. The order in the file is just according to the spec. I still think KTX2 shouldn’t have reordered the mips smallest first for streaming. It makes in-place mipgen harder.

richgel999 commented 2 years ago

Got it - you should give this feedback on the KTX2 software github: https://github.com/KhronosGroup/KTX-Software

I believe the smallest mip appears first for streaming purposes, over the network or from disk. This makes sense because, compared to GPU API calls, which are very fast, streaming over the network or from disk is several orders of magnitude slower.

This repo is for Basis Universal, which implements the KTX2 standard as it was defined. As this isn't a BasisU specific issue, I'm going to close this.

superdump commented 2 years ago

I agree with closing the issue. Sorry for the noise. I thought I had understood something but it turned out it was rather a graphics API issue.