GLB file size is limited to 2^32 bytes because of uint32 length

cansik commented 2 years ago

The GLB file size is currently limited to 2^32 bytes (approximately 4 GB), which could be limiting for some applications. For example, we would like to store a mesh sequence in the glTF format, which quickly exceeds the 2^32-byte limit.

It is understandable to limit the individual chunk size to 2^32, but limiting the total file size to 2^32 seems to restrict future applications. Is there a technical reason to support only lengths that fit in an unsigned 32-bit integer? Or is there a workaround for this limitation that integrates everything into a single file?

Here is the current binary protocol definition of a GLB file: (image: glbbinary) Source: Binary glTF Layout
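For readers without the layout image at hand, here is a minimal Python sketch of where the limit comes from. It relies on the GLB 2.0 layout (a 12-byte header of three little-endian uint32 fields: magic, version, length, followed by chunks that each start with a uint32 chunkLength and a uint32 chunkType). The helper names `read_glb_header` and `build_minimal_glb` are illustrative only and not part of any library.

```python
import struct

GLB_MAGIC = 0x46546C67               # ASCII "glTF" read as a little-endian uint32
JSON_CHUNK_TYPE = 0x4E4F534A         # ASCII "JSON"
HEADER = struct.Struct("<III")       # magic, version, length (all uint32)
CHUNK_HEADER = struct.Struct("<II")  # chunkLength, chunkType (both uint32)

# Largest total file length the uint32 header field can express.
MAX_GLB_LENGTH = 2**32 - 1


def build_minimal_glb(json_text: str) -> bytes:
    """Build a GLB that contains only a JSON chunk (hypothetical helper)."""
    payload = json_text.encode("utf-8")
    payload += b" " * (-len(payload) % 4)  # pad the JSON chunk to 4 bytes with spaces
    total = HEADER.size + CHUNK_HEADER.size + len(payload)
    return (
        HEADER.pack(GLB_MAGIC, 2, total)
        + CHUNK_HEADER.pack(len(payload), JSON_CHUNK_TYPE)
        + payload
    )


def read_glb_header(data: bytes) -> dict:
    """Decode the 12-byte GLB header (hypothetical helper)."""
    magic, version, length = HEADER.unpack_from(data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("not a GLB file")
    return {"version": version, "length": length}


glb = build_minimal_glb('{"asset":{"version":"2.0"}}')
header = read_glb_header(glb)
```

Because the total length is itself one of the uint32 fields, a file of 2^32 bytes or more cannot be described by this header, regardless of how the chunks are arranged.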

lexaknyazev commented 2 years ago

The current GLB format has several well-known shortcomings and the 4 GiB size limitation is one of them. This will likely be addressed in the future container format revisions, be it GLBv3 or something completely new.

/cc @emackey @javagl

emackey commented 2 years ago

@cansik Is there a benefit to storing > 4GB in a single file? It's a heavy lift in that form.

One possibility is to use *.gltf (not GLB) with multiple buffers, such that each buffer can be a more readable size.
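As a rough illustration of the multiple-buffer idea, the Python sketch below builds the `buffers` array of a .gltf file so that no single external .bin file exceeds a chosen size. The function name `split_into_buffers`, the file naming scheme, and the 1 GiB piece size are assumptions made for this example. A real exporter would also have to cut on bufferView boundaries, since a bufferView refers to exactly one buffer.

```python
import json


def split_into_buffers(total_bytes: int, max_buffer_bytes: int, stem: str = "data") -> list:
    """Describe total_bytes of binary data as several glTF buffers of bounded size."""
    buffers = []
    offset = 0
    while offset < total_bytes:
        size = min(max_buffer_bytes, total_bytes - offset)
        # Each buffer points at its own external file through "uri".
        buffers.append({"uri": f"{stem}_{len(buffers)}.bin", "byteLength": size})
        offset += size
    return buffers


gltf = {
    "asset": {"version": "2.0"},
    # 10 GiB of geometry described as ten 1 GiB buffers.
    "buffers": split_into_buffers(10 * 2**30, 2**30),
}
print(json.dumps(gltf["buffers"][:2], indent=2))
```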

Some systems such as 3D Tiles break up their dataset into a hierarchy, with lots of self-contained GLBs storing sections of the data, along with a folder system that makes it easy to find and download-on-demand a couple of small GLBs representing what the user is looking at in the moment.

javagl commented 2 years ago

The limitation itself might be a tribute to glTF's origin as a transmission format: A >4GB file will likely not be transferred to the client as a single, huge blob. So I'd hesitate to say that glTF is the right "layer" to address this, but might be convinced otherwise. In any case, this cannot easily be changed in a backward-compatible way - except for defining the GLB version to be 3, and storing the length somewhere else.

A container format could alleviate the problem. And as long as there is no dedicated container format above glTF, there could even be very use-case specific solutions. A "mesh sequence" could be many things (and it sounds like something that clients will usually not know what to do with, unless there is an extension for this), but maybe some JSON file with { meshSequence: [ "0.glb", "1.glb"...] } could be a first shot...
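A minimal Python sketch of such a manifest follows. The `meshSequence` key comes from the suggestion above, while the helper names `make_manifest` and `frame_uri` and the idea of one GLB per frame are assumptions for illustration. No glTF extension defines this file today.

```python
import json


def make_manifest(frame_count: int) -> str:
    """Write a hypothetical mesh-sequence manifest that lists one GLB per frame."""
    return json.dumps({"meshSequence": [f"{i}.glb" for i in range(frame_count)]})


def frame_uri(manifest_text: str, frame: int) -> str:
    """Look up which GLB a client would fetch for a given frame."""
    manifest = json.loads(manifest_text)
    return manifest["meshSequence"][frame]


manifest_text = make_manifest(3)
second_frame = frame_uri(manifest_text, 1)
```

Each frame stays an ordinary GLB well below the 4 GiB limit, and a client only downloads the frames it needs.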

An aside:

> One possibility is to use *.gltf (not GLB)

I also prefer the idea of a GLB being "self-contained", but I think it is not disallowed by the spec for a GLB to refer to external resources (and cannot remember that it has been explicitly discouraged anywhere - maybe somewhere hidden in https://github.com/KhronosGroup/glTF/issues/1117 ...?).

donmccurdy commented 2 years ago

> I also prefer the idea of a GLB being "self-contained", but ... cannot remember that it has been explicitly discouraged anywhere

I'd summarize #1117 as: _public user-facing tools should try to provide .gltf with external resources AND/OR .glb with embedded resources_. Other arrangements — .gltf with embedded resources or .glb with external resources — are allowed and valid where appropriate for a project, but I think it's better for most users if these don't proliferate in the 3D ecosystem.

bghgary commented 2 years ago

FWIW, I asked about self-contained glb a while back and there is a valid reason for using external resources with a glb for servers.

Question: https://github.com/KhronosGroup/glTF/issues/828#issuecomment-277314076 Response: https://github.com/KhronosGroup/glTF/issues/828#issuecomment-277474901

vpenades commented 11 months ago

As a workaround, I would suggest using zip as a plain container, and maybe the extension .GLZ could be used to distinguish it from GLB.

This would allow limitless file sizes, and clients and tools would easily adopt it. The worst-case scenario could be resolved by unzipping to a directory.

The recommendation would be to keep using GLB for transmission scenarios, and GLZ for large files, backups and intermediate workflows.

javagl commented 11 months ago

A plain ZIP (as an archive) could have some issues because it does not allow random access. There are approaches for extending ZIP files with some sort of "index" (basically: a file that is always stored as the last entry in the ZIP, and stores a mapping of "file name" to "byte offset in the ZIP"), but no real 'standard' for that, as far as I know.

vpenades commented 11 months ago

> it does not allow random access

That's not completely true. The USDZ format, which is a glTF competitor, uses a ZIP file with the restriction of forbidding file compression, which results in a plain file with a TOC and randomly accessible files.

Krita .KRA files, and OpenRaster .ORA use a similar approach: the entries in the ZIP must be stored uncompressed to allow random access. So to some degree, we could say that uncompressed ZIPs are becoming a thing.
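To make the "stored uncompressed" point concrete, here is a small Python sketch that locates the raw bytes of a stored entry inside the archive, so that they can be read (or memory-mapped) directly. It uses only the standard `zipfile` and `struct` modules. The helper name `stored_entry_range` and the file names are made up for the example.

```python
import io
import struct
import zipfile


def stored_entry_range(archive, name: str) -> tuple:
    """Return (offset, size) of the raw bytes of an uncompressed ZIP entry."""
    with zipfile.ZipFile(archive) as zf:
        info = zf.getinfo(name)
        if info.compress_type != zipfile.ZIP_STORED:
            raise ValueError(f"{name} is compressed; raw ranges need stored entries")
        header_offset, size = info.header_offset, info.file_size
    # The local file header has 30 fixed bytes, then the file name and extra field.
    archive.seek(header_offset)
    fixed = archive.read(30)
    name_len, extra_len = struct.unpack_from("<HH", fixed, 26)
    return header_offset + 30 + name_len + extra_len, size


# Build a small archive in memory with compression disabled.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("scene.gltf", '{"asset":{"version":"2.0"}}')
    zf.writestr("frame_0.bin", b"\x00\x01\x02\x03")

start, size = stored_entry_range(buf, "frame_0.bin")
buf.seek(start)
raw = buf.read(size)  # the entry's bytes, read without going through a decompressor
```

With compressed entries the same offset would point at deflated data, which is why formats of this kind forbid compression.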

But it all depends on how you are going to consume the files. (My SharpGLTF library already supports zipped glTFs, and I don't see any problem handling compressed zips.)

javagl commented 11 months ago

> a ZIP file with the restriction of forbidding file compression, which results in a plain file with a TOC and randomly accessible files.

Yes, this could be an option. There's still the small caveat that on the consuming side, people will usually have to invest some effort there: I think that most "ZIP libraries" (as 'common libraries for zip handling in different programming languages') tend to hide that as an abstraction, and only offer functionalities like "iterating over all entries", or "looking up a certain entry (with a linear search under the hood)".

More technically/specifically: These libraries do not necessarily provide the low-level access to the ZIP central directory that would allow constant-time lookups.

(All this does not prevent this approach, but should be kept in mind when making this an integral part of a specification)

vpenades commented 11 months ago

The ZIP file format has a table of contents with offsets and sizes that is read before accessing the rest of the content. I know a few ZIP libraries and all of them give you random access to the contents, even when compressed.
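As one data point, Python's standard `zipfile` module behaves this way: it reads the central directory when the archive is opened, exposes each entry's offset and sizes, and can read a single compressed entry by name. The sketch below only demonstrates that one library; the file names are invented, and other ZIP libraries may expose less.

```python
import io
import zipfile

# Build a compressed archive in memory with three entries.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for i in range(3):
        zf.writestr(f"{i}.glb", bytes([i]) * 1000)

with zipfile.ZipFile(buf) as zf:
    # infolist() reflects the central directory (the table of contents).
    toc = {
        info.filename: (info.header_offset, info.compress_size, info.file_size)
        for info in zf.infolist()
    }
    # Read only the middle entry; the other entries are not decompressed.
    middle = zf.read("1.glb")
```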

In fact, random access is available in most archive formats; the only ones that do not support it are Tar.GZ, and Rar/7z when compressed as solid archives.

vpenades commented 10 months ago

In addition to my proposal of a zipped version, I had the opportunity to tinker with OpenRaster and some other Zip based documents recently, so, I wanted to ask:

Would it be fine to open a new issue with a proposal for GLZ, or is it better to keep the discussion here?

KhronosGroup / glTF

GLB file size is limited to 2^32 bytes because of uint32 length #2114