donmccurdy / glTF-Transform

glTF 2.0 SDK for JavaScript and TypeScript, on Web and Node.js.
https://gltf-transform.dev
MIT License

VRM gets treated as glTF instead of GLB #1390

Open marwie opened 1 month ago

marwie commented 1 month ago

Describe the bug

Applying e.g. Draco compression to a .vrm file results in all resources being externalized. The VRM is then saved as glTF JSON (with the .vrm extension).

According to https://github.com/vrm-c/vrm-specification/blob/master/specification/0.0/README.md, however, VRM is a GLB-based format.

To Reproduce

Steps to reproduce the behavior:

  1. Download zip
  2. Run gltf-transform draco "AvatarSample_A.vrm" "AvatarSample_A.vrm"
  3. See unpacked png textures

Expected behavior

VRM gets saved as GLB.

Versions:

Additional context

AvatarSample_A.zip

[Before screenshot]

[After screenshot]

donmccurdy commented 1 month ago

I'm going to retag this as a feature; the library currently knows nothing about .vrm or other third-party variants of .gltf/.glb. But I think I agree that this should be made to 'just work' for the .vrm file extension (aside from not understanding the VRM extensions embedded in the file).

I suspect it's mostly a change in the CLI; for script-based usage, io.writeBinary works.
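For example, a minimal sketch of that script-based path (NodeIO, read, and writeBinary are existing APIs; the file names are placeholders, and in recent releases these calls are async):

import { NodeIO } from '@gltf-transform/core';
import { writeFile } from 'node:fs/promises';

const io = new NodeIO();
const document = await io.read('AvatarSample_A.vrm');

// writeBinary always produces a GLB payload, regardless of what
// file extension the application chooses for the output path.
const glb = await io.writeBinary(document);
await writeFile('output.vrm', glb);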

Related:

donmccurdy commented 1 month ago

Also, the library should probably log a warning when it's asked to write a file extension whose format (JSON vs. binary) it doesn't know. Perhaps with some way to register new extensions and their format, like:

import { Format } from '@gltf-transform/core';

io.registerFileSuffix('vrm', Format.GLB);

hybridherbst commented 1 month ago

I think another option could be to base the "is this a binary or a text file" check on the GLB magic header instead of relying on file extensions (which might not exist at all, for entirely unrelated reasons).
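The check itself would be tiny; a sketch (the constant is the GLB magic defined by the glTF 2.0 spec; the helper name here is made up):

// A GLB container begins with the ASCII bytes 'glTF'
// (the little-endian uint32 magic 0x46546C67).
function isGLB(bytes) {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x67 && // 'g'
    bytes[1] === 0x6c && // 'l'
    bytes[2] === 0x54 && // 'T'
    bytes[3] === 0x46 // 'F'
  );
}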

donmccurdy commented 1 month ago

True – that'd work for the CLI, though not in the script-based I/O case, which may not read from disk. Not sure if something like ...

gltf-transform cp blob_no_extension other_blob

... is worth trying to guarantee the same binary vs. JSON output; I think I'd prefer to log a warning that the library doesn't know what you mean.

hybridherbst commented 1 month ago

> which may not read from disk

Maybe I'm misunderstanding what you mean – the magic header for GLB ("glTF") is there whether it's just a blob in memory or read from disk.

donmccurdy commented 1 month ago

In this scenario...

const document = await io.read('path/to/input.unknown');

...

await io.write('path/to/output.unknown', document);

... there is nothing in document to tell the I/O class whether the original source on disk was JSON or binary, so no guarantees can be made that it will be written in the original format without user/application input.

marwie commented 1 month ago

Is that what @hybridherbst means: have the I/O class take note of the original file header and, if write is called and the format is unknown, use that information to write either JSON or binary?
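For context, the application-level version of that idea might look like this sketch (isGLB is the magic-header helper sketched above; file names are placeholders):

import { NodeIO } from '@gltf-transform/core';
import { readFile, writeFile } from 'node:fs/promises';

const io = new NodeIO();

// Sniff and remember the container type up front...
const bytes = await readFile('input.unknown');
const sourceWasGLB = isGLB(bytes);

const document = await io.read('input.unknown');
// ... make some edits ...

// ...then choose the writer explicitly on the way out.
if (sourceWasGLB) {
  await writeFile('output.unknown', await io.writeBinary(document));
} else {
  await io.write('output.gltf', document); // known suffix, written as JSON
}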

donmccurdy commented 1 month ago

Understood, but I don't think I agree with the suggestion. It creates open-ended issues:

I'm OK with making changes to ensure .vrm is handled like .glb, and perhaps allowing registration of other extensions... but I do think that unknown extensions should trigger warnings rather than attempting auto-detection, which I don't think can be supported consistently.

hybridherbst commented 1 month ago

Do I understand right that the complexity here comes from the fact that

looks at the string path only, and it's not easy to add a "look at the first 4 bytes to check what it actually is" step, since the actual file reads currently happen later, after that decision?

donmccurdy commented 1 month ago

From my perspective, the complexity comes from trying to keep persistent knowledge of what the original on-disk source of a particular Document might have been. It's 'easy' at the application level, if your pipeline looks exactly like this:

const document = await io.read('in.unknown');

// ... make some edits ...

await io.write('out.unknown', document);

But at the library level, I don't want to assume that the pipeline above is what's happening. Documents can be merged or cloned, and applications might use different I/O classes for reading and writing. I think it's too much magic, when the alternative is much clearer and not particularly complex:

import { writeFile } from 'node:fs/promises';

await writeFile('out.unknown', await io.writeBinary(document));

hybridherbst commented 1 month ago

OK, understood! I think I looked in the wrong place; what you meant by "adding custom file types" is probably that the ".vrm is to be treated as a GLB file" check would have to happen somewhere around here (at the end of the pipeline, not at the beginning).