donmccurdy / glTF-Transform

glTF 2.0 SDK for JavaScript and TypeScript, on Web and Node.js.
https://gltf-transform.dev
MIT License

VRM gets treated as glTF instead of GLB #1390

Open marwie opened 1 month ago

marwie commented 1 month ago

Describe the bug

Applying e.g. Draco compression to a .vrm file results in all resources being externalized. The VRM is then saved as glTF JSON (with the .vrm extension).

According to https://github.com/vrm-c/vrm-specification/blob/master/specification/0.0/README.md, however, VRM is a GLB-based format.

To Reproduce

Steps to reproduce the behavior:

  1. Download zip
  2. Run gltf-transform draco "AvatarSample_A.vrm" "AvatarSample_A.vrm"
  3. See unpacked png textures

Expected behavior

VRM gets saved as GLB.

Versions:

Additional context

AvatarSample_A.zip

[Before screenshot]

[After screenshot]

donmccurdy commented 1 month ago

I'm going to retag this as a feature; the library currently knows nothing about .vrm or other third-party variants of .gltf/.glb. But I think I agree that this should be made to 'just work' for the .vrm file extension (aside from not understanding the VRM extensions embedded in the file).

I suspect it's mostly a change in the CLI; for script-based usage, io.writeBinary works.
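For example, a minimal sketch of that script-based path (NodeIO, read, and writeBinary are existing APIs; the file names are placeholders, and in recent releases these calls are async):

import { NodeIO } from '@gltf-transform/core';
import { writeFile } from 'node:fs/promises';

const io = new NodeIO();
const document = await io.read('AvatarSample_A.vrm');

// writeBinary always produces a GLB payload, regardless of what
// file extension the application chooses for the output path.
const glb = await io.writeBinary(document);
await writeFile('output.vrm', glb);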

Related:

donmccurdy commented 1 month ago

Also, the library should probably log a warning when it's asked to write a file extension whose format (JSON vs. binary) it doesn't know. Perhaps with some way to register new extensions and their format, like:

import { Format } from '@gltf-transform/core';

io.registerFileSuffix('vrm', Format.GLB);

hybridherbst commented 1 month ago

I think another option could be to base the "is this a binary or a text file" check on the GLB magic header instead of relying on file extensions (which might not exist at all, for entirely unrelated reasons).
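The check itself would be tiny; a sketch (the constant is the GLB magic defined by the glTF 2.0 spec; the helper name here is made up):

// A GLB container begins with the ASCII bytes 'glTF'
// (the little-endian uint32 magic 0x46546C67).
function isGLB(bytes) {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x67 && // 'g'
    bytes[1] === 0x6c && // 'l'
    bytes[2] === 0x54 && // 'T'
    bytes[3] === 0x46 // 'F'
  );
}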

donmccurdy commented 1 month ago

True – that'd work for the CLI, though not in the script-based I/O case, which may not read from disk. Not sure if something like ...

gltf-transform cp blob_no_extension other_blob

... is worth trying to guarantee the same binary vs. JSON output; I think I'd prefer to log a warning that the library doesn't know what you mean.

hybridherbst commented 1 month ago

> which may not read from disk

Maybe I'm misunderstanding what you mean – the magic header for GLB ("glTF") is there whether it's just a blob in memory or read from disk.

donmccurdy commented 1 month ago

In this scenario...

const document = await io.read('path/to/input.unknown');

...

await io.write('path/to/output.unknown', document);

... there is nothing in document to tell the I/O class whether the original source on disk was JSON or binary, so no guarantees can be made that it will be written in the original format without user/application input.

marwie commented 1 month ago

Is that what @hybridherbst means: have the I/O class take note of the original file header and, if write is called and the format is unknown, use that information to write either JSON or binary?
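For context, the application-level version of that idea might look like this sketch (isGLB is the magic-header helper sketched above; file names are placeholders):

import { NodeIO } from '@gltf-transform/core';
import { readFile, writeFile } from 'node:fs/promises';

const io = new NodeIO();

// Sniff and remember the container type up front...
const bytes = await readFile('input.unknown');
const sourceWasGLB = isGLB(bytes);

const document = await io.read('input.unknown');
// ... make some edits ...

// ...then choose the writer explicitly on the way out.
if (sourceWasGLB) {
  await writeFile('output.unknown', await io.writeBinary(document));
} else {
  await io.write('output.gltf', document); // known suffix, written as JSON
}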

donmccurdy commented 1 month ago

Understood, but I don't think I agree with the suggestion. It creates open-ended issues:

I'm OK with making changes to ensure .vrm is handled like .glb, and perhaps allowing registration of other extensions... but I do think that unknown extensions should trigger warnings rather than attempting auto-detection, which I don't think can be supported consistently.

hybridherbst commented 1 month ago

Do I understand right that the complexity here comes from the fact that

looks at the string path only, and it's not easy to add a "look at the first 4 bytes to check what it actually is" step, since the actual file reads currently happen later, after that decision?

donmccurdy commented 1 month ago

From my perspective, the complexity comes from trying to keep persistent knowledge of what the original on-disk source of a particular Document might have been. It's 'easy' at the application level, if your pipeline looks exactly like this:

const document = await io.read('in.unknown');

// ... make some edits ...

await io.write('out.unknown', document);

But at the library level, I don't want to assume that the pipeline above is what's happening. Documents can be merged or cloned, and applications might use different I/O classes for reading and writing. I think it's too much magic, when the alternative is much clearer and not particularly complex:

import { writeFile } from 'node:fs/promises';

await writeFile('out.unknown', await io.writeBinary(document));

hybridherbst commented 1 month ago

OK, understood! I think I looked in the wrong place; what you meant by "adding custom file types" is probably that the ".vrm is to be treated as a GLB file" check would have to happen somewhere around here (at the end of the pipeline, not at the beginning).