Clarifications Needed on Implementation Details

Hello!

Congratulations on the acceptance of your paper; it's truly impressive work! I have a few questions regarding the detailed implementation and would greatly appreciate any insights from you or the community:

Embedding Non-Integer Values: It appears that each dimension of the translation vector, rotation quaternion, stitching info, and panel edges is a single token. How do you transform non-integer values (e.g., 3.14) into an embedding, given that nn.Embedding only takes integers as indices and outputs embeddings?
Stitch Tag Calculation: How is the stitch tag $S_{i, j} \in \mathbb{R}^3$ computed from the 3D placement of the edge? I believe $S_{i, j}$ denotes the rotation of the edge, possibly represented as an axis-angle vector?
Decoder Initialization: For the normal map and roughness map, are the decoder parameters initialized with the pretrained LDM decoder and then fine-tuned? This seems inconsistent with the diffuse map decoder, which is frozen and not updated during fine-tuning. What is the rationale behind this differing training approach for the decoders?
PBR Dataset Collection: How was the PBR dataset collected, and will it be open-sourced? Specifically, I'm curious about how the ground truth for the normal and roughness maps was obtained. The diffuse map ground truth seems straightforward, since it is just a texture. My guess is that the normal map ground truth is calculated by choosing two adjacent points around the target point and using the normal of the resulting micro-facet as the normal at the target point. Is that right? The roughness map ground truth appears particularly challenging to acquire, as manually assigning a value to every local point seems unrealistic. Could you elaborate on the methods used for these calculations?

Thank you for your time and assistance!

Hi!

Thanks for your interest in our work! We're working on releasing our code soon.

Embedding Non-Integer Values: We predefine some constants, described in the Quantization part of Section 3.1. We multiply each non-integer value by its constant and keep the integer part, so every value becomes an integer token. We selected these constants carefully (they are listed in the Implementation Details) to offer a good trade-off between preserving the fidelity of the sewing patterns and keeping the vocabulary size manageable.
Stitch Tag Calculation: Stitch tags are calculated from the ground-truth garments. We follow an implementation similar to the one used in NeuralTailor.
Decoder Initialization: The decoder parameters are initialized from the pretrained LDM VAE and then fine-tuned. Our reasoning is as follows. For diffuse maps, the encoder encodes the image into the latent code z and the decoder decodes z back into the original diffuse map, which is the same function the pretrained LDM VAE already performs. For normal maps and roughness maps, however, we train decoders that decode the latent code z (still obtained from the diffuse map) into normal maps and roughness maps, respectively. In effect, we train two mappings: diffuse map to normal map, and diffuse map to roughness map.
PBR Dataset Collection: Each diffuse map in the dataset already comes paired with a normal map and a roughness map, so we don't need to compute them separately.
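
To make the quantization answer above more concrete, here is a minimal sketch of the idea (multiply by a constant, keep the integer part, use the result as an embedding index). It is only an illustration under assumptions: `SCALE`, `OFFSET`, `VOCAB_SIZE`, and `EMBED_DIM` are hypothetical values, not the constants from the paper; the offset that keeps indices non-negative is also an assumption; and the plain Python table stands in for an `nn.Embedding` layer.

```python
import random

# Hypothetical settings, NOT the constants from the paper.
SCALE = 100.0      # multiplier applied before taking the integer part
OFFSET = 400       # assumed shift so values in [-4, 4] give indices in [0, 800]
VOCAB_SIZE = 801   # number of distinct tokens under the settings above
EMBED_DIM = 8      # embedding width, chosen only for the demo


def quantize(value: float, scale: float = SCALE, offset: int = OFFSET) -> int:
    """Multiply by the constant, keep the integer part, shift to a valid index."""
    return int(value * scale) + offset


def dequantize(token: int, scale: float = SCALE, offset: int = OFFSET) -> float:
    """Approximate inverse of quantize; precision is limited to 1 / scale."""
    return (token - offset) / scale


# Stand-in for an embedding layer: one random vector per token index.
_rng = random.Random(0)
embedding_table = [
    [_rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(VOCAB_SIZE)
]


def embed(value: float) -> list:
    """Look up the embedding of a real value through its integer token."""
    token = quantize(value)
    if not 0 <= token < VOCAB_SIZE:
        raise ValueError(f"value {value} falls outside the token range")
    return embedding_table[token]


token = quantize(3.14)          # integer index usable by an embedding layer
vector = embed(3.14)            # embedding vector for that index
recovered = dequantize(token)   # value recovered from the token
```

A larger `SCALE` preserves more decimal places but enlarges the vocabulary, which is the trade-off mentioned above.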

IHe-KaiI / DressCode
