Open Masaaki-75 opened 6 months ago
Hello, sorry for making you wait so long; we have been working on other things. Did you solve this issue? I guess it may be caused by a code version mismatch. Which version of the code are you using?
I am not sure about the exact version. I guess it would be from the `submit` branch in January, but it seems to be gone now. Here's what I can confirm:

- The `SimpleVQTokenizer` architecture is the same as in https://github.com/function2-llx/PUMIT/blob/67218a2aebf145b0b6f5cd3ae292adfe39f22561/pumit/tokenizer/simple.py. The detailed arguments are `in_channels=3`, `start_stride=4`, `downsample_layer_channels=[128, 256, 512]`, `upsample_layer_channels=[128, 256, 512]`, `encoder_act=nn.GELU`.
- The `VectorQuantizer` architecture is the same as in https://github.com/function2-llx/PUMIT/blob/67218a2aebf145b0b6f5cd3ae292adfe39f22561/pumit/tokenizer/quantize.py.
- The `SpatialTensor` class is from https://github.com/function2-llx/PUMIT/blob/67218a2aebf145b0b6f5cd3ae292adfe39f22561/pumit/sac.py.

Also, the detailed architecture of `SimpleVQTokenizer` is as follows, if it helps:
```
SimpleVQTokenizer(
  (quantize): VectorQuantizer(
    (proj): Linear(in_features=512, out_features=1024, bias=True)
    (embedding): Embedding(1024, 512)
  )
  (encoder): Sequential(
    (0): InflatableConv3d(3, 128, kernel_size=(4, 4, 4), stride=(4, 4, 4))
    (1): LayerNormNd(
      (0): ChannelLast('n c ... -> n ... c')
      (1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (2): ChannelFirst('n ... c -> n c ...')
      (3): Contiguous()
    )
    (2): GELU(approximate='none')
    (3): InflatableConv3d(128, 256, kernel_size=(2, 2, 2), stride=(2, 2, 2))
    (4): LayerNormNd(
      (0): ChannelLast('n c ... -> n ... c')
      (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (2): ChannelFirst('n ... c -> n c ...')
      (3): Contiguous()
    )
    (5): GELU(approximate='none')
    (6): InflatableConv3d(256, 512, kernel_size=(2, 2, 2), stride=(2, 2, 2))
    (7): LayerNormNd(
      (0): ChannelLast('n c ... -> n ... c')
      (1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
      (2): ChannelFirst('n ... c -> n c ...')
      (3): Contiguous()
    )
    (8): GELU(approximate='none')
    (9): InflatableConv3d(512, 512, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
    (10): GroupNorm(8, 512, eps=1e-05, affine=True)
    (11): LeakyReLU(negative_slope=0.01, inplace=True)
    (12): InflatableConv3d(512, 512, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
    (13): GroupNorm(8, 512, eps=1e-05, affine=True)
    (14): LeakyReLU(negative_slope=0.01, inplace=True)
  )
  (decoder): Sequential(
    (0): InflatableConv3d(512, 512, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
    (1): GroupNorm(8, 512, eps=1e-05, affine=True)
    (2): LeakyReLU(negative_slope=0.01, inplace=True)
    (3): InflatableConv3d(512, 512, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
    (4): GroupNorm(8, 512, eps=1e-05, affine=True)
    (5): LeakyReLU(negative_slope=0.01, inplace=True)
    (6): AdaptiveTransposedConvUpsample(
      (transposed_conv): InflatableTransposedConv3d(512, 256, kernel_size=(2, 2, 2), stride=(2, 2, 2))
      (conv): Sequential(
        (0): InflatableConv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
        (1): GroupNorm(8, 256, eps=1e-05, affine=True)
        (2): LeakyReLU(negative_slope=0.01, inplace=True)
      )
    )
    (7): AdaptiveTransposedConvUpsample(
      (transposed_conv): InflatableTransposedConv3d(256, 128, kernel_size=(2, 2, 2), stride=(2, 2, 2))
      (conv): Sequential(
        (0): InflatableConv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
        (1): GroupNorm(8, 128, eps=1e-05, affine=True)
        (2): LeakyReLU(negative_slope=0.01, inplace=True)
      )
    )
    (8): InflatableTransposedConv3d(128, 3, kernel_size=(4, 4, 4), stride=(4, 4, 4))
  )
)
```
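For reference, the encoder printed above applies three strided convolutions (strides 4, 2, 2), i.e. a 16× spatial downsample, and ends at 512 channels, which matches the `VectorQuantizer`'s `Linear(512, 1024)` logits and `Embedding(1024, 512)` codebook. A quick sketch of that shape arithmetic (a simplification assuming isotropic strides and input sides divisible by 16; PUMIT's `SpatialTensor` actually tracks anisotropic downsampling):

```python
# Shape arithmetic for the encoder above (illustrative sketch, not repo code).
def latent_shape(d, h, w, strides=(4, 2, 2), latent_dim=512):
    """Return the (C, D, H, W) latent shape produced by the strided encoder."""
    factor = 1
    for s in strides:
        factor *= s                      # total spatial downsample: 4 * 2 * 2 = 16
    return (latent_dim, d // factor, h // factor, w // factor)

print(latent_shape(32, 256, 256))        # -> (512, 2, 16, 16)
```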
@Masaaki-75 My dear friend, you forgot to perform the vector quantization: you should call `tokenizer.quantize(z)` before decoding.
Sorry again for the late reply.
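To spell out the missing step for anyone else reading: between encoding and decoding, each latent vector must be snapped to its nearest codebook entry, since the decoder was trained on quantized latents, not raw encoder features. Below is a generic nearest-neighbour vector quantizer in NumPy, not PUMIT's actual `VectorQuantizer` (which lives in the `quantize.py` linked above and also has a projection head and training losses); the codebook size matches the `Embedding(1024, 512)` printed earlier:

```python
import numpy as np

# Generic nearest-neighbour vector quantization (illustrative only).
def quantize(z, codebook):
    """z: (N, C) latent vectors, codebook: (K, C) -> (z_q (N, C), indices (N,))."""
    # Squared Euclidean distance from every latent vector to every code.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)               # nearest code per latent vector
    return codebook[idx], idx            # the decoder should see codebook[idx], not z

rng = np.random.default_rng(0)
codebook = rng.standard_normal((1024, 512))   # matches Embedding(1024, 512)
z = rng.standard_normal((8, 512))             # flattened (N, C) latents
z_q, idx = quantize(z, codebook)
```

In the thread's terms, the fix is to decode the output of `tokenizer.quantize(z)` (whatever form its return value takes in your checkout) rather than the raw `z`.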
Hi! I am trying to use the pretrained tokenizer to obtain latent codes for my input CT images.
However, I didn't see the identity-mapping-like reconstruction demonstrated in Figure 3 of your paper. I guess there's something wrong with the way I handle the input.
Here's the process:
I was expecting `y` to look similar to `x`, but the visualization shows:
Any advice on that? Thanks!
BTW, here's the info about `x0`, `x`, `z` and `y`, if needed: