
3DILG: Irregular Latent Grids for 3D Generative Modeling
https://1zb.github.io/3DILG/

When do you plan to release the pretrained model pretrained/vqvae_512_1024_2048/checkpoint-799.pth? #1

Closed · liuzhengzhe closed this 2 years ago

liuzhengzhe commented 2 years ago

And also the data processing code. Thanks!

1zb commented 2 years ago

Hi Zhengzhe,

Thanks for your interest in our work. I am organizing the data preprocessing code and the pretrained model for release.

In the meantime, I can describe the data processing here. A simple way is to use the preprocessing pipeline proposed in Occupancy Networks; you can find it in their official repository.

An alternative is to use ManifoldPlus ( https://github.com/hjwdzh/ManifoldPlus ) to convert ShapeNet meshes into watertight ones, and then use SDFGen or another package (e.g., mesh_to_sdf) to sample points with signed-distance/occupancy labels.
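As a minimal sketch of that second option, assuming the ManifoldPlus CLI flags from its README and the mesh_to_sdf Python package (file names and the depth value are placeholders):

```python
import subprocess

import trimesh
from mesh_to_sdf import sample_sdf_near_surface

# 1) Make the ShapeNet mesh watertight with the ManifoldPlus CLI.
subprocess.run(
    ["./ManifoldPlus/build/manifold",
     "--input", "model.obj",
     "--output", "model_watertight.obj",
     "--depth", "8"],
    check=True,
)

# 2) Sample points near the surface along with their signed distances.
mesh = trimesh.load("model_watertight.obj", force="mesh")
points, sdf = sample_sdf_near_surface(mesh, number_of_points=250000)

# Occupancy labels (inside = True), if occupancies are needed instead of SDFs.
occupancies = sdf < 0
```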

Best, Biao

liuzhengzhe commented 2 years ago

Dear Biao,

Thanks for your reply. I tried to generate shapes from images with your code. However, I found that it produces the correct category but not shapes that match the input image. May I know whether any hyper-parameters differ from the released category-conditioned generation?

My loss is around 6.1, and at inference I found that the largest probability over all tokens is only around 0.2, which seems too low a confidence, and the generated shapes do not match the input image.

My loss:

```
{"train_lr": 0.0003875156060424377, "train_min_lr": 0.0003875156060424377, "train_loss": 6.159766070108107, "train_loss_scale": 131072.0, "train_loss_x": 0.2729156543210233, "train_loss_y": 2.1654561555227847, "train_loss_z": 2.8282304242788507, "train_loss_latent": 0.8931638471120247, "train_weight_decay": 0.050000000000000454, "train_grad_norm": 0.6945916414260864, "epoch": 60, "n_parameters": 457923329}
```

Thanks a lot!

1zb commented 2 years ago

Hi Zhengzhe,

What is the input to your task, an image or a category label? The loss looks normal to me. Could you share visualizations of some generated examples so I can take a look?

Best, Biao

liuzhengzhe commented 2 years ago

Dear Biao,

Thanks very much.

1. With a category label as the condition, a loss around 6.1 makes sense: we want to generate diverse shapes for a given category. However, when I use an image feature as the condition and want the network to produce the one corresponding shape, I would expect the loss to be much smaller than 6.1. Since my loss stays around 6.1, the model does not produce the corresponding shape but still generates many diverse shapes that do not match the input. What loss do you get for high-resolution image-to-shape generation, where the goal is the single corresponding shape without diversity?

Specifically, I use the CLIP feature as the condition, like this:

```python
features = clip_model.encode(xxxxxx)
features = features.repeat(1, 1, 2)  # tile the 512-dim CLIP feature to 1024-dim
```

Then I train the model with your code. When I run inference multiple times, the results from one single feature look like these:

[image]

[image]

[image]

where the autoencoder result (the shape I want the model to generate) is:

[image]

2. To investigate the above issue, I ran a further experiment: in stage-2 training I use only ONE fixed feature vector as input and train on only ONE shape, to make the network overfit to that single shape.

However, the loss is still too large:

```
lr: 0.000022  min_lr: 0.000022  loss: 5.9182 (5.9182)  loss_scale: 65536.0000 (65536.0000)  loss_x: 0.2448 (0.2448)  loss_y: 2.1559 (2.1559)  loss_z: 2.4121 (2.4121)  loss_latent: 1.1054 (1.1054)  weight_decay: 0.0500 (0.0500)  grad_norm: 0.4021 (0.4021)
```

and the output is like this:

Stage-2 result:

[image]

3. At inference you sample from the probability distribution to get diversity. When I instead choose the largest-probability token at each step, the result is not good. Why is that?

[image]

Specifically, I changed modeling_prob.py to choose the largest-probability token:

```python
# line 182
probs_save = probs.clone()
# line 200
ix[:] = torch.argmax(probs_save)
```
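For reference, here is a generic sketch of the two decoding strategies being compared; the names (probs, greedy) are illustrative, not the actual modeling_prob.py code:

```python
import torch

def pick_next_token(probs: torch.Tensor, greedy: bool = False) -> torch.Tensor:
    """Select the next token from a (batch, vocab) probability tensor."""
    if greedy:
        # Deterministic mode-seeking: with a flat distribution (max
        # probability ~0.2, as reported above), early argmax choices can
        # push the sequence off the distribution the model was trained on.
        return torch.argmax(probs, dim=-1)
    # Stochastic sampling, which (per the discussion above) is what the
    # released inference does, and what yields diverse shapes.
    return torch.multinomial(probs, num_samples=1).squeeze(-1)
```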

liuzhengzhe commented 2 years ago

I have solved this issue. Thanks.