liuzhengzhe closed this issue 2 years ago
Hi Zhengzhe,
Thanks for your interest in our work. I am organizing the data preprocessing and the pretrained model.
For your information, let me describe the data processing here. A simple way is to use the preprocessing method proposed in Occupancy Networks; you can find it in their official repository.
An alternative is to use ManifoldPlus ( https://github.com/hjwdzh/ManifoldPlus ) to convert ShapeNet objects into watertight meshes, and then use SDFGen or another package (e.g., mesh_to_sdf) to obtain point samples.
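To illustrate the output format of that second step: tools like SDFGen or mesh_to_sdf produce (point, signed-distance) pairs sampled around the mesh. The toy sketch below mimics that format on an analytic sphere instead of a watertight ShapeNet mesh, so it runs without any mesh files; the function names and the sampling range are my own assumptions, not the repository's code.

```python
import numpy as np

def sphere_sdf(points, radius=0.5):
    # Signed distance to a sphere centered at the origin:
    # negative inside the surface, positive outside.
    return np.linalg.norm(points, axis=-1) - radius

def sample_sdf(num_points=4096, radius=0.5, seed=0):
    # Mimics the (point, sdf) pairs that SDFGen / mesh_to_sdf produce,
    # but against an analytic sphere instead of a real mesh.
    rng = np.random.default_rng(seed)
    points = rng.uniform(-1.0, 1.0, size=(num_points, 3))
    sdf = sphere_sdf(points, radius)
    return points, sdf

points, sdf = sample_sdf()
print(points.shape, sdf.shape)  # (4096, 3) (4096,)
```

For real data, the sphere SDF would be replaced by distance queries against the watertight mesh produced by ManifoldPlus.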
Best, Biao
Dear Biao,
Thanks for your reply. I tried to generate shapes from images with your code. However, I found that it can produce the correct category, but cannot produce shapes that match the input image. May I know whether any hyperparameters differ from the released category-conditioned generation?
My loss is around 6.1, and at inference time I found that the largest probability among all the tokens is around 0.2, which suggests the confidence is too low and leads to generated shapes that do not match the input image.
My loss: {"train_lr": 0.0003875156060424377, "train_min_lr": 0.0003875156060424377, "train_loss": 6.159766070108107, "train_loss_scale": 131072.0, "train_loss_x": 0.2729156543210233, "train_loss_y": 2.1654561555227847, "train_loss_z": 2.8282304242788507, "train_loss_latent": 0.8931638471120247, "train_weight_decay": 0.050000000000000454, "train_grad_norm": 0.6945916414260864, "epoch": 60, "n_parameters": 457923329}
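For intuition, a per-token cross-entropy value maps directly to an implied (geometric-mean) probability of the correct token via p = exp(-loss). Plugging in the per-axis losses from the log above shows why top-token probabilities around 0.1-0.2 are consistent with a total loss near 6:

```python
import math

def implied_prob(ce_loss):
    # Geometric-mean probability assigned to the correct token,
    # implied by a cross-entropy loss value: p = exp(-loss).
    return math.exp(-ce_loss)

# Per-axis losses taken from the training log above.
for name, ce in [("x", 0.273), ("y", 2.165), ("z", 2.828)]:
    print(f"loss_{name}={ce:.3f} -> p~{implied_prob(ce):.3f}")
```

This prints probabilities of roughly 0.76, 0.11, and 0.06 for x, y, and z respectively, so a ~0.2 top-token probability is not by itself a sign that training failed.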
Thanks a lot!
Hi Zhengzhe,
What is the input to your task, an image or a category label? The loss seems normal to me. Could you provide visualizations of some generated examples so I can take a look at them?
Best, Biao
Dear Biao,
Thanks very much.
Specifically, I use the CLIP feature as the condition, like:

features = clip_model.encode(xxxxxx)
features = features.repeat(1, 1, 2)  # tile the 512-dim CLIP feature to 1024 dims
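A minimal numpy sketch of that tiling (torch's `repeat(1, 1, 2)` tiles along each dimension, like `np.tile`); the (1, 1, 512) input shape here is my assumption. Note that duplicating the feature adds no new information, so a learned linear projection from 512 to 1024 dims might be an alternative worth trying:

```python
import numpy as np

# Stand-in for a CLIP image embedding; the (1, 1, 512) shape is an assumption.
features = np.random.default_rng(0).standard_normal((1, 1, 512)).astype(np.float32)

# torch's features.repeat(1, 1, 2) tiles the tensor along each dimension;
# np.tile is the numpy equivalent.
features_1024 = np.tile(features, (1, 1, 2))

print(features_1024.shape)  # (1, 1, 1024)
# The second half is an exact copy of the first.
print(np.array_equal(features_1024[..., :512], features_1024[..., 512:]))  # True
```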
Then I used your code to train the model. When I run inference multiple times, the results from one single feature look like these:

whereas the auto-encoder result (the shape I want the model to generate) is:
However, the loss is still too large:
lr: 0.000022 min_lr: 0.000022 loss: 5.9182 (5.9182) loss_scale: 65536.0000 (65536.0000) loss_x: 0.2448 (0.2448) loss_y: 2.1559 (2.1559) loss_z: 2.4121 (2.4121) loss_latent: 1.1054 (1.1054) weight_decay: 0.0500 (0.0500) grad_norm: 0.4021 (0.4021)
and the output is like this:
Stage-2 result:
Specifically, I changed modeling_prob.py

line 182: probs_save = probs.clone()
line 200: ix[:] = torch.argmax(probs_save)

so that the token with the largest probability is always chosen.
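The two edited lines above effectively replace stochastic sampling with greedy decoding. A small numpy sketch of the difference (the toy distribution and the original sampling call being multinomial are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
probs = np.array([0.05, 0.20, 0.15, 0.60])  # toy next-token distribution

# Stochastic decoding: draw a token index from the distribution
# (what a torch.multinomial-style call does); varies across runs.
sampled_ix = int(rng.choice(len(probs), p=probs))

# Greedy decoding: always take the most probable token,
# as in the ix[:] = torch.argmax(probs_save) change.
greedy_ix = int(np.argmax(probs))

print(greedy_ix)  # 3, deterministic across runs
```

One caveat: torch.argmax without a dim argument flattens its input, so if probs_save is batched, an explicit dim=-1 may be needed to take the argmax per row.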
I have solved this issue. Thanks.
I am also looking forward to the data processing code. Thanks!