NIRVANALAN / LN3Diff

[ECCV-2024] LN3Diff creates high-quality 3D object mesh from text within 8 V100-SECONDS.
https://nirvanalan.github.io/projects/ln3diff/

ValueError: mean length and number of channels do not match. Got torch.Size([3]) and torch.Size([6, 4, 224, 224]). #4

Closed Au-LiuJY closed 1 month ago

Au-LiuJY commented 2 months ago

When I run "bash shell_scripts/final_release/inference/sample_obajverse_i23d_dit.sh", some errors occur. Here are the steps I followed. I use the demo image "teasure_chest-input.png" and resize it to (504, 504), because the following setting divides it into 3 pieces. (The original size is (505, 505).)

Then, in "/LN3Diff/./sgm/modules/encoders/modules.py", the function "encode_with_vision_transformer" raises an assertion error, because the value of self.max_crops is 0 while the image shape is "torch.Size([1, 6, 4, 252, 252])". After that, I deleted this line of code: "assert self.max_crops == img.shape[1]".

Later, it finally happened:

  File "/LN3Diff/./sgm/modules/encoders/modules.py", line 646, in preprocess
    x = kornia.enhance.normalize(x, self.mean, self.std)
  File "/anaconda3/envs/ln-3diff/lib/python3.9/site-packages/kornia/enhance/normalize.py", line 109, in normalize
    raise ValueError(f"mean length and number of channels do not match. Got {mean.shape} and {data.shape}.")
ValueError: mean length and number of channels do not match. Got torch.Size([3]) and torch.Size([6, 4, 224, 224]).

Can you please give me some advice on it? Thanks in advance!

Au-LiuJY commented 2 months ago

In short, the main problem is that the number of channels in the image doesn't match the shape of mean or std.
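
For readers following along, the mismatch can be illustrated without loading the model. The sketch below is a simplified stand-in for the channel check, not kornia's actual code: the helper name `channel_check` is hypothetical, and it assumes the data is laid out as (batch, channels, height, width), so the 6 views land in the batch dimension and the 4 in the traceback is the channel count.

```python
def channel_check(mean_shape, data_shape):
    """Simplified, hypothetical stand-in for the check that raised the error.

    Assumption: data_shape is (batch, channels, height, width) and the
    per-channel mean needs either one entry or one entry per channel.
    """
    channels = data_shape[1]
    if mean_shape[0] not in (1, channels):
        raise ValueError(
            "mean length and number of channels do not match. "
            f"Got {mean_shape} and {data_shape}."
        )
    return channels


# Shapes from the traceback: a 3-entry mean against 4-channel data.
try:
    channel_check((3,), (6, 4, 224, 224))
    mismatch = None
except ValueError as err:
    mismatch = str(err)

# With only three channels per view, the same check passes.
ok_channels = channel_check((3,), (6, 3, 224, 224))
```

Under these assumptions, the check only passes once each view carries three channels, which is consistent with the 3-entry mean and std expecting RGB input.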

Later, I directly selected [:, :, :3], and it raised another error:

  File "/LN3Diff/./nsr/lsgm/flow_matching_trainer.py", line 616, in sample_and_save
    assert c[k].shape[0] == 1
AssertionError

NIRVANALAN commented 2 months ago

Hi, thanks for your interest. It seems that I used the wrong dataset class for I23D; I have already fixed this in the latest commit. The dataset you posted is for multi-view conditioned 3D generation, which I am still testing. I will release that in the near future.