I have the same question 👀🤔 Have you figured it out? @zxczhai
Did you use `--speedup` while training? It seems that, because of the CLIP text encoder, the feature dimensions have to match, so I think each Gaussian's feature dimension should be 512 when doing a downstream task with LSeg.
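For reference, here is a quick way to confirm the 512-dim figure mentioned above. This snippet is mine, not from the repo; it assumes OpenAI's `clip` package and the ViT-B/32 backbone commonly paired with LSeg:

```python
import clip
import torch

# ViT-B/32's text encoder outputs 512-dim embeddings.
model, _ = clip.load("ViT-B/32", device="cpu")
with torch.no_grad():
    text_feat = model.encode_text(clip.tokenize(["banana"]))
print(text_feat.shape)  # torch.Size([1, 512])
```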
That's indeed a prior: the query feature dimension (which is actually the CLIP dimension) and the generated feature dimension must be the same. So if we use `--speedup` for training, it cannot perform the query, right?
Hi there! Yes, for the language-guided editing task, since the editing happens in 3D, we have to make the feature dimension on each 3D Gaussian match the CLIP text feature dimension, which is 512. Therefore, as indicated in our README, we don't use `--speedup` for this task.
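To make the dimension constraint concrete, here is a minimal sketch (not the repo's actual code) of how a language query of this kind typically works: cosine similarity between the CLIP text embedding and the feature stored on each Gaussian. `gaussian_features` and the threshold are hypothetical stand-ins:

```python
import clip
import torch
import torch.nn.functional as F

model, _ = clip.load("ViT-B/32", device="cpu")
with torch.no_grad():
    # 512-dim CLIP text embedding for the query word.
    text_feat = model.encode_text(clip.tokenize(["banana"])).float()
text_feat = F.normalize(text_feat, dim=-1)          # (1, 512)

# Hypothetical per-Gaussian features; in the real pipeline these are
# the distilled features stored on each 3D Gaussian.
gaussian_features = torch.randn(10_000, 512)
gaussian_features = F.normalize(gaussian_features, dim=-1)

scores = gaussian_features @ text_feat.T            # (N, 1) cosine similarity
mask = scores.squeeze(-1) > 0.2                     # hypothetical threshold
print(f"{int(mask.sum())} Gaussians selected for editing")
```

If `--speedup` stored lower-dimensional features on the Gaussians, the matrix product above would fail with a shape mismatch, which is exactly why the flag is skipped for this task.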
I followed the steps and used the banana dataset for the extraction task, but I get the error below. Could you tell me how to solve it? Thanks a lot!
edit_extraction.yaml
When I change "fruit" to "apple" or "banana", it fails as well.