buaavrcg / LEGaussians

PyTorch code for "LEGaussians: Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding"
https://buaavrcg.github.io/LEGaussians/
MIT License
100 stars, 14 forks

About the language embeddings. #1

XIE-ZJU closed this issue 8 months ago

XIE-ZJU commented 8 months ago

Hi, thanks for this excellent work. I am currently working in an area similar to yours. In the paper, you say "We aggregate features from all layers and normalize them to produce the final language embeddings." I have a few questions: (1) I extracted the unnormalized embeddings following 3DOVS and got a [scale, D, H, W] tensor ([3, 512, H, W] in practice). Is this the "Dense Language Features" mentioned in the paper? (2) Does the type of normalization matter? Is it min-max, z-score, or something else? I'm sincerely looking forward to your reply. Thanks again!

Chuan-10 commented 8 months ago

Hi, (1) The "Dense Language Features" have shape [512, H, W]. They are obtained by summing the [scale, D, H, W] tensor along dimension 0 (the scale dimension) and then normalizing the result. (2) To keep the magnitudes of the high-dimensional features consistent, we normalize each feature vector to unit length (an L2 norm of 1).
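A minimal sketch of the two steps described in the reply, assuming the multi-scale features are already available as a NumPy array of shape [scale, D, H, W]. The function name `dense_language_features` and the `eps` guard are hypothetical illustrations, not taken from the LEGaussians code. In PyTorch the same steps would be `x.sum(dim=0)` followed by `torch.nn.functional.normalize(..., dim=0)`.

```python
import numpy as np


def dense_language_features(multi_scale: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Collapse a [scale, D, H, W] tensor into unit-length [D, H, W] features.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    if multi_scale.ndim != 4:
        raise ValueError(f"expected [scale, D, H, W], got shape {multi_scale.shape}")
    # Sum over the scale axis (dimension 0): [scale, D, H, W] -> [D, H, W]
    summed = multi_scale.sum(axis=0)
    # L2 norm of each pixel's D-dimensional feature vector: shape [1, H, W]
    norms = np.linalg.norm(summed, axis=0, keepdims=True)
    # Divide so every per-pixel vector has length 1 (eps avoids division by zero)
    return summed / np.maximum(norms, eps)


# Usage with random stand-in features (3 scales, 512 channels, 4x5 pixels)
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 512, 4, 5)).astype(np.float32)
out = dense_language_features(feats)
print(out.shape)  # (512, 4, 5)
```

Unlike min-max or z-score normalization, which rescale values, this keeps each feature's direction and only fixes its length. With unit-length vectors, the dot product of two features equals their cosine similarity.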

XIE-ZJU commented 8 months ago

OK, thanks for your quick reply! I get it!