Thanks for sharing your nice work!
I noticed that in geolrm_wrapper.py, both the serializer and lrm_generator have their own dedicated image encoder. Is it possible to share the encoder?
Also, I don't understand why the encoder is NOT frozen. Does it have to be optimized together with GeoLRM?
We tried sharing the encoder but found that it harms performance. The proposal transformer focuses on recovering coarse geometry, whereas the reconstruction transformer needs to retrieve fine-grained details, so the two benefit from different image features.
Our experiments show better performance when the image encoder is not frozen. Our view is that DINOv2 is not explicitly trained on 3D data and therefore requires further fine-tuning.
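For readers who want to see what "separate and unfrozen" means in practice, here is a minimal PyTorch sketch. It is not the code in geolrm_wrapper.py: `make_encoder` is a hypothetical stand-in for the pretrained DINOv2 backbone, and the layer sizes, learning rate, and toy losses are assumptions chosen only to keep the example small.

```python
import copy

import torch
from torch import nn


def make_encoder() -> nn.Module:
    # Hypothetical stand-in for a pretrained image backbone such as DINOv2.
    return nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 32))


def set_trainable(module: nn.Module, trainable: bool) -> None:
    # Freezing a module means turning off gradients for all its parameters.
    for p in module.parameters():
        p.requires_grad = trainable


pretrained = make_encoder()

# Two independent copies that start from the same weights, one per branch,
# so each branch can adapt its features to its own task.
proposal_encoder = copy.deepcopy(pretrained)        # coarse-geometry branch
reconstruction_encoder = copy.deepcopy(pretrained)  # fine-detail branch

# Keep both encoders trainable (pass False instead to freeze one).
set_trainable(proposal_encoder, True)
set_trainable(reconstruction_encoder, True)

params = list(proposal_encoder.parameters()) + list(reconstruction_encoder.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-5)  # assumed learning rate

# Toy forward/backward pass: each branch gets its own (made-up) loss.
x = torch.randn(4, 16)
loss = proposal_encoder(x).mean() + reconstruction_encoder(x).pow(2).mean()
loss.backward()
optimizer.step()

# The two encoders do not share parameter storage.
shares_storage = (
    proposal_encoder[0].weight.data_ptr()
    == reconstruction_encoder[0].weight.data_ptr()
)
```

Sharing the encoder would correspond to passing the same module object to both branches, and freezing it would correspond to `set_trainable(encoder, False)` plus leaving its parameters out of the optimizer.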