Open HL4214 opened 1 year ago
I use BLIP-2 (https://github.com/salesforce/LAVIS/tree/main) to generate the prompts.
Do you feed the whole panorama directly into BLIP-2, or do you crop it into multiple images first?
The panorama is converted into 8 perspective images, and each perspective is processed independently.
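For anyone trying to reproduce this step, here is a minimal sketch of splitting an equirectangular panorama into 8 perspective views with NumPy. It is not the author's code: the 90° field of view, the 45° yaw spacing, the 256-pixel output size, nearest-neighbor sampling, and the function names `equirect_to_perspective`, `split_panorama`, and `caption_views` are all assumptions made for illustration. The `caption_fn` argument is a hypothetical stand-in for whatever captioning model you use (for example a BLIP-2 wrapper).

```python
import numpy as np


def equirect_to_perspective(pano, yaw_deg, fov_deg=90.0, out_size=256):
    """Sample one pinhole view from an equirectangular panorama (H x W x C)."""
    h, w = pano.shape[:2]
    # Focal length in pixels for the requested field of view (assumed value).
    f = 0.5 * out_size / np.tan(np.radians(fov_deg) / 2.0)
    # Pixel grid centered on the principal point.
    coords = np.arange(out_size) - (out_size - 1) / 2.0
    x, y = np.meshgrid(coords, coords)
    z = np.full_like(x, f)
    # Rotate the camera rays about the vertical axis by the yaw angle.
    yaw = np.radians(yaw_deg)
    xr = x * np.cos(yaw) + z * np.sin(yaw)
    zr = -x * np.sin(yaw) + z * np.cos(yaw)
    # Ray direction -> longitude and latitude (positive latitude points down).
    lon = np.arctan2(xr, zr)
    lat = np.arctan2(y, np.hypot(xr, zr))
    # Longitude/latitude -> panorama pixel indices (nearest neighbor).
    u = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    v = np.clip(((lat / np.pi + 0.5) * h).astype(int), 0, h - 1)
    return pano[v, u]


def split_panorama(pano, n_views=8, fov_deg=90.0, out_size=256):
    """Return n_views perspective crops at evenly spaced yaw angles."""
    step = 360.0 / n_views
    return [
        equirect_to_perspective(pano, i * step, fov_deg, out_size)
        for i in range(n_views)
    ]


def caption_views(views, caption_fn):
    """Caption each view independently; caption_fn is a hypothetical callable."""
    return [caption_fn(view) for view in views]
```

With these assumed settings, neighboring views overlap by 45°, so the 8 views together cover the full horizontal circle. A call such as `caption_views(split_panorama(pano), my_captioner)` would then yield one prompt per perspective, where `my_captioner` is whatever image-to-text function you supply.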
Thank you for sharing your work. After reading the paper, I have a question about the Matterport3D dataset.
Matterport3D only provides surface reconstructions, camera poses, and 2D and 3D semantic segmentations, with no text prompts, so I would like to know how you obtain the text prompt for each panorama image. Is it the same for the ScanNet dataset?