GengzeZhou / NavGPT

[AAAI 2024] Official implementation of NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
MIT License

How to extract observations using BLIP2? #6

Closed Leon022 closed 5 months ago

Leon022 commented 5 months ago

Hello @GengzeZhou, first of all, thank you for your work, and forgive my unfamiliarity with the Matterport simulator.
I would like to know how to extract observations from a viewpoint using BLIP-2, as described in the paper.
Could you provide more details or references on the workings of the Matterport simulator?

GengzeZhou commented 5 months ago

Please refer to https://github.com/peteanderson80/Matterport3DSimulator for setting up the simulator and the usage of the simulator API.

An example of traversing the viewpoints in the simulator and storing the rendered image features can be found here.

You can initialize a simulator instance with a customized FoV for the rendered images and change the viewing angles with `makeAction` in the environment.

For example, by setting the FoV to 45 degrees and turning 45 degrees clockwise 7 times, saving the rendered image after each action will give you a non-overlapping panoramic observation at the current viewpoint.
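The pose schedule described above can be sketched in plain Python (this is only the angle arithmetic, not the MatterSim API itself; the three elevation rows at -30/0/+30 degrees are an assumption based on the usual R2R-style vertical sweep):

```python
import math

# Sketch: enumerate camera poses for a full panorama with a 45-degree
# horizontal FoV. Eight headings of 45 degrees tile 360 degrees exactly,
# so the rendered images do not overlap; three elevation rows give the
# usual R2R-style vertical sweep (assumption, not from this repo).
HFOV_DEG = 45
N_HEADINGS = 360 // HFOV_DEG          # 8 headings per elevation row
ELEVATIONS_DEG = [-30, 0, 30]

def panorama_poses():
    """Yield (heading_rad, elevation_rad) for every view, row by row."""
    for elev in ELEVATIONS_DEG:
        for k in range(N_HEADINGS):
            yield math.radians(k * HFOV_DEG), math.radians(elev)

poses = list(panorama_poses())
# 8 headings x 3 elevations = 24 views per viewpoint.
```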

Leon022 commented 5 months ago

Hello @GengzeZhou, thank you for your detailed guidance. I am now able to run MatterSim and obtain images. As per your advice, I have set the field of view (VFOV) to 45 degrees and turn 45 degrees clockwise 7 times per elevation, so each viewpoint generates 24 images, as in the following snippet:

```python
TSV_FIELDNAMES = ['scanId', 'viewpointId', 'image_w', 'image_h', 'vfov', 'features', 'logits']
VIEWPOINT_SIZE = 24  # Number of discretized views from one viewpoint
FEATURE_SIZE = 768
LOGIT_SIZE = 1000

WIDTH = 640
HEIGHT = 480
VFOV = 45

def process_features(proc_id, out_queue, scanvp_list, args):
    print('start proc_id: %d' % proc_id)

    # Set up the simulator
    sim = build_simulator(args.connectivity_dir, args.scan_dir)

    # Set up PyTorch CNN model
    # torch.set_grad_enabled(False)
    # model, img_transforms, device = build_feature_extractor(args.model_name, args.checkpoint_file)

    for scan_id, viewpoint_id in scanvp_list:
        # Loop over all discretized views from this location
        images = []
        for ix in range(VIEWPOINT_SIZE):
            if ix == 0:
                sim.newEpisode([scan_id], [viewpoint_id], [0], [math.radians(-30)])
            elif ix % 8 == 0:
                sim.makeAction([0], [1.0], [1.0])
            else:
                sim.makeAction([0], [1.0], [0])
            state = sim.getState()[0]
            print(state.viewIndex)
            assert state.viewIndex == ix

            image = np.array(state.rgb, copy=True)      # in BGR channel order
            image = Image.fromarray(image[:, :, ::-1])  # BGR -> RGB
            images.append(image)
```

However, I encountered an error:

```
Loaded 10567 viewpoints
start proc_id: 0
  0% (0 of 10567) |  | Elapsed Time: 0:00:00 ETA:  --:--:--
0
1
2
3
4
5
6
7
20
Process Process-1:
Traceback (most recent call last):
  File "/home/miniconda3/envs/NavGPT/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/miniconda3/envs/NavGPT/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/VLN-HAMT/preprocess/precompute_img.py", line 100, in process_features
    assert state.viewIndex == ix
AssertionError
```

I have altered three parameters: `VIEWPOINT_SIZE`, `VFOV`, and the condition `elif ix % 8 == 0:`. Could you please advise whether this error is due to my settings?

Leon022 commented 5 months ago

Alright, the previous issue was due to my unfamiliarity with MatterSim; it's resolved now.
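For future readers hitting the same assertion: a likely explanation is that MatterSim's `viewIndex` follows its *discretized* viewing-angle grid (12 headings x 3 elevation rows at 30-degree steps in the standard R2R setup), in which `makeAction` heading/elevation arguments are index increments rather than 45-degree turns. This pencil-and-paper model (an assumption about the simulator, not code from this repo) reproduces the printed trace above:

```python
# Hypothetical model of MatterSim's discretized viewIndex:
# 12 headings x 3 elevation rows, viewIndex = row * 12 + heading index.
def simulate_view_indices(viewpoint_size=24):
    heading_ix, row = 0, 0             # episode starts in the bottom row (-30 deg)
    seen = []
    for ix in range(viewpoint_size):
        if ix == 0:
            pass                                    # newEpisode: heading 0, bottom row
        elif ix % 8 == 0:
            heading_ix = (heading_ix + 1) % 12      # makeAction([0], [1.0], [1.0])
            row += 1                                # ... also steps elevation up one row
        else:
            heading_ix = (heading_ix + 1) % 12      # makeAction([0], [1.0], [0])
        seen.append(row * 12 + heading_ix)
        if seen[-1] != ix:
            return seen                             # the assert would fire here
    return seen

# Trace: 0, 1, ..., 7, then 20 — matching the log: at ix == 8 the
# elevation row advances while heading has moved 8 steps, giving
# viewIndex 12 + 8 = 20 instead of the expected 8.
```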