facebookresearch / sound-spaces

A first-of-its-kind acoustic simulation platform for audio-visual embodied AI research. It supports training and evaluating multiple tasks and applications.
Creative Commons Attribution 4.0 International

Issue about scripts/cache_observations.py #112

Open Hoyyyaard opened 1 year ago

Hoyyyaard commented 1 year ago

When I pre-compute semantic sensor observations (in the same way as for the RGB sensor) using scripts/cache_observations.py, the image returned by the semantic sensor is sometimes a rotated version of the RGB image, and sometimes does not match the RGB image at all. Can you please help me solve this problem? Thanks!

Code of scripts/cache_observations.py:

# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

import os
import argparse
import pickle
import tqdm

import magnum as mn
import numpy as np

import habitat_sim
from habitat.core.registry import registry
from habitat.core.simulator import SensorSuite
from habitat_sim.utils.common import quat_from_angle_axis
from soundspaces.utils import load_metadata
from ss_baselines.av_nav.config import get_config


def create_sim(scene_id, sensor_suite):
    backend_cfg = habitat_sim.SimulatorConfiguration()
    backend_cfg.scene_id = scene_id
    backend_cfg.enable_physics = False

    agent_cfg = habitat_sim.agent.AgentConfiguration()

    sensor_specifications = []
    for sensor in sensor_suite.sensors.values():
        sim_sensor_cfg = sensor._get_default_spec()
        sim_sensor_cfg.uuid = sensor.uuid
        # [the remainder of the next statement is cut off in the post]
        sim_sensor_cfg.resolution = list(
        sim_sensor_cfg.sensor_type = sensor.sim_sensor_type
        sensor_specifications.append(sim_sensor_cfg)

    agent_cfg.sensor_specifications = sensor_specifications

    return habitat_sim.Configuration(backend_cfg, [agent_cfg])

def main(dataset):
    """
    This function computes and saves the visual observations for the pre-defined grid points in SoundSpaces 1.0
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--config-path",
        type=str,
        default='ss_baselines/av_nav/config/audionav/{}/train_telephone/pointgoal_rgb.yaml'.format(dataset)
    )
    args = parser.parse_args()

    config = get_config(args.config_path)
    sim_sensors = []
    for sensor_name in ["RGB_SENSOR", "DEPTH_SENSOR", "SEMANTIC_SENSOR"]:
        sensor_cfg = getattr(config.TASK_CONFIG.SIMULATOR, sensor_name)
        sensor_type = registry.get_sensor(sensor_cfg.TYPE)
        sim_sensors.append(sensor_type(sensor_cfg))
    sensor_suite = SensorSuite(sim_sensors)

    num_obs = 0
    scene_obs_dir = 'data/scene_observations_wsemantic/' + dataset
    os.makedirs(scene_obs_dir, exist_ok=True)
    metadata_dir = 'data/metadata/' + dataset
    for scene in tqdm.tqdm(os.listdir(metadata_dir)):
        scene_obs = dict()
        scene_metadata_dir = os.path.join(metadata_dir, scene)
        points, graph = load_metadata(scene_metadata_dir)
        if dataset == 'replica':
            scene_id = os.path.join('data/scene_datasets', dataset, scene, 'habitat/mesh_semantic.ply')
        else:
            scene_id = os.path.join('data/scene_datasets', dataset, scene, scene + '.glb')

        sim_config = create_sim(scene_id, sensor_suite)
        sim = habitat_sim.Simulator(sim_config)

        for node in graph.nodes():
            agent_position = graph.nodes()[node]['point']
            for angle in [0, 90, 180, 270]:
                agent = sim.get_agent(0)
                new_state = sim.get_agent(0).get_state()
                new_state.position = agent_position
                new_state.rotation = quat_from_angle_axis(np.deg2rad(angle), np.array([0, 1, 0]))
                new_state.sensor_states = {}
                agent.set_state(new_state, True)

                sim_obs = sim.get_sensor_observations()
                obs = sensor_suite.get_observations(sim_obs)
                # Debug output: dump the latest frame of each sensor to disk
                import cv2
                import matplotlib.pyplot as plt
                cv2.imwrite("./rgb.png", sim_obs['rgb'])
                cv2.imwrite("./depth.png", sim_obs['depth']*255)
                plt.imsave("./semantic.png", sim_obs['semantic'])
                # cv2.imwrite("./semantic.png", cv2.resize(sim_obs['semantic'].astype(np.float32),(128,128)).astype(np.uint8))
                scene_obs[(node, angle)] = obs
                num_obs += 1

        print('Total number of observations: {}'.format(num_obs))
        with open(os.path.join(scene_obs_dir, '{}.pkl'.format(scene)), 'wb') as fo:
            pickle.dump(scene_obs, fo)
        del sim


if __name__ == '__main__':
    print('Caching Replica observations ...')
    # main('replica')
    print('Caching Matterport3D observations ...')
    main('mp3d')

ChanganVR commented 1 year ago

@Hoyyyaard Did you follow the instructions in the step-by-step installation guide? You'll need to install a specific Habitat version to render the observations.
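
One quick way to confirm which versions are actually active in the environment is to query the installed package metadata. This is only a minimal sketch: the helper names `installed_version` and `report_versions` are illustrative, and the distribution names `habitat-sim` and `habitat` are assumptions that may differ for a source build (compare with what `pip list` shows).

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report_versions(dist_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    # Assumed distribution names; adjust them to match your environment.
    for name, version in report_versions(["habitat-sim", "habitat"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If either entry prints a version other than the one the installation guide asks for, the cached observations were likely rendered with the wrong build.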

Hoyyyaard commented 1 year ago

I followed the step-by-step installation guide to install SoundSpaces at v0.2.2, then checked out v0.1.7 for both habitat-lab and habitat-sim to cache the observations. However, I could not reproduce the result in the README (a success rate of 0.97 and an SPL of 0.803164) when I ran the following command:

python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml EVAL_CKPT_PATH_DIR data/pretrained_weights/audionav/av_nav/replica/heard.pth

ChanganVR commented 1 year ago

@Hoyyyaard what are the numbers you get?

Hoyyyaard commented 1 year ago

Between 50% and 60% in SR.
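
For reference, SR is the fraction of successful episodes, and SPL (success weighted by path length) scales each success by the ratio of the shortest path length to the length of the path the agent actually took. Below is a minimal sketch of both metrics. The function names and the per-episode tuple layout are illustrative assumptions, not taken from the SoundSpaces code.

```python
from typing import Sequence, Tuple

# Hypothetical per-episode record: (success, shortest_path_length, agent_path_length)
Episode = Tuple[bool, float, float]


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(1 for success, _, _ in episodes if success) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Mean over episodes of success * shortest / max(taken, shortest)."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for success, shortest, taken in episodes:
        denom = max(taken, shortest)
        # Failed episodes (and degenerate zero-length ones) contribute 0.
        if success and denom > 0:
            total += shortest / denom
    return total / len(episodes)


episodes = [
    (True, 10.0, 10.0),   # optimal path
    (True, 10.0, 20.0),   # success, but twice the shortest length
    (False, 10.0, 5.0),   # failure
    (True, 5.0, 5.0),     # optimal path
]
print(success_rate(episodes), spl(episodes))
```

Because SPL can never exceed SR, an SR of 50% to 60% also bounds the SPL well below the README figure.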

ChanganVR commented 1 year ago

Seems like there is a bug. I'm looking into it now. Will update you soon.

ChanganVR commented 1 year ago

@Hoyyyaard Sorry about the delay due to a NeurIPS submission. I tried to reproduce the error but couldn't. I'm getting 95% SR, and I think the observations are rendered according to the installation document. Just to debug, could you evaluate on my rendered observations (you can download them from https://drive.google.com/file/d/1I_eVW4X8sSEaABHOTFq7JpioT9EtnRwo/view?usp=share_link) and see what numbers you get?
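
Before re-running the evaluation, it can help to check that two sets of cached observations have the same structure. The sketch below assumes the layout written by scripts/cache_observations.py above, that is, one pickle per scene holding a dict keyed by `(node, angle)`. The helper names `load_cache` and `summarize_scene_obs` are illustrative, and unpickling the real files will probably need numpy and habitat importable in the environment.

```python
import pickle
from collections import Counter
from typing import Any, Dict


def load_cache(path: str) -> Dict[Any, Any]:
    """Load one per-scene observation cache from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize_scene_obs(scene_obs: Dict[Any, Any]) -> Dict[str, Any]:
    """Summarize entry count, entries per angle, and per-sensor array shapes."""
    angles = Counter(angle for (_node, angle) in scene_obs)
    first_obs = next(iter(scene_obs.values()), {})
    # Values without a `.shape` attribute are reported as None.
    shapes = {name: getattr(value, "shape", None) for name, value in first_obs.items()}
    return {
        "num_entries": len(scene_obs),
        "entries_per_angle": dict(angles),
        "sensor_shapes": shapes,
    }


# Tiny stand-in for a real cache, just to show the expected key layout.
toy_cache = {
    (0, 0): {"rgb": b"", "depth": b""},
    (0, 90): {"rgb": b"", "depth": b""},
    (1, 0): {"rgb": b"", "depth": b""},
}
print(summarize_scene_obs(toy_cache))
```

On real data, calling `summarize_scene_obs(load_cache(path))` for the same scene in both directories and comparing the two summaries should show quickly whether sensors, resolutions, or the number of grid points differ.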

sun17-311 commented 6 months ago

@ChanganVR This link is not working (https://drive.google.com/file/d/1I_eVW4X8sSEaABHOTFq7JpioT9EtnRwo/view?usp=share_link). Could you share a new one?