cshizhe / VLN-HAMT

Official implementation of History Aware Multimodal Transformer for Vision-and-Language Navigation (NeurIPS'21).
MIT License

Memory leak during newEpisode in data pre-processing #16

Open GengzeZhou opened 1 year ago

GengzeZhou commented 1 year ago

Hi Shizhe,

Thanks for your great work. I have observed a memory leak when calling the newEpisode function of MatterSim while running the data pre-processing code:

import math

import numpy as np
import torch
from PIL import Image

def process_features(proc_id, out_queue, scanvp_list, args):
    print('start proc_id: %d' % proc_id)

    # Set up the simulator
    sim = build_simulator(args.connectivity_dir, args.scan_dir)

    # Set up PyTorch CNN model
    torch.set_grad_enabled(False)
    model, img_transforms, device = build_feature_extractor(args.model_name, args.checkpoint_file)

    for scan_id, viewpoint_id in scanvp_list:
        # Loop all discretized views from this location
        images = []
        for ix in range(VIEWPOINT_SIZE):
            if ix == 0:
                sim.newEpisode([scan_id], [viewpoint_id], [0], [math.radians(-30)])
            elif ix % 12 == 0:
                sim.makeAction([0], [1.0], [1.0])
            else:
                sim.makeAction([0], [1.0], [0])
            state = sim.getState()[0]
            assert state.viewIndex == ix

            image = np.array(state.rgb, copy=True)      # rendered view in BGR channel order
            image = Image.fromarray(image[:, :, ::-1])  # BGR -> RGB (equivalent to cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
            images.append(image)

        images = torch.stack([img_transforms(image).to(device) for image in images], 0)
        fts, logits = [], []
        for k in range(0, len(images), args.batch_size):
            b_fts = model.forward_features(images[k: k+args.batch_size])
            b_logits = model.head(b_fts)
            b_fts = b_fts.data.cpu().numpy()
            b_logits = b_logits.data.cpu().numpy()
            fts.append(b_fts)
            logits.append(b_logits)
        fts = np.concatenate(fts, 0)
        logits = np.concatenate(logits, 0)

        out_queue.put((scan_id, viewpoint_id, fts, logits))

    out_queue.put(None)

My memory (64 GB) is gradually taken up as new viewpoints are loaded, and the previously allocated memory is never released. The same issue was raised in the Matterport3D simulator's official repo, but no solution has been provided yet.
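For reference, a minimal way to confirm the growth (a sketch assuming psutil is installed; only the RSS logging matters) is to print the resident memory of the worker process around every newEpisode call:

import os

import psutil  # assumption: pip install psutil

def log_rss(tag):
    # Resident set size of the current process, in MB.
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / (1024 ** 2)
    print(f'[{tag}] RSS = {rss_mb:.1f} MB')

# Inside the scan/viewpoint loop:
#   log_rss(f'before {scan_id}_{viewpoint_id}')
#   sim.newEpisode([scan_id], [viewpoint_id], [0], [math.radians(-30)])
#   ... render the 36 views ...
#   log_rss(f'after {scan_id}_{viewpoint_id}')

Logging before and after newEpisode also makes it easy to see whether the growth happens during rendering or during feature extraction.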

The issue is not solved even if I rebuild the simulator for every viewpoint and manually trigger garbage collection inside the for loop:

    for scan_id, viewpoint_id in scanvp_list:
        # Set up the simulator
        sim = build_simulator(args.connectivity_dir, args.scan_dir)

        # Loop all discretized views from this location
        images = []
        for ix in range(VIEWPOINT_SIZE):
            if ix == 0:
                sim.newEpisode([scan_id], [viewpoint_id], [0], [math.radians(-30)])
            elif ix % 12 == 0:
                sim.makeAction([0], [1.0], [1.0])
            else:
                sim.makeAction([0], [1.0], [0])
            state = sim.getState()[0]
            assert state.viewIndex == ix

            image = np.array(state.rgb, copy=True) # in BGR channel
            image = Image.fromarray(image[:, :, ::-1]) #cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            images.append(image)

        images = torch.stack([transform(image).to(device) for image in images], 0)
        fts = []
        for k in range(0, len(images), args.batch_size):
            with torch.cuda.amp.autocast(dtype=torch.float16):
                b_fts = ln_vision(visual_encoder(images[k: k+args.batch_size]))
            b_fts = b_fts.data.cpu().numpy()
            fts.append(b_fts)
        fts = np.concatenate(fts, 0)

        # free memory
        del sim
        gc.collect()

Therefore I believe it is caused by a memory leak inside MatterSim itself. Do you have any suggestions on this issue?
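One mitigation worth trying (a sketch only, under the assumption that the leak sits in the native simulator and cannot be freed from Python): recycle the worker process periodically, so whatever MatterSim leaks is reclaimed by the OS when the child process exits. The chunk_size and the sentinel note below are illustrative, not part of the original script.

import multiprocessing as mp

def process_features_in_chunks(proc_id, out_queue, scanvp_list, args, chunk_size=50):
    # Run process_features (defined above) on small chunks, each in a fresh
    # short-lived child process; leaked native memory is returned to the OS
    # when the child exits.
    # Note: process_features puts a None sentinel after every chunk, so the
    # consumer must ignore all but the final one (or the sentinel handling
    # should be moved out of process_features).
    for i in range(0, len(scanvp_list), chunk_size):
        chunk = scanvp_list[i:i + chunk_size]
        p = mp.Process(target=process_features, args=(proc_id, out_queue, chunk, args))
        p.start()
        p.join()

This does not fix the leak, but it bounds the peak memory to what a single chunk can accumulate.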

goodstudent9 commented 3 weeks ago

Don't render images from the simulator, i.e. set the rendering variable "Render***" (sorry, I forget the exact name) to false. If you do this and only read angle and connectivity information from the simulator, there is no memory leak.
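Concretely, something like the sketch below (the flag should be setRenderingEnabled, or whatever the rendering switch is called in your MatterSim build; connectivity_dir, scan_id and viewpoint_id are assumed to be set up as in the script above):

import math
import MatterSim

sim = MatterSim.Simulator()
sim.setRenderingEnabled(False)         # skip RGB rendering entirely
sim.setDiscretizedViewingAngles(True)
sim.setCameraResolution(640, 480)
sim.setCameraVFOV(math.radians(60))
sim.setNavGraphPath(connectivity_dir)  # assumption: same connectivity setup as build_simulator
sim.initialize()

sim.newEpisode([scan_id], [viewpoint_id], [0], [math.radians(-30)])
state = sim.getState()[0]
# state.heading, state.elevation and state.navigableLocations are still
# populated, but state.rgb is not rendered, so memory stays flat.

This only helps for code paths that need the navigation graph and agent pose, not for feature extraction, which needs the rendered images.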

jj023721 commented 3 weeks ago

Hello, this is ylJiang. Your email has been received!

GengzeZhou commented 3 weeks ago

@goodstudent9 Thanks for your reply. The point is that I want to render RGB images at arbitrary resolution during navigation and also when saving visual features, which is exactly where the memory leak is observed. According to your answer, though, the leak can be narrowed down to the image-rendering path inside the simulator. This makes sense, because all the transformer-based VLN methods (DUET, HAMT, RecBERT, BEVBERT) preload visual features in their code, so they avoid the problem during training.
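For reference, this is roughly what "preloading visual features" means in practice (a sketch; the file name and the "{scan_id}_{viewpoint_id}" key layout are assumptions for illustration, not the exact format used by these codebases): training reads precomputed per-viewpoint features from disk instead of calling the simulator's renderer.

import h5py
import numpy as np

class PreloadedFeatureDB:
    # Look up precomputed per-viewpoint features instead of rendering with MatterSim.
    def __init__(self, feature_file):
        self._file = h5py.File(feature_file, 'r')  # e.g. 'vit_features.hdf5' (hypothetical name)
        self._cache = {}

    def get(self, scan_id, viewpoint_id):
        key = f'{scan_id}_{viewpoint_id}'          # assumed layout: one dataset per viewpoint
        if key not in self._cache:
            self._cache[key] = np.array(self._file[key])  # e.g. shape (36, feature_dim)
        return self._cache[key]

Since no rendering happens at training time, the leak in newEpisode never gets a chance to accumulate.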