ZzZZCHS / Chat-Scene

Code for "Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers" (NeurIPS 2024)
MIT License

About data preprocessing. #31

Closed — KaKa-101 closed this issue 4 months ago

KaKa-101 commented 5 months ago

Thanks for your work. While preprocessing the data following the README, I ran into the following two questions:

  1. First, the link in the README seems to point to an empty file. Also, in run_prepare.sh the value of "version" is set to null; is that right? In addition, there doesn't seem to be an inference.sh file yet, and I don't know whether other files in preprocess need to be updated.
  2. Second, when I prepare the conda environment as described in Mask3D, an error appears: it seems that open3d==0.17.0 (and other versions of open3d) cannot be installed with Python 3.10.9. Have you ever encountered this problem, and how should I fix it? I appreciate your valuable work and help.
ZzZZCHS commented 5 months ago
  1. run_prepare.sh refers to preprocess/run_prepare.sh; thanks for pointing out the link problem. version is set to null by default, which is correct. You can find scripts/inference.sh in this repo, which was forked from Uni3D.
  2. It's a little tricky to install the open3d package. It is only used in their visualization code, so we skipped installing it and ran the inference code directly.
KaKa-101 commented 5 months ago

Thanks for your reply.

  1. Could you tell me the purpose of the Process data step after obtaining the instance segmentation of each scene with the pretrained Mask3D model?
  2. And after running bash preprocess/run_prepare.sh with the mask3d_inst_seg you provided, I met an error.
ZzZZCHS commented 5 months ago
  1. The purpose of the Process data step is to prepare QA pairs for each task/dataset and to transform the GT IDs into the corresponding segmented IDs (based on the IoU between GT instances and segmented instances; a rough sketch of this matching is given after this list).
  2. This is my fault. Some of the scripts still require old files that I haven't provided. I will fix the code soon (so that it relies only on the official annotations of each dataset).
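
For reference, a minimal sketch of this IoU-based matching, assuming each instance is described by an axis-aligned box given as a (center, size) pair; all names here are illustrative and not the repo's actual code:

import numpy as np

def box_corners(center, size):
    # Convert a (3,) center and (3,) size into axis-aligned min/max corners.
    center, size = np.asarray(center), np.asarray(size)
    return center - size / 2, center + size / 2

def iou_3d(box_a, box_b):
    # IoU of two axis-aligned 3D boxes, each given as a (center, size) pair.
    min_a, max_a = box_corners(*box_a)
    min_b, max_b = box_corners(*box_b)
    overlap = np.clip(np.minimum(max_a, max_b) - np.maximum(min_a, min_b), 0, None)
    inter = np.prod(overlap)
    union = np.prod(max_a - min_a) + np.prod(max_b - min_b) - inter
    return inter / union if union > 0 else 0.0

def match_gt_to_segmented(gt_boxes, seg_boxes, iou_thresh=0.25):
    # For each GT instance ID, pick the segmented instance with the highest IoU.
    mapping = {}
    for gt_id, gt_box in gt_boxes.items():
        best_id, best_iou = None, iou_thresh
        for seg_id, seg_box in seg_boxes.items():
            iou = iou_3d(gt_box, seg_box)
            if iou > best_iou:
                best_id, best_iou = seg_id, iou
        mapping[gt_id] = best_id  # None if nothing overlaps enough
    return mapping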
ZzZZCHS commented 5 months ago

Hi @KaKa-101, we've updated the preprocessing code to rely only on the official annotations of each dataset.

KaKa-101 commented 5 months ago

Thanks for your great work. But when preparing the environment for Uni3D and running pip install "git+git://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib" as instructed, I got a build error. Do you know how to fix it? I'd appreciate it very much.

ZzZZCHS commented 5 months ago

I just followed Uni3D's instructions to install the environment and got the same error as you.

The problem is related to the PyTorch version. Their scripts install PyTorch 2.3.0, which is probably incompatible with pointnet2_ops for now. I switched to PyTorch 2.0.1 (the version we used before), and then it worked.

I recommend trying this:

conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia -y
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
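
After installation, a quick sanity check in Python (assuming the package exposes a pointnet2_utils module, as in the upstream repo; treat the exact module name as an assumption):

import torch
print(torch.__version__)  # should report 2.0.1 after the conda install above

# The import below fails if the CUDA extension did not build correctly.
from pointnet2_ops import pointnet2_utils  # noqa: F401
print("pointnet2_ops installed successfully")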
KaKa-101 commented 5 months ago

Thanks for your advice; it works. But I don't get the desired segment_result_dir when using the pretrained Mask3D model for instance segmentation. First, I preprocess the datasets as described in Mask3D (with --scannet200 set to False). Then I run validation on ScanNet, but the eval_output folder is empty and the saved folder contains only logs. Do you know how I should change the settings to get the instance segmentation results? Could you share what your scannet_val.sh looks like? Also, why do you choose scannet200_val.pt rather than scannet.pt as the pretrained checkpoint, given that the experiments are on the ScanNet dataset? Thanks very much for your help.

KaKa-101 commented 5 months ago

Also, could you show me what the segment_result_dir folder (the full predicted results from the pretrained Mask3D checkpoint) looks like?

ZzZZCHS commented 5 months ago

This is the script I used:

#!/bin/bash
export OMP_NUM_THREADS=3  # speeds up MinkowskiEngine

NUM_GPUS=1
CURR_DBSCAN=0.95
CURR_TOPK=750
CURR_QUERY=100

# TEST
python main_instance_segmentation.py \
general.gpus=${NUM_GPUS} \
general.experiment_name="test0416_scannet200" \
general.project_name="test0416" \
general.checkpoint='checkpoints/scannet200/scannet200_val.ckpt' \
data/datasets=scannet200 \
general.num_targets=201 \
data.num_labels=200 \
general.train_mode=false \
general.eval_on_segments=true \
general.train_on_segments=true \
model.num_queries=${CURR_QUERY} \
general.topk_per_image=${CURR_TOPK} \
general.use_dbscan=true \
general.dbscan_eps=${CURR_DBSCAN} \
general.save_visualizations=false \
general.export=true

Basically, you need to set general.export=true to let the model export the predicted masks; the exported results will then appear in the eval_output folder.

With the default setting, you only get results for the val split. To get results for the train split, I directly changed the corresponding line to self._load_yaml(database_path / f"train_database.yaml").

ScanNet and ScanNet200 share the same scene data; the only difference between them is the category annotation (the latter has 200 classes). In our experiments, we found that the model trained on ScanNet200 produces better instance segmentation results. Also, our model does not need the predicted class labels, so using scannet200_val.pt should be fine.

KaKa-101 commented 5 months ago

Thanks for your quick reply, which helps me a lot. If I use scannet200_val.pt as the pretrained weight, do I need to set --scannet200 to True in the dataset preprocessing step?

ZzZZCHS commented 5 months ago

Yes, you need to set --scannet200=true.

KaKa-101 commented 5 months ago

Thanks a lot. Also, in the paper you mention designing a relation module that incorporates spatial information into a relation-aware token for each object for the scene-level alignment. Could you explain this part in detail, and point out which part of the provided code corresponds to the relation module?

ZzZZCHS commented 5 months ago

We proposed the relation module to get a scene-aware token for each object here.

However, we discarded it in v2.1 since we found these scene-aware tokens do not improve performance. We are still exploring better ways to incorporate position information into the model.
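
For readers curious what such a module could look like, here is a rough sketch of one common design (per-object features plus encoded locations passed through self-attention); this is an illustrative assumption, not the authors' actual implementation, which was removed in v2.1:

import torch
import torch.nn as nn

class RelationModule(nn.Module):
    # Illustrative sketch: fuse per-object features with spatial context.
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.pos_mlp = nn.Sequential(nn.Linear(6, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, obj_feats, obj_locs):
        # obj_feats: (B, N, dim) per-object features
        # obj_locs:  (B, N, 6)  per-object 3D center + size
        x = obj_feats + self.pos_mlp(obj_locs)   # inject location into each object token
        ctx, _ = self.attn(x, x, x)              # each object attends to all objects in the scene
        return obj_feats + ctx                   # relation-aware token per object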

KaKa-101 commented 5 months ago

Thanks for your reply.

  1. Could you tell me the difference between scannet_train/val_attributes.pt and scannet_mask3d_train/val_attributes.pt?
  2. It seems that both consist of the labels and point coordinates of all instances in each scene; is the former obtained from the official ScanNet annotations and the latter from the instance segmentation results of the pretrained Mask3D?
  3. What is the function of these files in subsequent operations? Is scannet_train/val_attributes.pt used to provide the spatial relations between objects in the scene to the LLM, and scannet_mask3d_train/val_attributes.pt used to provide each object's attributes (class label, coordinates, segments, and so on) to the LLM?
ZzZZCHS commented 5 months ago
  1. Yes. scannet_train/val_attributes.pt stores the location (3D center and size, which can be converted to a 3D bounding box) and class label of each GT instance, while scannet_mask3d_train/val_attributes.pt stores the location and class label of each segmented instance (from Mask3D).
  2. scannet_train/val_attributes.pt is only used for evaluation (to calculate the IoU between each GT instance's bbox and the segmented instances' bboxes). Only the segmented instances are fed to the model, both for training and evaluation.
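
As a rough illustration of how these files might be consumed for that evaluation (the file paths and field names below are assumptions; check the repo's dataloader for the real schema):

import numpy as np
import torch

# The "locs" field name is assumed for illustration, not the repo's actual schema.
gt_attrs = torch.load("annotations/scannet_val_attributes.pt")
seg_attrs = torch.load("annotations/scannet_mask3d_val_attributes.pt")

scene_id = "scene0011_00"
gt_locs = np.asarray(gt_attrs[scene_id]["locs"])    # (N_gt, 6): 3D center + size per GT instance
seg_locs = np.asarray(seg_attrs[scene_id]["locs"])  # (N_seg, 6): same layout for Mask3D instances

# Evaluation then compares each predicted object against the GT box via 3D IoU,
# e.g. with a helper like the iou_3d() sketched earlier in this thread.
print(gt_locs.shape, seg_locs.shape)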
KaKa-101 commented 5 months ago

Thanks for your reply, which helps a lot. I found a file named scan2cap_val_corpus.json in the provided annotations, but it seems I can't generate this file with the provided preprocessing code. What is the function of this file, and do I need it for subsequent training and inference?

ZzZZCHS commented 5 months ago

It's used for the Scan2Cap evaluation. We followed Scan2Cap's code to prepare this corpus; basically, it converts all the original descriptions into "sos {description} eos".
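
A minimal sketch of building such a corpus in the "sos {description} eos" format (the annotation file, field names, and key format below are assumptions borrowed from ScanRefer/Scan2Cap-style evaluation, not necessarily the exact ones used here):

import json

def build_corpus(annotations):
    # Group the ground-truth captions per (scene, object) pair, wrapped as "sos ... eos".
    corpus = {}
    for ann in annotations:
        key = f"{ann['scene_id']}|{ann['object_id']}"      # key format is an assumption
        caption = f"sos {ann['description'].strip()} eos"
        corpus.setdefault(key, []).append(caption)
    return corpus

with open("annotations/ScanRefer_filtered_val.json") as f:
    corpus = build_corpus(json.load(f))
with open("annotations/scan2cap_val_corpus.json", "w") as f:
    json.dump(corpus, f, indent=2)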

KaKa-101 commented 5 months ago

Thanks for the Mask3D script you provided. Do you know in which file I can learn about or configure all the parameters? For example, I want to know where the setting general.export=true takes effect. I haven't figured out how these parameters work in Mask3D's code. I'd appreciate it a lot if you could help.

ZzZZCHS commented 5 months ago

It takes effect here. The basic configs are in this file. Their code is based on PyTorch Lightning, so you may want to find some tutorials on that.
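
As a general illustration of the pattern (not Mask3D's actual code): the YAML configs are loaded with Hydra/OmegaConf, command-line overrides like general.export=true are merged on top, and the training/test code simply reads the resulting flag:

from omegaconf import OmegaConf

# Base values normally come from the YAML files under Mask3D's conf/ directory.
cfg = OmegaConf.create({"general": {"export": False, "topk_per_image": 100}})

# Hydra merges command-line overrides such as `general.export=true` on top,
# which is roughly equivalent to:
overrides = OmegaConf.from_dotlist(["general.export=true", "general.topk_per_image=750"])
cfg = OmegaConf.merge(cfg, overrides)

# Downstream code (the Lightning test loop, in Mask3D's case) then just checks the flag:
if cfg.general.export:
    print("exporting predicted masks instead of only computing metrics")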