KainingYing / CTVIS

ICCV'2023 | CTVIS: Consistent Training for Online Video Instance Segmentation
MIT License

Some code issues #3

Open fanghaook opened 1 year ago

fanghaook commented 1 year ago
  1. The code cannot be trained; the error is: `CTVIS/mask2former/modeling/matcher.py", line 111, in memory_efficient_forward` at `cost_class = -out_prob[:, tgt_ids]`, raising `IndexError: tensors used as indices must be long, int, byte or bool tensors`. (A hedged workaround is sketched after this list.)
  2. I tested the author's YTVIS19_R50 model and got 54.4 AP, which matches neither the paper (55.1 AP) nor README.md (55.2 AP).
  3. visualize_all_videos.py and demo.py cannot run; they import many modules that are not included in the repository.
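
For what it's worth, the error message suggests `tgt_ids` arrives with a non-integer dtype on some PyTorch versions. A minimal, self-contained reproduction with a hypothetical one-line workaround (not the author's official fix):

```python
import torch

# Reproduce the failure mode: indexing with a float tensor.
out_prob = torch.rand(100, 40)        # [num_queries, num_classes], stand-in values
tgt_ids = torch.tensor([3.0, 7.0])    # float dtype, as the traceback implies

# cost_class = -out_prob[:, tgt_ids]       # raises the IndexError from the report
cost_class = -out_prob[:, tgt_ids.long()]  # cast to long before indexing
```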
KainingYing commented 1 year ago

Thanks for your attention.

  1. This issue is caused by the Python environment. Please follow the Install instructions to set up the conda environment. We have also fixed this in the latest version.
  2. The fluctuation in mAP is due to differences in environment and hardware. We run inference on an RTX 3060 with Python 3.10, CUDA 11.7, and PyTorch 2.0.0. We do not know how to mitigate this; if you do, we would be very grateful.
  3. We have fixed this in the latest version.
fanghaook commented 1 year ago

I tried retraining to reproduce the author's results, but the results were not satisfactory: the YTVIS2019 ResNet-50 model reached only 51-52 AP, far from the reported 55 AP. Let me briefly describe the training process:

  1. I first created a new environment exactly following the author's README.md, but `python train_ctvis.py` failed with: `ImportError: /home/.local/lib/python3.10/site-packages/MultiScaleDeformableAttention-1.0-py3.10-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops4view4callERKNS_6TensorEN3c108ArrayRefIlEE`. I suspect there are still version incompatibilities in the environment the author provides; the author could try building a fresh environment to check.
  2. I then created a new environment based on INSTALL.md from Mask2Former's GitHub, and your code ran normally, so I started training. Due to limited GPU memory, I set the batch size to 8 and trained with several schedules such as STEPS of (6000, 12000) and (12000, 24000), scaling the other hyperparameters accordingly (see the config sketch after this list), but the result was only 51-52 AP.
  3. I also tried the code the author updated a few days ago, including `cfg.MODEL.CL_PLUGIN.NOISE_EMBED = False` and other changes, but the results were even worse.
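
For reference, the usual way to adapt a detectron2-style schedule to a smaller batch is the linear scaling rule: halve `IMS_PER_BATCH`, halve `BASE_LR`, and double `STEPS`/`MAX_ITER` so the model sees the same total number of images. A hypothetical sketch using standard detectron2 solver keys (the exact CTVIS defaults may differ):

```python
from detectron2.config import get_cfg

cfg = get_cfg()                    # CTVIS layers its own keys on top of this
cfg.SOLVER.IMS_PER_BATCH = 8       # half of the paper's batch size of 16
cfg.SOLVER.BASE_LR = 0.0001 / 2    # scale the learning rate with the batch
cfg.SOLVER.STEPS = (12000, 24000)  # double the decay milestones...
cfg.SOLVER.MAX_ITER = 32000        # ...and the total iterations
```

Whether linear scaling actually closes the gap here is exactly what this thread is questioning; treat it as a starting point, not a guarantee.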
KainingYing commented 1 year ago
  1. When you rebuild the environment, it is best to delete the `mask2former/modeling/pixel_decoder/ops/build` directory and recompile the deformable-attention op (in Mask2Former this is done by running `sh make.sh` inside that `ops` directory). This works well for me.
  2. Can you share your detailed config file?
KainingYing commented 1 year ago

Hi @fanghaook ,

When we prepared our submission, we only set SOLVER.IMS_PER_BATCH to 16, chosen empirically. We also found locally that small batches lead to extremely unstable results and may even drop 2~3 AP. We are looking for a good config setting for small batch sizes, or implementing gradient accumulation to simulate the same effective batch size on limited GPUs.
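
Gradient accumulation itself is simple to sketch in plain PyTorch, though wiring it into the detectron2 trainer takes more care. A toy illustration with stand-in model and data (none of these names come from CTVIS):

```python
import torch
from torch import nn

# Stand-ins for the real detectron2 model, optimizer, and data loader.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data_loader = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(8)]

accum_steps = 4  # micro-batch of 4 simulates an effective batch of 16

optimizer.zero_grad()
for step, (x, y) in enumerate(data_loader):
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()   # scale so accumulated grads average
    if (step + 1) % accum_steps == 0:
        optimizer.step()              # one update per effective batch
        optimizer.zero_grad()
```

Note this only matches the gradient statistics of the larger batch; batch-dependent layers such as BatchNorm still see the micro-batch.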

fanghaook commented 1 year ago
  1. If I have time, I will try rebuilding the environment, but the environment should not be the reason for the poor performance.
  2. My hyperparameter choices were IMS_PER_BATCH: 4 or 8; BASE_LR: 0.0001 or 0.00005; STEPS: (6000, 12000) or (12000, 24000); MAX_ITER: 16000 or 32000. I trained several different combinations of these, but none of the results were good. Strangely, the results with batch size 4 were sometimes even better than with batch size 8.