The time cost of mask prediction

Epiphqny commented 4 years ago

Wonderful work! May I ask how long it takes to train the panoptic segmentation network? I trained the mask branch and found it is much slower than the detection-only network.

alcinos commented 4 years ago

Hi @Epiphqny, if you follow the recipe from the paper (currently the fastest way we have), you should:

1. Train DETR with boxes only on the panoptic dataset (it will learn to detect both things and stuff). This should be as fast as normal DETR training.
2. Freeze the network and train the mask head for 25 epochs (lr drop at 15). This should take about 6-7 hours on 8 nodes.
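
The "25 epochs, lr drop at 15" schedule in step 2 can be pictured as a single step decay. Here is a minimal sketch, assuming a base learning rate of 1e-4 and a decay factor of 0.1; neither value is stated in this thread, and `step_lr` is a hypothetical helper, not a function from the repository.

```python
def step_lr(epoch: int, base_lr: float = 1e-4, lr_drop: int = 15, gamma: float = 0.1) -> float:
    """Learning rate for a zero-indexed epoch under a step decay every `lr_drop` epochs."""
    # base_lr and gamma are assumed values, chosen only for illustration.
    return base_lr * gamma ** (epoch // lr_drop)


# One value per epoch of the 25-epoch mask-head run.
schedule = [step_lr(epoch) for epoch in range(25)]

for epoch in (0, 14, 15, 24):
    print(epoch, schedule[epoch])
```

With these numbers the rate stays at its base value for epochs 0 to 14 and is ten times smaller for epochs 15 to 24, so the run sees exactly one drop.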

Good luck! I believe I have answered your question, and as such I'm closing this. Feel free to reach out if you have further questions.

Epiphqny commented 4 years ago

@alcinos Thank you very much for your quick reply. Actually, I only want to train the instance segmentation model directly, without bbox pretraining. I used the command "python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --masks --coco_path /path/to/coco" to train the model and found that it runs much slower than the bbox-only version. Is this the correct way to train instance segmentation? Do I need to change any other code for the task?

m-klasen commented 4 years ago

@Epiphqny Hi, even if you just want the instance masks, you are better off training boxes in detection mode first, followed by mask-only training. For me it was a lot faster and performed better. Basically, as @alcinos said, train without masks until the boxes converge, in your case like this: python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco. Then freeze all layers except for bbox_attention and mask_head, and train with --masks until your mask AP has converged. And no, AFAIK you don't need to change any code: the --masks flag should automatically provide the intermediate ResNet layers that the mask head requires. This way your instance segmentation trains a lot faster. Good luck.
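
The "freeze all layers except for bbox_attention and mask_head" step can be sketched as a filter over parameter names. This is only an illustration under stated assumptions: `freeze_except` is a hypothetical helper, the parameter names in the demo are made up, and the stand-in objects exist so the snippet runs without PyTorch. The `--frozen_weights` flag given later in this thread is the route the maintainers describe.

```python
from types import SimpleNamespace


def freeze_except(named_params, keep_prefixes=("bbox_attention", "mask_head")):
    """Turn off gradients for every parameter outside the kept submodules.

    `named_params` is any iterable of (name, parameter) pairs, such as the
    result of `model.named_parameters()` on a PyTorch module.
    Returns the names of the parameters that remain trainable.
    """
    trainable = []
    for name, param in named_params:
        keep = name.startswith(tuple(keep_prefixes))
        param.requires_grad = keep
        if keep:
            trainable.append(name)
    return trainable


# Stand-in parameters so the sketch runs without PyTorch installed.
# The parameter names below are hypothetical.
params = {
    "detr.backbone.conv1.weight": SimpleNamespace(requires_grad=True),
    "detr.transformer.encoder.weight": SimpleNamespace(requires_grad=True),
    "bbox_attention.q_linear.weight": SimpleNamespace(requires_grad=True),
    "mask_head.lay1.weight": SimpleNamespace(requires_grad=True),
}
print(freeze_except(params.items()))
```

With a real model, the optimizer would then be given only the parameters whose `requires_grad` is still true.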

alcinos commented 4 years ago

What @mlk1337 said is correct. Some remarks:

- Training boxes is required in all cases: the Hungarian loss needs them to compute the assignment, so the model must be trained to predict them.
- In general, transformers take much longer than convolutional networks to converge. Here, the mask head can be trained in very few epochs (provided the rest of the network is already trained and fixed), so it's a bit wasteful to train it simultaneously with the transformer for the whole 300 epochs, since the mask head is much more costly than the rest of the network, both computationally and memory-wise.
- With a big enough batch size, the "train boxes and masks simultaneously" approach should roughly match the "train boxes -> freeze -> train masks" way that we recommend. But with the first method, since you'll likely be limited to 1 image per GPU, you'll likely need something like 64 GPUs for a few days.

I realized the README isn't very clear on how to apply this freezing method. Here is how to do it:

1. Train boxes (300 epochs with the default settings): python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --output_dir /path/to/out_directory_for_boxes
2. Train masks (25 epochs is generally enough for COCO, but you can increase this if you feel convergence has not been reached): python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --masks --epochs 25 --lr_drop 15 --frozen_weights /path/to/out_directory_for_boxes/checkpoint.pth --output_dir /path/to/out_directory_for_masks
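
For anyone scripting the two stages, the commands above can be assembled from shared pieces so that the mask stage always points at the checkpoint written by the box stage. This is a minimal sketch: `build_commands` is a hypothetical helper, not part of the repository, and the flags are copied from the two commands above.

```python
import shlex


def build_commands(coco_path, boxes_dir, masks_dir, gpus=8):
    """Return the argument lists for the box stage and the mask stage."""
    launcher = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}", "--use_env", "main.py",
    ]
    # Stage 1: boxes only, default settings.
    boxes_cmd = launcher + ["--coco_path", coco_path, "--output_dir", boxes_dir]
    # Stage 2: mask head on top of the frozen weights from stage 1.
    masks_cmd = launcher + [
        "--coco_path", coco_path,
        "--masks", "--epochs", "25", "--lr_drop", "15",
        "--frozen_weights", f"{boxes_dir}/checkpoint.pth",
        "--output_dir", masks_dir,
    ]
    return boxes_cmd, masks_cmd


boxes_cmd, masks_cmd = build_commands(
    "/path/to/coco",
    "/path/to/out_directory_for_boxes",
    "/path/to/out_directory_for_masks",
)
print(shlex.join(boxes_cmd))
print(shlex.join(masks_cmd))
```

Each list can be passed to `subprocess.run` as is. The second stage should only be launched after the first has written its checkpoint.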

Good luck!
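
The GPU estimate in the remark about training boxes and masks simultaneously follows from simple arithmetic. Here is a small sketch, assuming a target total batch size of 64 images; that figure is inferred from the "1 image per gpu" and "64 gpus" numbers above and is not stated explicitly. `gpus_needed` is a hypothetical helper.

```python
import math


def gpus_needed(total_batch_size: int, images_per_gpu: int) -> int:
    """Number of GPUs required to reach a total batch size without gradient accumulation."""
    return math.ceil(total_batch_size / images_per_gpu)


# Assumed target of 64 images per step.
print(gpus_needed(64, 1))  # 64
print(gpus_needed(64, 2))  # 32
```

The fewer images fit on one GPU, the more GPUs the same total batch requires, which is why the memory cost of the mask head matters here.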

Epiphqny commented 4 years ago

@mlk1337 @alcinos Thank you very much for the detailed instructions. Yes, previously I wanted to train boxes and masks simultaneously and found it too costly. I will follow the "train boxes -> freeze -> train masks" approach.

fmassa commented 4 years ago

@alcinos could you send a PR that clarifies this part of the README a bit more?

facebookresearch / detr

The time cost of mask prediction #97