alrightkami closed this issue 1 year ago
Hi @alrightkami, I saw your pull request. It seems you use `batch_dice_loss` instead of `batch_dice_loss_jit` when there are no annotations. Does the bug only appear when there are no annotations?
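(For context, a minimal sketch of what such a fallback could look like. The identifiers come from this thread; the dice-cost definition below is an illustration in the style of Mask2Former-type matchers, not necessarily the project's exact code.)

```python
import torch


def batch_dice_loss(inputs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Pairwise dice cost between N predicted masks and M target masks,
    # both flattened to (num_masks, num_points).
    inputs = inputs.sigmoid().flatten(1)
    numerator = 2 * torch.einsum("nc,mc->nm", inputs, targets)
    denominator = inputs.sum(-1)[:, None] + targets.sum(-1)[None, :]
    return 1 - (numerator + 1) / (denominator + 1)


# JIT-scripted variant, normally a bit faster once compiled.
batch_dice_loss_jit = torch.jit.script(batch_dice_loss)


def dice_cost(out_mask: torch.Tensor, tgt_mask: torch.Tensor) -> torch.Tensor:
    # Hypothetical guard: fall back to the plain implementation when the image
    # has no annotations (empty target batch), which is the case this thread
    # reports as crashing with the jit-scripted version.
    if out_mask.shape[0] == 0 or tgt_mask.shape[0] == 0:
        return batch_dice_loss(out_mask, tgt_mask)
    return batch_dice_loss_jit(out_mask, tgt_mask)
```

The idea is simply that the jit-scripted variant is only called when both mask batches are non-empty.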
Hey there @HaoZhang534, in my custom dataset I indeed have images with no annotations (hard negatives). However, in the config file I have `dataloader.train.dataset.filter_empty = True`. I'm not sure whether the filtering happens before or after this step?
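(For what it's worth, in detectron2's standard data pipeline `filter_empty` is an argument of `get_detection_dataset_dicts`, so the filtering is applied once when the dataset dicts are built, before the dataloader or the loss ever see a sample. A small sketch, assuming a hypothetical registered dataset name:)

```python
from detectron2.data import get_detection_dataset_dicts

# filter_empty=True drops images whose annotation list is empty (or contains
# only crowd annotations) at dataset-dict construction time, i.e. before
# batching and loss computation. "my_custom_train" is a hypothetical name.
dicts_all = get_detection_dataset_dicts("my_custom_train", filter_empty=False)
dicts_filtered = get_detection_dataset_dicts("my_custom_train", filter_empty=True)
print(f"{len(dicts_all) - len(dicts_filtered)} images without usable annotations were dropped")
```

Note that even with this flag, augmentations such as aggressive cropping can still leave an image with zero valid instances at training time, which may be how empty targets reach the matcher despite the filter.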
Also, I started a training run yesterday around 5 pm with `train.max_iter = 36875` and it reported `eta: 6:23:50`, but after more than 12 hours it is still in progress and now shows `eta: 1:34:07`. I always train on a fairly strong GPU machine and it has never taken this long, so apparently `batch_dice_loss` makes training much slower.
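(If you want to check how much of the slowdown is actually attributable to dropping the jit-scripted variant, one rough option is to time the two functions in isolation, e.g. with the `batch_dice_loss` / `batch_dice_loss_jit` definitions sketched earlier in this thread; the shapes below are made up.)

```python
import time

import torch

# Rough micro-benchmark comparing the two variants (uses the
# batch_dice_loss / batch_dice_loss_jit definitions sketched above).
# Use sizes representative of your point-sampled masks.
device = "cuda" if torch.cuda.is_available() else "cpu"
preds = torch.randn(100, 12544, device=device)
targets = (torch.rand(20, 12544, device=device) > 0.5).float()


def bench(fn, iters: int = 50) -> float:
    # Warm-up iterations (also trigger jit compilation on the first calls).
    for _ in range(5):
        fn(preds, targets)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(preds, targets)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


print("plain:", bench(batch_dice_loss))
print("jit  :", bench(batch_dice_loss_jit))
```

If the two timings come out close, the slowdown more likely comes from somewhere else (data loading, a shared GPU, etc.) than from skipping the jit compilation.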
Have you solved your problem? I have merged your PR.
@FengLi-ust it does solve it: the training is not crashing and is still in progress. But as mentioned above, training slows down enormously.
You can refer to this issue to see if it can solve your problem.
It does seem to solve it; however, I'm not sure about the training speed yet. I created a PR for the fix.
I'm closing this issue~ feel free to reopen it if needed~
When I run:
cd /home/jovyan/data/kamila/detrex && python tools/train_net.py --config-file projects/maskdino/configs/maskdino_r50_coco_instance_seg_50ep.py
I get the following exception:
So far, I have figured out why it may appear. It seems that the workaround of using `batch_dice_loss` instead of `batch_dice_loss_jit`, as discussed in the issue, fixes it; however, training becomes much slower. Would really appreciate you looking at it.