How to train on custom datasets?

litingfeng commented 2 years ago

Hi,

Thanks for sharing the code.

I was wondering if you could also share instructions on how to train on custom datasets. I noticed that you modified low-level code such as builtin.py in data.datasets rather than registering the datasets somewhere else.

Thanks in advance.

michaelku1 commented 2 years ago

Detectron2 has a whole tutorial on how to register a dataset, no?

litingfeng commented 2 years ago

@michaelku1 Are you implying that only the registration functions (in datasets.builtin.py and cityscapes_foggy.py) need to be changed? If not, what else needs to be modified?

Sorry, I'm not that familiar with the internals of detectron2, e.g., when builtin.py is called.

Thanks.

michaelku1 commented 2 years ago

If I remember correctly, the first thing is to add a _PREDEFINED_SPLITS_XXX dictionary, which specifies the paths of the train, val, and test sets in your dataset directory. Next, define your register function, which registers all COCO instances of your dataset. One more important thing: you need to call your register function in __init__.py to make it work. I think that's all you need for registering custom datasets. You can of course check the tutorial for a more standard and detailed explanation. (Note: I'd also recommend looking at yjliu's work on "Unbiased Teacher", as it has similar code, and I'm sure you will learn a lot from it.)
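
A minimal sketch of the steps described above, assuming a COCO-format dataset. The split names (`mydata_train`, `mydata_val`), the directory layout, and both helper functions are hypothetical and only for illustration; `register_coco_instances` is Detectron2's own helper. It is imported inside the function so the path logic can be checked without Detectron2 installed.

```python
import os

# Hypothetical split table: name -> (image directory, COCO annotation file),
# both relative to the dataset root.
_PREDEFINED_SPLITS_MYDATA = {
    "mydata_train": ("mydata/images/train", "mydata/annotations/train.json"),
    "mydata_val": ("mydata/images/val", "mydata/annotations/val.json"),
}


def iter_mydata_splits(root):
    """Yield (name, json_file, image_root) for every predefined split."""
    for name, (image_dir, json_file) in _PREDEFINED_SPLITS_MYDATA.items():
        yield name, os.path.join(root, json_file), os.path.join(root, image_dir)


def register_all_mydata(root):
    """Register every split with Detectron2's DatasetCatalog."""
    # Imported here so this module can be inspected without Detectron2.
    from detectron2.data.datasets import register_coco_instances

    for name, json_file, image_root in iter_mydata_splits(root):
        # Empty metadata dict: category names are read from the COCO JSON.
        register_coco_instances(name, {}, json_file, image_root)
```

Calling `register_all_mydata(os.getenv("DETECTRON2_DATASETS", "datasets"))` from a module that is imported at startup should make the split names usable in the dataset fields of the config.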

litingfeng commented 2 years ago

@michaelku1 Thank you so much for the detailed answer! This is very helpful! 👍

yujheli commented 2 years ago

Thank you for answering questions for us!

darkhan-s commented 2 years ago

Hi @yujheli! I was trying to test the model on my custom VOC format dataset, and ended up having the following error:

ERROR:adapteacher.engine.trainer:Exception during training:
Traceback (most recent call last):
  File "adaptive_teacher/adapteacher/engine/trainer.py", line 402, in train_loop
    self.run_step_full_semisup()
  File "adaptive_teacher/adapteacher/engine/trainer.py", line 647, in run_step_full_semisup
    self._write_metrics(metrics_dict)
  File "adaptive_teacher/adapteacher/engine/trainer.py", line 674, in _write_metrics
    metrics_dict = {
  File "adaptive_teacher/adapteacher/engine/trainer.py", line 675, in <dictcomp>
    k: np.mean([x[k] for x in all_metrics_dict])
  File "adaptive_teacher/adapteacher/engine/trainer.py", line 675, in <listcomp>
    k: np.mean([x[k] for x in all_metrics_dict])
KeyError: 'Analysis_pred/Num_bbox'
Traceback (most recent call last):
  File "adaptive_teacher/train_net.py", line 73, in <module>
    launch(
  File "/users/saidnad1/.local/lib/python3.9/site-packages/detectron2/engine/launch.py", line 67, in launch
    mp.spawn(
  File "/projappl/project_2005695/miniconda3/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/projappl/project_2005695/miniconda3/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/projappl/project_2005695/miniconda3/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/projappl/project_2005695/miniconda3/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/users/saidnad1/.local/lib/python3.9/site-packages/detectron2/engine/launch.py", line 126, in _distributed_worker
    main_func(*args)
  File "adaptive_teacher/train_net.py", line 66, in main
    return trainer.train()
  File "adaptive_teacher/adapteacher/engine/trainer.py", line 384, in train
    self.train_loop(self.start_iter, self.max_iter)
  File "adaptive_teacher/adapteacher/engine/trainer.py", line 402, in train_loop
    self.run_step_full_semisup()
  File "adaptive_teacher/adapteacher/engine/trainer.py", line 647, in run_step_full_semisup
    self._write_metrics(metrics_dict)
  File "adaptive_teacher/adapteacher/engine/trainer.py", line 674, in _write_metrics
    metrics_dict = {
  File "adaptive_teacher/adapteacher/engine/trainer.py", line 675, in <dictcomp>
    k: np.mean([x[k] for x in all_metrics_dict])
  File "adaptive_teacher/adapteacher/engine/trainer.py", line 675, in <listcomp>
    k: np.mean([x[k] for x in all_metrics_dict])
KeyError: 'Analysis_pred/Num_bbox'

I was testing with detectron2 (0.6), and training seemed to run fine for 868 iterations. I assume something is wrong with the adaptation model; what could it be?

I also see that the following lines in the pseudo-labeling section of the training flow are commented out. Was there a reason for that?

# analysis_pred, _ = self.probe.compute_num_box(gt_unlabel_k,pesudo_proposals_rpn_unsup_k,'pred',True)
# record_dict.update(analysis_pred)

yujheli commented 2 years ago

@darkhan-s May I ask how you registered your custom VOC dataset? I think following the code in L156-165 in builtin.py should work.
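
For reference, a VOC-format dataset can be registered roughly as sketched below. This is not the code at the lines mentioned above; the dataset names, class list, and helper functions are hypothetical. `register_pascal_voc` and `MetadataCatalog` are Detectron2's own, and the sketch assumes a Detectron2 version whose `register_pascal_voc` accepts `class_names`. Each dataset directory is expected to follow the usual VOC layout (`Annotations/`, `ImageSets/Main/<split>.txt`, `JPEGImages/`).

```python
import os

# Hypothetical class list and splits; replace with your dataset's own.
MYVOC_CLASSES = ("car", "person")
_MYVOC_SPLITS = [
    ("myvoc_train", "MyVOC", "train"),
    ("myvoc_test", "MyVOC", "test"),
]


def myvoc_registration_args(root, year=2012):
    """Build keyword arguments for register_pascal_voc, one dict per split."""
    return [
        {
            "name": name,
            "dirname": os.path.join(root, dirname),
            "split": split,
            "year": year,
            "class_names": MYVOC_CLASSES,
        }
        for name, dirname, split in _MYVOC_SPLITS
    ]


def register_all_myvoc(root):
    """Register every split and mark it for Pascal VOC evaluation."""
    # Imported here so the argument logic can be checked without Detectron2.
    from detectron2.data import MetadataCatalog
    from detectron2.data.datasets import register_pascal_voc

    for kwargs in myvoc_registration_args(root):
        register_pascal_voc(**kwargs)
        # Trainers that choose an evaluator from metadata read this field.
        MetadataCatalog.get(kwargs["name"]).evaluator_type = "pascal_voc"
```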

For the error, I am not entirely sure, but I think commenting out the probing code should solve the problem. It looks like the model did not predict any boxes, so the probing did not work.

In trainer.py, L557-558 and L573-574 are only for probing, to see the ratio of false positives in pseudo labeling. You can comment them all out; doing so will not affect the training process.

darkhan-s commented 2 years ago

@yujheli Commenting that out seems to work. I assume it does not predict any pseudo boxes at the very early stage because the accuracy is too low? Should we modify the code so that it only attempts the probe when some pseudo labels have been predicted?

yujheli commented 2 years ago

@darkhan-s Yes, I think that makes more sense. I will need to figure out when exactly the probing will crash and then revise it.
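
Until the probing itself is revised, one possible workaround is to make the metric averaging tolerant of missing keys. The traceback suggests that `_write_metrics` looks up every key in every gathered metrics dict, so a key such as `Analysis_pred/Num_bbox` that is absent from any one dict raises `KeyError`. The sketch below is an assumption about how that step could be written, not the repository's code or the maintainers' fix; `average_metrics` is a hypothetical helper, and it uses plain Python instead of `np.mean`.

```python
def average_metrics(all_metrics_dict):
    """Average each metric over only the dicts that actually report it."""
    # Collect the union of keys instead of trusting the first dict's keys.
    keys = set()
    for metrics in all_metrics_dict:
        keys.update(metrics)

    averaged = {}
    for key in sorted(keys):
        values = [m[key] for m in all_metrics_dict if key in m]
        averaged[key] = sum(values) / len(values)
    return averaged
```

Note the trade-off: a metric that is missing from some dicts is averaged over fewer samples, so probing values logged this way are not directly comparable across iterations.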

facebookresearch / adaptive_teacher

How to train on custom datasets? #6