Some questions about the code implementation

qwqwq1445 commented 1 year ago

1.In the Pretraining Stage It is said that you only use the labeled source data in the pretraining stage in your published paper. However, I use the same strategy and found that the model can not produce reliable pseudo labels. Given that you conduct adversarial training on your network, why don't you use the unlabel target data for adversarial training during the pretraining stage? 2.In the Joint-training Stage I wonder how do you set the arg 'pseudo_label_policy'. When I set the arg to 'by_consistency', it seems the model didn't train well. Do you set the arg to 'traditional'? If so, how do you choose the valuable pseudo labels? By the module PostProcess and a threshold? Wish all the best, looking forward to your reply.

Lafite-Yu commented 1 year ago

For the second question, I remember that traditional is the default setting in command line arguments? And as mentioned in the paper, pseudo-label filtering is simply performed by threshold filtering. 'by_consistency' is experimental, and no improvements are obtained in experiments, and thus, we finally discard that.

For the first question, a good starting point is loading the SFA retained weights for cityscapes. Actually, we also did not pretrain the model on the cityscapes -> foggy cityscapes scenario well, and we load the SFA pretrained weights, except for the newly added components. (This can also be done for the other two scenarios.) Interestingly, as SFA did not provide pertained weights for the other two tasks, the model trained by ourselves can largely beat SFA. Thus, we think our model's performance on foggy cityscapes might be improved by better pretraining.

The reason for a separate pertaining stage is that to generate pseudo-labels on unlabeled data, the model should have already learned something, and a randomly initialized model can not generate meaningful pseudo labels.

qwqwq1445 commented 1 year ago

For the second question, I remember that traditional is the default setting in command line arguments? And as mentioned in the paper, pseudo-label filtering is simply performed by threshold filtering. 'by_consistency' is experimental, and no improvements are obtained in experiments, and thus, we finally discard that.

For the first question, a good starting point is loading the SFA retained weights for cityscapes. Actually, we also did not pretrain the model on the cityscapes -> foggy cityscapes scenario well, and we load the SFA pretrained weights, except for the newly added components. (This can also be done for the other two scenarios.) Interestingly, as SFA did not provide pertained weights for the other two tasks, the model trained by ourselves can largely beat SFA. Thus, we think our model's performance on foggy cityscapes might be improved by better pretraining.

The reason for a separate pertaining stage is that to generate pseudo-labels on unlabeled data, the model should have already learned something, and a randomly initialized model can not generate meaningful pseudo labels.

Appreciated for your apply.

qwqwq1445 commented 1 year ago

Another details I would like to know:

How many epochs do you use for model pretraining and self-supervised training?
How do you set your random seed?

Lafite-Yu commented 1 year ago

Q1: I'm not sure about the exact number, but for the pretraining stage, the number of epochs is relatively large. The model is trained until the performance converges to a satisfying level. You may need to try many things, like lr, random seed, and even warm restarts. (However, we failed to get a good cityscapes pretrained model, and thus we started from the SFA pretrained weights; while for the other two scenarios, we did not try many configs to get a satisfying one.) For the self-supervised training, the number should be small (less than ten or even five), or it collapses quickly (you may need a small validation set split from the training set or try a small fixed epoch num, or we think this can be a future work).

Q2: For the self-sup training phase, we did not set the random seed by some complex methods, maybe we tried some common numbers like 42 or 0, but however, the default number in arguments is what we really use.

Lafite-Yu commented 1 year ago

Oh, by the way, considering the recent progress in UDA OD in the past year, you may better start from some other open-source UDA OD projects by adding the mean teacher workflow to them (referring to this repo, or projects like Unbiased Teacher in semi-supervised OD area we referred to), as we noticed some of them can largely surpass our performance without the mean teacher.

qwqwq1445 commented 1 year ago

Your work has achieved great performance in the Sim10k2Cityscapes benchmark, I wonder how you set the coefficient of the three domain loss for this benchmark? Like in SFA, the coefficient of TIFA and DQFA is set to 0.01 and 0.001.

Lafite-Yu / MTTrans-OpenSource

Some questions about the code implementation #4