I haven't run into this problem before.
You can first set the train_mode to "cnn" to get the pre-trained CNN backbone,
then jointly train the CNN-RNN model.
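
For illustration, a minimal, self-contained sketch of this two-stage schedule (this is not the repository's code; the modules, losses, and hyperparameters below are placeholders, only the "pre-train the CNN, then train CNN and RNN jointly" idea is taken from the advice above):

```python
# Illustrative sketch of the two-stage schedule; all modules/losses are dummies.
import torch
import torch.nn as nn

cnn = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.AdaptiveAvgPool2d(1), nn.Flatten())
rnn = nn.GRU(input_size=16, hidden_size=16, batch_first=True)

frames = torch.randn(4, 8, 3, 64, 32)           # (batch, seq_len, C, H, W) dummy clips
b, t = frames.shape[:2]

# Stage 1 (train_mode == "cnn"): optimize the CNN backbone alone on per-frame features.
opt_cnn = torch.optim.SGD(cnn.parameters(), lr=0.01)
feats = cnn(frames.flatten(0, 1))               # (b*t, 16)
loss = feats.pow(2).mean()                      # stand-in for the real identification loss
opt_cnn.zero_grad(); loss.backward(); opt_cnn.step()

# Stage 2: joint CNN-RNN training, starting from the pre-trained backbone.
opt_joint = torch.optim.SGD(list(cnn.parameters()) + list(rnn.parameters()), lr=0.001)
feats = cnn(frames.flatten(0, 1)).view(b, t, -1)
seq_out, _ = rnn(feats)
loss = seq_out.mean()                           # stand-in for the joint loss
opt_joint.zero_grad(); loss.backward(); opt_joint.step()
```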
thanks,
@dapengchen123 I debugged the code today, and I found that this problem may happen because the value of `loss_ver` is nan in trainer.py line 139, which makes the next line `loss = loss_id*self.rate + 100*loss_ver` nan too.

The reason is the variable `mask` in pairloss.py line 23 being [0,0,0,0]. For example, in line 21 the variable `tar_gallery` = [0,0] and in line 22 the variable `tar_probe` = [[82],[48]], so `mask = tar_probe.expand(N_probe, N_gallery).eq(tar_gallery.expand(N_probe, N_gallery))` will be [0,0,0,0]. Then in line 36, `weights = weights / torch.sum(weights) / 10` = 0 / 0 / 10 = nan. Once we call `self.BCE()`, it outputs loss = nan.

I tried to solve this problem with the following code, and the code can run. I am confused: when you run the code, does `loss_ver` ever become nan? I find it sometimes is not nan but a small value, e.g. 6.8398e-02.
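
For reference, a minimal standalone sketch of the NaN path described above (this is not the repo's pairloss.py; the final BCE call is a stand-in, but the mask/weights construction follows the lines quoted in the comment):

```python
# Reproduce the NaN mechanism: a mask with no positive pairs makes the weight
# normalization divide by zero, and the weighted BCE then returns nan.
import torch
import torch.nn.functional as F

tar_gallery = torch.tensor([0, 0])               # labels as in the comment above
tar_probe = torch.tensor([[82], [48]])
N_probe, N_gallery = 2, 2

mask = tar_probe.expand(N_probe, N_gallery).eq(tar_gallery.expand(N_probe, N_gallery))
print(mask)                                      # all False -> [0,0,0,0] after flattening

weights = mask.float().view(-1)
weights = weights / torch.sum(weights) / 10      # 0 / 0 / 10 -> nan
print(weights)                                   # tensor([nan, nan, nan, nan])

scores = torch.rand(4)                           # stand-in verification scores in (0, 1)
loss_ver = F.binary_cross_entropy(scores, mask.float().view(-1), weight=weights)
print(loss_ver)                                  # tensor(nan)
```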
@AsuradaYuci Have you solved this problem? If not, I will check the code.
It ran well previously.
@AsuradaYuci It is strange that the `mask` can be [0,0,0,0]. I use RandomPairSampler for the dataloader, so N_gallery and N_probe should be equal, and the `mask` should be a square matrix with diagonal elements equal to 1.
BTW, `tar_gallery` and `tar_probe` should be equal.
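
A small sketch of what this expected mask looks like, assuming the same expand/eq construction quoted earlier (the person IDs below are made up): when `tar_probe` and `tar_gallery` hold the same IDs, probe i matches gallery i and the diagonal is 1.

```python
import torch

tar = torch.tensor([82, 48, 127, 3])             # hypothetical person IDs, one pair each
tar_probe = tar.view(-1, 1)                      # column vector of probe IDs
tar_gallery = tar.view(1, -1)                    # row vector of gallery IDs
N = tar.numel()

mask = tar_probe.expand(N, N).eq(tar_gallery.expand(N, N))
print(mask.int())
# tensor([[1, 0, 0, 0],
#         [0, 1, 0, 0],
#         [0, 0, 1, 0],
#         [0, 0, 0, 1]])
```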
@dapengchen123 I found a problem in RandomPairSampler: in line 62, `i` is a tensor (for example `i = tensor(2021)`), and in line 65, `pid_i = self.index_pid[i]` makes `pid_i` always 0; even after `i` has changed to `tensor(1250)`, `pid_i` is still 0. I added a new line before line 65 (`i = int(i)`, so that `i = 2021`), and now `pid_i = 82`, which I think is the correct value. Now I get the `mask` [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1].
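
One plausible mechanism for this symptom, sketched below under the assumption that `index_pid` behaves like a defaultdict keyed by plain ints (the real sampler may differ in detail): a tensor key hashes by object identity, so the lookup never hits the stored int key and falls back to the default value 0, while converting to a Python int first finds the real entry.

```python
import torch
from collections import defaultdict

index_pid = defaultdict(int, {2021: 82, 1250: 57})   # hypothetical index -> person ID map

i = torch.tensor(2021)
print(index_pid[i])        # 0 -- tensor keys hash by identity, so the lookup misses
print(index_pid[int(i)])   # 82 -- converting to int first finds the stored entry
```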
@dapengchen123 Hello, I'm also working on this project, but I'm quite new to CV. Can you explain the use of the mask in more detail? And why should tar_gallery and tar_probe be equal? I don't quite understand.
`CUDA error after cudaEventDestroy in future dtor: device-side assert triggered`. More information in the pictures.
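
A device-side assert like this typically comes from an out-of-range index on the GPU (e.g. a label or pid larger than the classifier's output size). A minimal sketch of one common way to localize it, assuming a standard PyTorch training script:

```python
# Sketch only: force synchronous CUDA launches so the device-side assert is raised
# at the exact Python line that triggered it, instead of at a later CUDA call.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"   # must be set before the first CUDA call

import torch  # import torch (and the model/training code) only after setting the flag
# ... run the training as usual; the failing operation now reports synchronously
```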
_Originally posted by @AsuradaYuci in https://github.com/dapengchen123/video_reid/issues/1#issuecomment-441043962_