Closed sft110 closed 5 years ago
Sorry, I don't have time to address this now. I will come back to it after the ICCV rebuttal.
Okay, I'll be waiting. I have tried different learning rates but still get the same NaN after the first epoch.
It's running on the RTX GPUs now after setting cudnn.benchmark = False.
@saiftumrani Hi, I tried batchsize=60 with lr=2e-5 on two 1080 Ti GPUs, but did not observe your problem after a few epochs:
Iter: [3067/4135] Freq 151.2 loss_source 0.040 loss_st 0.598 loss_ml 7262.415 loss_target 0.443 loss_total 9.878 [2019-06-30 13:05:43]
What are the values of p_agree[similar_idx] and self.threshold.item() in this warning?
/home/usr/MAR/src/utils.py:162: RuntimeWarning: invalid value encountered in greater
is_positive = p_agree[similar_idx] > self.threshold.item()
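One way to answer that question is to summarize the values just before the comparison. The following is only a minimal sketch: `summarize_scores` is a hypothetical helper, not part of the MAR code, and it assumes that `p_agree[similar_idx]` can be converted to a plain list with `.tolist()`. A comparison against NaN is always False, so any NaN entry would silently be treated as "not positive".

```python
import math


def summarize_scores(values, threshold):
    """Count NaN/inf entries in an iterable of floats and report the threshold.

    Hypothetical debugging helper; it is not part of the MAR repository.
    """
    values = [float(v) for v in values]
    threshold = float(threshold)
    finite = [v for v in values if math.isfinite(v)]
    return {
        "count": len(values),
        "nan": sum(math.isnan(v) for v in values),
        "inf": sum(math.isinf(v) for v in values),
        "min": min(finite) if finite else None,
        "max": max(finite) if finite else None,
        "threshold": threshold,
        "threshold_is_nan": math.isnan(threshold),
    }


# Stand-in values for illustration only. In utils.py the call might look like
# (assumption): summarize_scores(p_agree[similar_idx].tolist(), self.threshold.item())
report = summarize_scores([0.2, float("nan"), 0.9], 0.5)
print(report)
```

If the report shows NaN entries or a NaN threshold, the problem most likely starts upstream of the comparison on line 162 rather than in the comparison itself.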
It's done. Please help me with this:
initializing centres/threshold ...
not found data/ml_Market.dat. computing ml...
saving computed ml to data/ml_VeRi.dat
Traceback (most recent call last):
File "src/main.py", line 46, in <module>
@saiftumrani torch.save uses pickle at its core. From your error message, it seems your pickle protocol is out of date (see this; pickle protocol 4, introduced in Python 3.4, removes the 4GB constraint). So please update your pickle version and make sure your Python/PyTorch versions are correct (I use Python 3.6 and PyTorch 1.0.0).
I am using PyTorch 1.0.0 and pickle protocol 4.0 but still face the same problem.
@saiftumrani How about using pickle's own dump function (pickle.dump) instead of torch.save? Can pickle itself save your large file?
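A minimal sketch of that suggestion, using only the standard library. The helper names, the file name, and the tiny dictionary are placeholders for illustration; the real multilabel file is far larger. The key point is passing `protocol=4` explicitly, since older default protocols cannot serialize objects above 4GB. torch.save also accepts a `pickle_protocol` argument, so passing `pickle_protocol=4` there may be worth trying as well.

```python
import os
import pickle
import tempfile


def save_with_pickle(obj, path, protocol=4):
    """Write obj with an explicit pickle protocol (4 supports objects > 4GB)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=protocol)


def load_with_pickle(path):
    """Read back an object written by save_with_pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Small stand-in for the computed multilabels (assumption: the real object is
# a large tensor or array; a nested list is used here to keep this runnable).
data = {"ml": [[0.1, 0.9], [0.7, 0.3]]}
path = os.path.join(tempfile.mkdtemp(), "ml_example.dat")

save_with_pickle(data, path)
restored = load_with_pickle(path)
print(restored == data)
```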
Thank you. I am facing another problem while training: MAR/src/utils.py:162: RuntimeWarning: invalid value encountered in greater is_positive = p_agree[similar_idx] > self.threshold.item()
What do you mean? I thought you had addressed this issue, as you commented on Jul. 18.
Please check your email; the details are stated there.
@saiftumrani How did you solve the problem with: RuntimeWarning: invalid value encountered in greater is_positive = p_agree[similar_idx] > self.threshold.item()? Hoping for your reply.
/home/usr/MAR/src/utils.py:162: RuntimeWarning: invalid value encountered in greater is_positive = p_agree[similar_idx] > self.threshold.item()
As you suggested in the previous issue, I have reduced the batch size and lr, but I am still getting an error. How do I deal with it? I am using 2 GPUs of 12GB each. Iter: [900/2481] Freq 213.2 loss_total nan loss_ml nan loss_st nan loss_target nan loss_source nan [2019-06-17 11:23:52]
After the first epoch, I get NaN every time, with batchsize=60 and lr=0.0002.
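To narrow down where the NaN first appears, it can help to stop at the first iteration in which any loss term is non-finite, instead of letting training continue. This is only a sketch: `first_non_finite` is a hypothetical helper, and it assumes the loss terms are available as a name-to-float mapping (for example via `.item()` on each loss tensor). The sample values are taken from the log lines in this thread.

```python
import math


def first_non_finite(losses):
    """Return the name of the first loss that is NaN or inf, or None if all are finite.

    Hypothetical helper for debugging; it is not part of the MAR code.
    """
    for name, value in losses.items():
        if not math.isfinite(float(value)):
            return name
    return None


# A healthy iteration and a broken one, with values copied from the logs above.
healthy = {"loss_source": 0.040, "loss_st": 0.598, "loss_ml": 7262.415, "loss_target": 0.443}
broken = {"loss_total": float("nan"), "loss_ml": float("nan"), "loss_st": float("nan")}

print(first_non_finite(healthy))
print(first_non_finite(broken))
```

Calling such a check before the backward pass and raising as soon as it returns a name would show which term diverges first and at which iteration.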
And when I try to run on 2 RTX GPUs of 24GB each, I get this error:
Traceback (most recent call last):
File "src/main.py", line 46, in <module>
main()
File "src/main.py", line 35, in main
meters_trn = trainer.train_epoch(source_loader, target_loader, epoch)
File "/home/saif/MAR/src/trainers.py", line 123, in train_epoch
multilabels = F.softmax(featurestarget.mm(agents.detach().t()*self.args.scala_ce), dim=1)
RuntimeError: set_storage_offset is not allowed on Tensor created from .data or .detach()
I was facing some problems with PyTorch and CUDA, so I installed the nightly build.