Thanks for your excellent work!
Initially I used your pre-trained model for evaluation, and no issues were encountered.
But when I started training the CVUSA model, the results differed from the pre-trained model, and there were no images in train or val.
At the same time, warnings were printed during training, and the loss was reported as nan:
Time 9.938 ( 9.938) Data 8.832 ( 8.832) Loss nan (nan) Mean-P 0.34 ( 0.34) Mean-N nan ( nan)
Warning:
UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
grad.sizes() = [1, 1, 384], strides() = [99072, 384, 1]
bucket_view.sizes() = [1, 1, 384], strides() = [384, 384, 1] (Triggered internally at /opt/conda/conda-bld/pytorch_1656352430114/work/torch/csrc/distributed/c10d/reducer.cpp:326.)
How can I solve this issue? Looking forward to your reply.