alexsax opened this issue 2 years ago
Is there a reason to believe that this issue is unique to habitat-baselines? Lack of exact determinism is a common issue with PyTorch (really CUDA and friends).
Some other things you can try: there's also the cuBLAS flag `export CUBLAS_WORKSPACE_CONFIG=:4096:8`
(this is almost certainly needed to help the LSTM out). You can also try disabling cuDNN entirely with `torch.backends.cudnn.enabled = False`.
I'd be surprised if anything in cuDNN is deterministic (it has a deterministic flag that PyTorch sets, but who knows if that actually works...).
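Putting those suggestions together, a minimal sketch (assuming a recent PyTorch, where `warn_only` is available on `use_deterministic_algorithms`) might look like this:

```python
import os

# The cuBLAS workspace config must be set before the process makes its
# first CUDA call; setting it after torch has touched the GPU is too late.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

# Ask PyTorch to prefer deterministic kernels where they exist
# (warn instead of erroring on ops with no deterministic variant).
torch.use_deterministic_algorithms(True, warn_only=True)

# Rule out cuDNN entirely, as suggested above.
torch.backends.cudnn.enabled = False
```

In practice the environment variable is usually exported in the shell before launching training, so that no CUDA context exists yet when it takes effect.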
Hey @erikwijmans thanks for the quick response :)
I think this issue is unique to habitat-baselines. The resnet18 in the example above produces features with noise (a difference between consecutive runs) in the [1e-6, 1e-4] range. Is this something you've noticed before?
Compare this to the torchvision resnet18, where this doesn't happen (e.g. in the code snippet below):
import torch
import torchvision.models as models

resnet18 = models.resnet18(pretrained=True).cuda()
x = torch.rand((20, 3, 224, 224)).cuda() * 255

# Two consecutive forward passes on the same input are bit-identical here.
y1 = resnet18(x)
y2 = resnet18(x)
assert torch.allclose(y1, y2)
print((y2 - y1).abs().max())  # >>> tensor(0., device='cuda:0', grad_fn=<MaxBackward1>)
The habitat-baselines resnet is based on the torchvision one; there are a couple of changes, but it's still all standard PyTorch, nothing that would obviously change things. The biggest difference is GroupNorm vs. BatchNorm; maybe GN is less deterministic?
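The GN hypothesis can be sanity-checked in isolation; the sketch below (layer sizes are arbitrary, not taken from the thread) mirrors the resnet18 check above with a bare Conv + GroupNorm block:

```python
import torch
import torch.nn as nn

# Run the same input twice through a Conv + GroupNorm block and compare
# the outputs. Falls back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.GroupNorm(num_groups=8, num_channels=32),
).to(device)

x = torch.rand(4, 3, 64, 64, device=device)
with torch.no_grad():
    diff = (block(x) - block(x)).abs().max()
print(diff)
```

If the printed difference is nonzero only on GPU, that would point at the CUDA kernels rather than the layer definition itself.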
Habitat-Lab and Habitat-Sim versions
Habitat-Lab: master
Habitat-Sim: 0.2.1
🐛 Bug
I am trying to run the PointGoal RGB PPO baseline from the habitat-baselines folder. In the PPO update step the agent produces values, action_log_probs, and dist_entropy. However, those values seem to have some noise in their output (code reproduced below). Specifically, changing lines L86-L97 demonstrates this.
And actually, all of the outputs are slightly different: not just the values, but also rnn_hidden_states, the perception network features, and so on.
Steps to Reproduce
Steps to reproduce the behavior: replace habitat_baselines.rl.ppo.ppo.py L86-L97 with the code above. I can share exact details if the above isn't sufficient!
Expected behavior
Running the same inputs through PointNavBaselineNet (actor_critic) should produce the same outputs, so I'd expect torch.allclose(values, values_dup) == True. Note that at this stage there shouldn't be any randomness in action selection, yet the underlying features and distributions that the actor-critic policy produces are not consistent here. This sort of nondeterminism is unexpected and complicates adding consistency regularization.
Generally, the outputs of the two forward passes differ in the 3rd or 4th decimal place, which is much larger than what I'd expect from numerical error alone. I've tried setting torch.use_deterministic_algorithms(True, warn_only=True), but to no avail.
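For what it's worth, differences in the 3rd or 4th decimal place are also the magnitude you'd see from TF32 math on Ampere-class GPUs. That is my speculation, not something established in this thread, but it's cheap to rule out:

```python
import torch

# TF32 trades precision for speed on Ampere+ GPUs and can enlarge
# small discrepancies between otherwise-equivalent kernels to the
# ~1e-3/1e-4 scale. Disabling it rules that factor out.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```

Note TF32 by itself does not make a single kernel nondeterministic; the point is only to check whether reduced precision is amplifying whatever run-to-run variation exists.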