alexsax opened this issue 2 years ago
Is there a reason to believe that this issue is unique to habitat-baselines? Lack of exact determinism is a common issue with PyTorch (really CUDA and friends).
Some other things you can try: there's also the cuBLAS flag `export CUBLAS_WORKSPACE_CONFIG=:4096:8`
(this is almost certainly needed to help the LSTM out). You can also try disabling cuDNN entirely with `torch.backends.cudnn.enabled = False`.
I'd be surprised if anything in cuDNN is deterministic (it has a deterministic flag that PyTorch sets, but who knows if that actually works...).
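Putting those suggestions together, a minimal sketch (assuming a recent PyTorch, where `warn_only` is available on `use_deterministic_algorithms`) might look like this:

```python
import os

# The cuBLAS workspace config must be set before the process makes its
# first CUDA call; setting it after torch has touched the GPU is too late.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

# Ask PyTorch to prefer deterministic kernels where they exist
# (warn instead of erroring on ops with no deterministic variant).
torch.use_deterministic_algorithms(True, warn_only=True)

# Rule out cuDNN entirely, as suggested above.
torch.backends.cudnn.enabled = False
```

In practice the environment variable is usually exported in the shell before launching training, so that no CUDA context exists yet when it takes effect.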
Hey @erikwijmans thanks for the quick response :)
I think this issue is unique to habitat-baselines. The resnet18 in the example above produces features with noise (a difference between consecutive runs) in the [1e-6, 1e-4] range. Is this something you've noticed before?
Compare this to the torchvision resnet18, where this doesn't happen (e.g. in the code snippet below):
import torch
import torchvision.models as models

resnet18 = models.resnet18(pretrained=True).cuda()
x = torch.rand((20, 3, 224, 224)).cuda() * 255

# Two consecutive forward passes on the same input are bit-identical here.
y1 = resnet18(x)
y2 = resnet18(x)
assert torch.allclose(y1, y2)
print((y2 - y1).abs().max())  # >>> tensor(0., device='cuda:0', grad_fn=<MaxBackward1>)
The habitat-baselines resnet is based on the torchvision one; there are a couple of changes, but it's still all standard PyTorch, nothing that would obviously change things. The biggest difference is GroupNorm vs. BatchNorm; maybe GN is less deterministic?
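The GN hypothesis can be sanity-checked in isolation; the sketch below (layer sizes are arbitrary, not taken from the thread) mirrors the resnet18 check above with a bare Conv + GroupNorm block:

```python
import torch
import torch.nn as nn

# Run the same input twice through a Conv + GroupNorm block and compare
# the outputs. Falls back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.GroupNorm(num_groups=8, num_channels=32),
).to(device)

x = torch.rand(4, 3, 64, 64, device=device)
with torch.no_grad():
    diff = (block(x) - block(x)).abs().max()
print(diff)
```

If the printed difference is nonzero only on GPU, that would point at the CUDA kernels rather than the layer definition itself.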
Habitat-Lab and Habitat-Sim versions
Habitat-Lab: master
Habitat-Sim: 0.2.1
🐛 Bug
I am trying to run the PointGoal RGB PPO baseline from the habitat-baselines folder. In the PPO update step the agent produces values, action_log_probs, and dist_entropy. However, those values seem to have some noise in their output (code reproduced below). Specifically, changing lines L86-L97 demonstrates this.
And actually, all of the outputs are slightly different: not just the values, but also rnn_hidden_states, the perception network features, and so on.
Steps to Reproduce
Steps to reproduce the behavior: replace habitat_baselines.rl.ppo.ppo.py L86-L97 with the code above. I can share exact details if the above isn't sufficient!
Expected behavior
Running the same inputs through PointNavBaselineNet (actor_critic) should produce the same outputs, so I'd expect torch.allclose(values, values_dup) == True. Note that at this stage there shouldn't be any randomness in action selection, yet the underlying features and distributions that the actor-critic policy produces are not consistent here. This sort of nondeterminism is unexpected and complicates adding consistency regularization.
Generally, the outputs of the two forward passes differ in the 3rd or 4th decimal place, which is much larger than what I'd expect from numerical error alone. I've tried setting torch.use_deterministic_algorithms(True, warn_only=True), but to no avail.
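For what it's worth, differences in the 3rd or 4th decimal place are also the magnitude you'd see from TF32 math on Ampere-class GPUs. That is my speculation, not something established in this thread, but it's cheap to rule out:

```python
import torch

# TF32 trades precision for speed on Ampere+ GPUs and can enlarge
# small discrepancies between otherwise-equivalent kernels to the
# ~1e-3/1e-4 scale. Disabling it rules that factor out.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```

Note TF32 by itself does not make a single kernel nondeterministic; the point is only to check whether reduced precision is amplifying whatever run-to-run variation exists.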