AnujMahajanOxf / VIREL

Code for VIREL: A Variational Inference Framework for Reinforcement Learning

Which version of rlkit did you use? #1

Open jeffwillette opened 4 years ago

jeffwillette commented 4 years ago

Hello, I am trying to run the code here, but it seems that https://github.com/vitchyr/rlkit has changed quite a lot since this was published. I have found that rlkit tag v0.1.2 is pretty close to the version you used, but it still crashes with some errors. Do you know the exact commit that works with your VIREL code?

jeffwillette commented 4 years ago

To update: I found that v0.1.2 actually works, but there appears to be an error in the VIREL code (as far as I can tell) here: https://github.com/AnujMahajanOxf/VIREL/blob/master/VIREL_code/virel.py#L174

I had to change these lines to zero all the gradients first, then call backward for all of the losses, and only then take the optimizer steps:

        self.qf_optimizer.zero_grad()
        self.vf_optimizer.zero_grad()
        self.policy_optimizer.zero_grad()

        qf_loss.backward()
        vf_loss.backward()
        policy_loss.backward()

        self.qf_optimizer.step()
        self.vf_optimizer.step()
        self.policy_optimizer.step()

Would this hinder the algorithm in any way? I don't see why it would, but I am not sure.
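For reference, here is a minimal PyTorch sketch (toy quadratic/cubic losses, not the actual qf/vf/policy losses from virel.py) of why the interleaved order fails and why the reordered version above is safe: an optimizer step modifies the shared parameters in place, which invalidates the saved tensors in any loss graph that has not yet been backpropagated.

```python
import torch

# One shared parameter and two losses whose backward passes both need
# the parameter's current value (illustrative stand-ins only).
w = torch.nn.Parameter(torch.tensor([1.0, 2.0]))
opt = torch.optim.SGD([w], lr=0.1)

loss1 = (w ** 2).sum()   # grad = 2w, so backward saves w
loss2 = (w ** 3).sum()   # grad = 3w**2, so backward also saves w

# Interleaved order: step() updates w in place, so the tensors saved
# for loss2's backward are stale -> RuntimeError on the second backward.
opt.zero_grad()
loss1.backward()
opt.step()
try:
    loss2.backward()
    raised = False
except RuntimeError:
    raised = True   # raised == True: the in-place update broke loss2's graph

# Reordered as above: zero all grads, run all backwards (gradients
# accumulate in w.grad), then take a single step at the end.
w = torch.nn.Parameter(torch.tensor([1.0, 2.0]))
opt = torch.optim.SGD([w], lr=0.1)
loss1 = (w ** 2).sum()
loss2 = (w ** 3).sum()
opt.zero_grad()
loss1.backward()
loss2.backward()
opt.step()
```

Because the gradients of the two losses simply accumulate in `w.grad` before the single `step()`, the final update equals one step on the summed losses, which is why I wouldn't expect the reordering to change the algorithm.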

I also had trouble running the gym-mujoco-v1 experiments, since there seem to be no humanoid, humanoid-rllab, or halfc tasks registered in gym. How do you load these tasks in gym?

AnujMahajanOxf commented 3 years ago

Hi, you should look for an rlkit commit from around Jan 15, 2019.

What exactly is the error without reordering the gradient update steps?

jeffwillette commented 3 years ago

@AnujMahajanOxf I get an error saying that one of the variables needed for gradient computation was modified by an in-place operation:

Traceback (most recent call last):
  File "virel_exp.py", line 79, in <module>
    experiment(variant)
  File "virel_exp.py", line 40, in experiment
    algorithm.train()
  File "/home/jeff/ml/rlkit/rlkit/core/rl_algorithm.py", line 143, in train
    self.train_online(start_epoch=start_epoch)
  File "/home/jeff/ml/rlkit/rlkit/core/rl_algorithm.py", line 167, in train_online
    self._try_to_train()
  File "/home/jeff/ml/rlkit/rlkit/core/rl_algorithm.py", line 231, in _try_to_train
    self._do_training()
  File "/home/jeff/ml/rlkit/rlkit/torch/sac/virel.py", line 179, in _do_training
    policy_loss.backward()
  File "/home/jeff/.venv/env/lib/python3.8/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/jeff/.venv/env/lib/python3.8/site-packages/torch/autograd/__init__.py", line 125, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [300, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
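Following the hint in the last line, here is a minimal sketch (a toy parameter, not the VIREL networks) reproducing this error with anomaly detection enabled, so the forward operation that saved the stale tensor shows up in the error trace:

```python
import torch

torch.autograd.set_detect_anomaly(True)  # attach forward-pass traces to autograd errors

w = torch.nn.Parameter(torch.tensor([1.0, 2.0]))
loss = (w ** 2).sum()      # backward will need w's saved value (grad = 2w)

with torch.no_grad():
    w.add_(1.0)            # in-place update, just like optimizer.step()

try:
    loss.backward()
    msg = ""
except RuntimeError as err:
    msg = str(err)         # "... modified by an inplace operation ..."
```

This is the same failure mode as calling `policy_loss.backward()` after `self.qf_optimizer.step()` has already updated shared parameters in place.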

jeffwillette commented 3 years ago

I also found that beta-VIREL calls a method named get_batch_custom(), which does not appear anywhere in the history of rlkit or in this repository. Is this the same code that was used to run the experiments?