NoB0 opened this issue 10 months ago.
Sorry for the late reply! @ChrisGeishauser, could you have a look at this issue? Related: how to train RL policy
It seems like a simple indexing error: the method `policy.vector.state_vectorize(s)` returns the state embedding along with a mask, so a simple fix is to take the 0th element:
`s_vec = torch.Tensor(policy.vector.state_vectorize(s)[0])`
I am not sure what the mask does, though. It seems like the example file and the PPO policy implementation are out of sync.
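A minimal sketch of the fix above, assuming only that `state_vectorize` returns a `(state_vector, mask)` pair. `ToyVector` is a hypothetical stand-in (not the real vectorizer class), and NumPy is used instead of `torch` so the snippet runs on its own; in the actual script you would pass the unpacked `s_vec` to `torch.Tensor(...)`. The mask handling is a guess: it assumes the mask is an action mask where `1` marks actions that are not allowed, which is a common pattern but should be checked against the real vectorizer.

```python
import numpy as np


class ToyVector:
    """Hypothetical stand-in for policy.vector; only mimics the return shape."""

    def __init__(self, state_dim: int, action_dim: int):
        self.state_dim = state_dim
        self.action_dim = action_dim

    def state_vectorize(self, state):
        # Returns a (state_vector, action_mask) pair, like the method discussed above.
        state_vec = np.zeros(self.state_dim, dtype=np.float32)
        state_vec[: len(state)] = state
        action_mask = np.zeros(self.action_dim, dtype=np.float32)
        action_mask[0] = 1.0  # pretend action 0 is not allowed in this state
        return state_vec, action_mask


def masked_action_probs(logits, action_mask):
    # Assumption: mask value 1 means "action not allowed", so its logit becomes -inf.
    masked = np.where(action_mask.astype(bool), -np.inf, logits)
    exp = np.exp(masked - masked.max())  # numerically stable softmax
    return exp / exp.sum()


vector = ToyVector(state_dim=4, action_dim=3)
# Unpack the pair instead of converting the whole tuple to a tensor.
s_vec, action_mask = vector.state_vectorize([1.0, 0.0, 1.0])
print(s_vec.shape)  # (4,)
probs = masked_action_probs(np.array([2.0, 1.0, 0.0]), action_mask)
print(probs[0])  # 0.0
```

Unpacking both values (rather than indexing `[0]`) keeps the mask available in case the policy's action-selection code expects it.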
Describe the bug
The script `example_train.py` in Train RL Policies does not run.
To Reproduce
Steps to reproduce the behavior:
Error:
Expected behavior
The script should train a dialogue policy.
Actual behavior
The script fails to run.
Additional context
I assume the example is meant to run with the `multiwoz21` dataset, so I modified `example_train.py` as shown below:
However, the state then does not seem to have the correct input size, see:
Any help on this matter would be appreciated.
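When narrowing down an input-size mismatch like this, it may help to compare the vectorized state's length with the size the policy network expects before the forward pass. One plausible cause (an assumption, not confirmed here) is that the vectorizer and the network or a loaded checkpoint were set up for different datasets. `check_state_dim` and its parameters below are hypothetical names for illustration only.

```python
import numpy as np


def check_state_dim(s_vec, expected_dim: int, dataset: str) -> None:
    """Hypothetical helper: fail early with a readable message on a size mismatch."""
    actual_dim = int(np.asarray(s_vec).shape[-1])
    if actual_dim != expected_dim:
        raise ValueError(
            f"state vector has {actual_dim} features but the policy expects "
            f"{expected_dim}; was the policy built for a dataset other than {dataset!r}?"
        )


check_state_dim(np.zeros(10), expected_dim=10, dataset="multiwoz21")  # passes silently
try:
    check_state_dim(np.zeros(12), expected_dim=10, dataset="multiwoz21")
except ValueError as err:
    print(err)
```

The expected size would come from wherever the network's first layer is configured; printing both numbers is usually enough to tell which side is out of sync.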