After training my model for anywhere from 10,000 to 10,000,000 steps, when I test it with deterministic = true the model effectively stops taking actions.
It picks a seemingly random action, repeats it 10 times, then outputs 0 until the episode ends.
I'm not entirely sure why. Have others faced this issue and overcome it? I have tried various settings, but in my testing every single trained model does essentially no active trading or decision making after the first 10 steps.
Training itself shows clear progress: rewards increase and good actions are being made.
So I guess my question is: what is a good way to ensure that the model actually takes actions when tested deterministically?
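
For context, here is a minimal sketch of what I understand deterministic testing to do. It assumes a discrete action space and that deterministic mode takes the highest-probability action instead of sampling; the probabilities and function names are hypothetical and not taken from any library. It shows how a policy that only slightly prefers action 0 never picks anything else once sampling is turned off.

```python
import random
from collections import Counter

def deterministic_action(probs):
    """Greedy choice: index of the highest-probability action."""
    return max(range(len(probs)), key=lambda i: probs[i])

def sampled_action(probs, rng):
    """Stochastic choice: draw an action index in proportion to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical policy output: action 0 (do nothing) is only slightly preferred.
probs = [0.40, 0.35, 0.25]
rng = random.Random(0)  # fixed seed so the run is repeatable

greedy = Counter(deterministic_action(probs) for _ in range(1000))
sampled = Counter(sampled_action(probs, rng) for _ in range(1000))

print("greedy: ", dict(greedy))   # every step chooses action 0
print("sampled:", dict(sampled))  # a mix of all three actions
```

If something like this is happening, logging the action probabilities at each test step should show action 0 winning by a small margin after the first few steps.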