Hello, I'm trying to adapt the examples from Ch. 8 and Ch. 10 of the book into a double-dueling categorical architecture that uses the Conv1d model from Ch. 10. Training seems to work fine using ptan and PyTorch Ignite. I also want to run validation in an OpenAI Gym environment, so I'm wondering how to determine the next action for a new observation batch.

My understanding is that for the normal dueling/double Q Conv1d network, we run a forward pass of the observation through the trained network to get the Q-values, then take the argmax over them to find `action_idx`. For the categorical architecture, however, the book states that a forward pass "returns the predicted probability distribution as a 3D tensor (batch, actions, and supports)." With a bar size of 10, my output is indeed a tensor of shape (1, 3, 51). But the entries along dim=1 look like various weights, not integers.

What additional steps do I need to take to get the next action to pass to the OpenAI Gym environment? Thanks in advance, and I'm happy to post more code if needed.
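For reference, one common way to get an action out of a categorical (C51-style) output is to reduce each action's distribution to its expected value and then take the argmax, as with a regular DQN. The sketch below is a minimal illustration of that idea, not the book's code. It assumes the network returns raw logits of shape (batch, actions, atoms) and that the supports span -10 to 10 with 51 atoms. `V_MIN`, `V_MAX`, `N_ATOMS`, and `greedy_actions` are hypothetical names, and the values must match whatever was used in training. If the network already applies softmax, that step should be skipped.

```python
import torch

# Assumed constants (hypothetical values); they must match the training setup.
V_MIN, V_MAX, N_ATOMS = -10.0, 10.0, 51


def greedy_actions(logits: torch.Tensor) -> torch.Tensor:
    """Pick the greedy action per batch row from (batch, actions, atoms) logits."""
    # Fixed support values (atoms) that each distribution is defined over.
    supports = torch.linspace(V_MIN, V_MAX, N_ATOMS, device=logits.device)
    # Normalize the raw outputs into one probability distribution per action.
    probs = torch.softmax(logits, dim=2)
    # The expected value of each distribution is a scalar Q-value per action.
    q_values = (probs * supports).sum(dim=2)
    # Same argmax step as for a regular DQN.
    return q_values.argmax(dim=1)


# Stand-in for the network output on one observation: 3 actions, 51 atoms.
logits = torch.zeros(1, 3, N_ATOMS)
logits[0, 2, -1] = 20.0  # action 2 puts nearly all its mass on the highest atom
action_idx = greedy_actions(logits)[0].item()
print(action_idx)
```

In this toy input, actions 0 and 1 have uniform distributions with expected value 0, while action 2 has an expected value close to 10, so the resulting integer `action_idx` is what would be passed to the environment's `step` call.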
My model: