Output of network should represent probabilistic distribution such as mean and std for gaussian for now. However if something like flow is used for policy, it is impossible to implement it without fixing loss functional. because flow's lld (log likelihood) is computed through network with determinant of jacobian.
Output of network should represent probabilistic distribution such as mean and std for gaussian for now. However if something like flow is used for policy, it is impossible to implement it without fixing loss functional. because flow's lld (log likelihood) is computed through network with determinant of jacobian.