DeepX-inc / machina

Control section: Deep Reinforcement Learning framework
MIT License
279 stars 45 forks source link

Remove pds (probabilistic distributions) class and incorporating to pol (policy) class. #166

Open rarilurelo opened 5 years ago

rarilurelo commented 5 years ago

Output of network should represent probabilistic distribution such as mean and std for gaussian for now. However if something like flow is used for policy, it is impossible to implement it without fixing loss functional. because flow's lld (log likelihood) is computed through network with determinant of jacobian.