-
I want to ask one more thing about the estimation of the discounted reward. The discounted-reward variable always starts at zero. However, if the episode has not ended, should it instead be the value estimate …
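For illustration, a minimal sketch of that idea: when the rollout is cut off before the episode terminates, seed the backward pass with the critic's value estimate instead of zero. The function name, the `last_value` argument, and the `gamma` default are my own, not from the original code:
```
def discounted_returns(rewards, dones, last_value, gamma=0.99):
    # rewards, dones: per-step lists from one rollout.
    # last_value: critic's estimate V(s_T) for the state after the final
    # step (hypothetical argument, not from the original post).
    returns = []
    running = last_value  # bootstrap instead of starting from zero
    for reward, done in zip(reversed(rewards), reversed(dones)):
        # a true terminal step still cuts the discounted tail to zero
        running = reward + gamma * running * (1.0 - float(done))
        returns.append(running)
    returns.reverse()
    return returns
```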
-
Hello, thank you for making this repo.
I think that while calculating the returns you should take `done` into consideration, as in:
```
def calculate_returns(self, rewards, dones, normalize=True):
…
```
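One possible done-aware completion of that method, written as a free function for brevity and assuming a PyTorch repo; the discount factor and the normalization epsilon are assumptions, not the poster's actual code:
```
import torch

def calculate_returns(rewards, dones, gamma=0.99, normalize=True):
    returns = []
    running = 0.0
    for reward, done in zip(reversed(rewards), reversed(dones)):
        # zero the running return at episode boundaries so returns from
        # one episode do not bleed into the preceding one
        running = reward + gamma * running * (1.0 - float(done))
        returns.append(running)
    returns = torch.tensor(list(reversed(returns)))
    if normalize:
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    return returns
```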
-
`truncated_generalized_advantage_estimation` should have `stop_target_gradients` default to `True`
https://github.com/deepmind/rlax/blob/383f93bc8b33c3d1bc28f15e1e07fc5104c790ea/rlax/_src/mul…
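For context, a toy call with the flag passed explicitly, assuming the current rlax signature where `stop_target_gradients` defaults to `False`:
```
import jax.numpy as jnp
import rlax

# Toy rollout of T = 3 steps; values has length T + 1.
r_t = jnp.array([1.0, 0.5, 2.0])
discount_t = jnp.array([0.99, 0.99, 0.0])  # 0.0 marks episode end
values = jnp.array([0.8, 0.9, 1.1, 0.0])

# The flag must currently be passed explicitly to stop gradients
# flowing into the value targets:
adv = rlax.truncated_generalized_advantage_estimation(
    r_t, discount_t, lambda_=0.95, values=values,
    stop_target_gradients=True)
```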
-
https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html
https://talkingaboutme.tistory.com/entry/RL-Policy-Gradient-Algorithms
https://www.telesens.co/2019/04/21/understa…
-
Example: Box-Cox transformation with unknown parameter.
Reference (that I just found again):
A. Scallan, R. Gilchrist, M. Green, "Fitting parametric link functions in generalised linear models"
http://…
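As a quick illustration of the unknown-parameter idea in scipy (transforming the data directly, not as a GLM link): `scipy.stats.boxcox` estimates the transformation parameter by maximum likelihood when `lmbda` is not supplied. The data here are synthetic:
```
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=0.5, size=200)  # positive, skewed data

# with lmbda omitted, boxcox also returns the MLE of the parameter
y_transformed, lmbda_hat = stats.boxcox(y)
print(f"estimated Box-Cox lambda: {lmbda_hat:.3f}")
```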
-
Another ancient theme with nothing public in statsmodels.
A brief GitHub search of Python repos for extreme value statistics:
https://github.com/wafo-project/pywafo, a package by Per A. Brodtkorb, but GPL-licensed
http…
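For reference, the non-GPL baseline that already exists outside statsmodels: scipy can fit a generalized extreme value distribution by MLE (synthetic data; note scipy's shape sign convention `c = -xi`):
```
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# synthetic block maxima; scipy's c is the negative of the usual GEV xi
block_maxima = stats.genextreme.rvs(c=-0.1, loc=10.0, scale=2.0,
                                    size=500, random_state=rng)

shape, loc, scale = stats.genextreme.fit(block_maxima)
print(f"shape={shape:.3f}, loc={loc:.3f}, scale={scale:.3f}")
```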
-
Regression (_e.g._ linear regression, logistic regression, Poisson regression, etc.) is very important in machine learning. Many problems can be formulated as (regularized) regression.
…
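A minimal example of the regularized case, using ridge regression in scikit-learn on synthetic data (the penalty weight is arbitrary, chosen only for the example):
```
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# L2-regularized least squares; alpha controls the penalty strength
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)
```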
-
(this is triggered by some readings on dispersion estimation as a follow-up to Tweedie #2858 #2872)
The question is how we estimate dispersion parameters or data-varying (exog, not mean) variance func…
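As a concrete starting point, the usual moment-based estimate in a GLM is Pearson chi-square over residual degrees of freedom; a sketch on synthetic Poisson data (this is the standard quasi-likelihood estimator, not a proposal for new API):
```
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))
mu = np.exp(X @ np.array([0.5, 0.3, -0.2]))
y = rng.poisson(mu)

res = sm.GLM(y, X, family=sm.families.Poisson()).fit()
# Pearson chi^2 / df_resid: close to 1 for equidispersed Poisson data
dispersion = res.pearson_chi2 / res.df_resid
print(f"Pearson dispersion estimate: {dispersion:.3f}")
```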
-
I have been trying to implement a PPO agent that solves LunarLander-v2, following the official example in the GitHub repo:
https://github.com/tensorflow/agents/blob/master/tf_agents/agents/ppo/examples/v2…
-
A bit similar to the idea of automatic forecasting: find the best-fitting distributional assumption in MLE models.
Main advantages compared to users doing it themselves (a sketch of the basic selection loop follows this list):
- predefined sequence, autom…
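A sketch of the basic selection loop (the assumed shape of the idea, not statsmodels code): fit each candidate distribution by MLE and rank by AIC:
```
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.5, size=500)

candidates = [stats.norm, stats.gamma, stats.lognorm, stats.expon]
results = []
for dist in candidates:
    params = dist.fit(data)          # MLE for each candidate
    loglike = np.sum(dist.logpdf(data, *params))
    aic = 2 * len(params) - 2 * loglike
    results.append((aic, dist.name))

for aic, name in sorted(results):    # lowest AIC first
    print(f"{name}: AIC = {aic:.1f}")
```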