-
Hello,
I believe this is the GitHub repo for the paper "Benchmarks for Deep Off-Policy Evaluation".
Do you have any plans to release the **hyperparameters & setups** used for the baseline results?
…
-
# 1. Policies should accept a `batch` argument for batch processing.
If the policy can implement a batch version of act(), a significant speedup can be obtained thanks to fewer forward passes for the…
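
A minimal sketch of the idea, assuming a hypothetical `LinearPolicy` class (not taken from the repository in question) whose `act()` takes a `batch` flag. The `forward_passes` counter is added only to show that one batched call replaces several per-observation calls; a real policy would run a neural network forward pass instead of the list arithmetic used here.

```python
class LinearPolicy:
    """Hypothetical greedy policy over linear action scores (illustration only)."""

    def __init__(self, weights):
        # weights[a] is the weight vector used to score action a
        self.weights = [list(w) for w in weights]
        self.forward_passes = 0  # counts forward passes, for illustration

    def _forward(self, observations):
        # One "forward pass" scores every observation it is given.
        self.forward_passes += 1
        return [
            [sum(w_i * x_i for w_i, x_i in zip(w, obs)) for w in self.weights]
            for obs in observations
        ]

    def act(self, obs, batch=False):
        # With batch=True, obs is a list of observations handled in one pass.
        observations = obs if batch else [obs]
        scores = self._forward(observations)
        actions = [max(range(len(row)), key=row.__getitem__) for row in scores]
        return actions if batch else actions[0]


policy = LinearPolicy([[1.0, 0.0], [0.0, 1.0]])
obs_batch = [[2.0, 1.0], [0.5, 3.0], [0.2, 0.1]]

batched_actions = policy.act(obs_batch, batch=True)
passes_batched = policy.forward_passes

looped_actions = [policy.act(o) for o in obs_batch]
passes_looped = policy.forward_passes - passes_batched
```

Both call styles return the same actions, but the batched call uses one forward pass where the loop uses one per observation.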
-
# 💡 Summary #
This flag is for use cases when the ScubaGear operator is unable to inspect the exclusions of the conditional access policies of the tenant being assessed and add those exclusions …
-
Hello,
In the input dataset, propensity scores need to be provided.
Do the propensity scores need to be calibrated?
What is the impact of a wrong propensity score? (vs. reward level...)
…
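
To make the question concrete, here is a small sketch using the standard inverse propensity scoring (IPS) estimator, which may or may not be the estimator the repository uses. The function name `ips_estimate` and all numbers are made up for illustration. It shows one simple failure mode: if every logged propensity is overstated by a constant factor, the IPS estimate shrinks by that same factor.

```python
def ips_estimate(rewards, logged_propensities, target_probs):
    """Plain IPS estimate of a target policy's value from logged data.

    Each reward is reweighted by target_prob / logged_propensity, where both
    probabilities refer to the action that was actually logged.
    """
    weights = [t / p for t, p in zip(target_probs, logged_propensities)]
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


# Hypothetical logged data (illustrative values only).
rewards = [1.0, 0.0, 1.0, 1.0]
true_propensities = [0.5, 0.5, 0.25, 0.25]
target_probs = [0.5, 0.5, 0.5, 0.5]

# Miscalibrated propensities: every score overstated by a factor of 2.
wrong_propensities = [2.0 * p for p in true_propensities]

estimate_true = ips_estimate(rewards, true_propensities, target_probs)
estimate_wrong = ips_estimate(rewards, wrong_propensities, target_probs)
```

With these numbers `estimate_true` is 1.25 and `estimate_wrong` is 0.625, so the error in the propensities carries straight through to the value estimate. Errors that differ per sample distort the estimate less predictably.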
-
## Describe the bug
https://github.com/pytorch/rl/blob/2fd46326e2eb8d155258feed81dbce1574f527f0/torchrl/modules/models/exploration.py#L131
In PyTorch, `module.training` is used to differentiate …
-
# Evaluation DAO, Decentralist DAO, and GNO chain Governance
## A simplified and evolving approach.
## Part 1/3
## Summary:
The proposal is to help clarify the requirement and use cases …
-
Thank you for this useful repo! I have a question: let's say there is logged data you want to use to train and evaluate a new policy. The logged data is something like where the context features descr…
-
Hello,
I am trying to benchmark your code on more tasks from deepmind/* but they are not working. There seems to be a bug in the `prepare_obs` function in `sbx/common/policies.py`. I attach stack tra…
-
### What happened + What you expected to happen
I think I found a bug. It seems that when there are many different trials, the Ray Tuner object cannot find the correct metrics. If you look below the…
-
- **Model-Free vs Model-Based RL**
> 1) A **model-based** algorithm is an algorithm that uses _the transition function_ (and _the reward function_) in order to estimate the optimal policy.
> The age…