off-policy-evaluation Search Results

1000+ results
for off-policy-evaluation


google-research/deep_ope #1

Hyperparameters for the baseline results

Hello, I believe this is the GitHub repo for the paper "Benchmarks for Deep Off-Policy Evaluation". Do you have any plans to release the **hyperparameters & setups** used for the baseline results? …

shlee94 updated 1 year ago
1
hobotrl/hobotrl #18

New argument proposal for act()

# 1. Policies should accept a `batch` argument for batch processing. If the policy can implement a batch version of act(), a significant speedup can be obtained thanks to fewer forward passes for the…

zaxliu updated 7 years ago
3
cisagov/ScubaGear #451

Add a -Strict flag to toggle evaluating Policy exclusions

# 💡 Summary # This flag is for use cases when the ScubaGear operator is unable to inspect the exclusions of the conditional access policies of the tenant being assessed and add those exclusions …

buidav updated 1 year ago
1
st-tech/zr-obp #178

propensity score estimate

Hello, in the input dataset, propensity scores need to be provided. Does the propensity score need to be calibrated? What is the impact of a wrong propensity score? (vs reward level...) …

arita37 updated 2 years ago
2
pytorch/rl #875

[BUG] NoisyLinear uses self.training to turn on/off noise

## Describe the bug https://github.com/pytorch/rl/blob/2fd46326e2eb8d155258feed81dbce1574f527f0/torchrl/modules/models/exploration.py#L131 In PyTorch, `module.training` is used to differentiate …

seermer updated 1 year ago
2
gnolang/gno #519

Evaluation DAO, Decentralist DAO, and GNO chain Governance P…

# Evaluation DAO, Decentralist DAO, and GNO chain Governance ## A simplified and evolving approach. ## Part 1/3 ## Summary: The proposal is to help clarify the requirement and use cases …

piux2 updated 2 weeks ago
6
VowpalWabbit/coba #44

Question - Off Policy Eval Without Propensities

Thank you for this useful repo! I have a question: let's say there is logged data you want to use to train and evaluate a new policy. The logged data is something like where the context features descr…

AllardJM updated 1 month ago
6
adityab/CrossQ #6

Some tasks from deepmind/* not working

Hello, I am trying to benchmark your code on more tasks from deepmind/* but they are not working. There seems to be a bug in the `prepare_obs` function in `sbx/common/policies.py`. I attach stack tra…

JankowskiChristopher updated 6 months ago
2
ray-project/ray #29013

RLlib off policy estimator metrics not captures by Tuner cor…

### What happened + What you expected to happen I think I found a bug. It seems that when there are many different trials the ray Tuner object cannot find the correct metrics. If you look below the…

jeweinb updated 1 year ago
2
Wunder2dream/RL #1

Model free RL

- **Model-Free vs Model-Based RL** > 1) A **model-based** algorithm is an algorithm that uses _the transition function_ (and _the reward function_) in order to estimate the optimal policy. > The age…

Wunder2dream updated 4 years ago
7
