off-policy-evaluation Search Results

tensorflow/agents #791

Contextual Bandit Off-Policy Evaluation

Hi, I am currently dealing with "agents/tf_agents/bandits/". I am wondering whether, and where, the classic Contextual Bandit off-policy evaluation procedures are present in TensorFlow. I mean exactly the…

vitorkrasniqi updated 1 year ago

heigvd-teaching-tools/online-test-platform #307

WebSocket for live status updates

To improve network efficiency and reduce consumption, we propose replacing some of the current active polling (periodic HTTP requests) with a single WebSocket connection for each active user. T…

Ovich updated 2 months ago

eugeneyan/eugeneyan-comments #67

https://eugeneyan.com/writing/bandits/

# Bandits for Recommender Systems Industry examples, exploration strategies, warm-starting, off-policy evaluation, and more. [https://eugeneyan.com/writing/bandits/](https://eugeneyan.com/writing/ba…

utterances-bot updated 5 months ago

VowpalWabbit/coba #44

Question - Off Policy Eval Without Propensities

Thank you for this useful repo! I have a question: let's say there is logged data you want to use to train and evaluate a new policy. The logged data is something like where the context features descr…

AllardJM updated 2 weeks ago

HACMan-2023/HACMan #2

Issue when testing with `simple_env`

Hi! When I tested the installation with simple_env, I got the following error: ``` File "scripts/run.py", line 120, in model.learn(total_timesteps=1000000, log_interval=10, callback=ca…

zichunxx updated 1 month ago

huggingface/lerobot #341

question: expected performance of vq-bet?

Hi, Thank you to the LeRobot community for maintaining such a fantastic codebase. My research group and I have greatly benefited from your efforts. In my current project, I am using the repository …

Jubayer-Hamid updated 3 weeks ago

AkihikoWatanabe/paper_notes #1367

What we should have prioritized most when putting a recommender system into production at NewsPicks, 2024.08

https://tech.uzabase.com/entry/2024/08/29/161828

AkihikoWatanabe updated 6 days ago

arXivTimes/arXivTimes #1066

Deep Reinforcement Learning for Vision-Based Robotic Graspin…

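## In a nutshell: A study that measured the performance of various algorithms, including simple ones, on the task of grasping objects with a robot arm. It confirmed that even a plain Monte Carlo method achieves performance comparable to DDPG, and on some tasks exceeds it. ### Paper link https://arxiv.org/abs/1802.10264 ### Authors/affiliations Dei…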

icoxfog417 updated 5 years ago

google-research/deep_ope #1

Hyperparameters for the baseline results

Hello, I believe this is the GitHub repo for the paper "Benchmarks for Deep Off-Policy Evaluation". Do you have any plans to release the **hyperparameters & setups** used for the baseline results? …

shlee94 updated 1 year ago

AkihikoWatanabe/paper_notes #339

Fundamentals of Off Policy Evaluation and an introduction to Open Bandit Dataset & Pipeline,…

https://speakerdeck.com/usaito/off-policy-evaluationfalseji-chu-toopen-bandit-dataset-and-pipelinefalseshao-jie

AkihikoWatanabe updated 4 years ago

1000+ results for off-policy-evaluation
