black font - Githubissues

bkpcoding commented 3 months ago

Description

Black font on a black background makes the text impossible to read.

(Optional:) Please add any files, screenshots, or other information here.

No response

(Required) What is this issue most closely related to? Select one.

Choose One

Internal issue ID

2f14f2ae-697f-4aa2-bb7c-90f4ecbb9ea9

Paper URL

https://arxiv.org/html/2402.14194v1

Browser

Chrome/126.0.0.0

Device Type

Desktop

html-feedback-bot[bot] commented 3 months ago

Location in document: undefined

Selected HTML:

III Preliminaries

III-A Problem Statement

We model the learning task as a Markov Process (MP) defined by $\{\mathcal{S},\mathcal{A},T\}$, with states $s\in\mathcal{S}$, actions $a\in\mathcal{A}$, and transition probability $T(s_{t},a_{t},s_{t+1}):\mathcal{S}\times\mathcal{A}\times\mathcal{S}\mapsto[0,1]$. Note that, unlike RL, we do not have access to an environment reward. Instead, we have access to human expert demonstrations in the training environment, consisting of a set of trajectories ${D_{E}}=(\tau_{0}^{E},\tau_{1}^{E},\ldots,\tau_{M}^{E})$ of states and actions at every time step, $\tau=(s_{t},a_{t},\ldots)$. The underlying expert policy $\pi_{E}$ is unknown. The goal is to learn an agent policy $\pi$ that best approximates the expert policy $\pi_{E}$. Each expert trajectory $\tau^{E}$ consists of state-action pairs:

$$\tau^{E}=(s_{0},a_{0},s_{1},a_{1},\ldots,s_{N},a_{N}). \tag{1}$$
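As a rough illustration of the trajectory layout in Eq. (1), the sketch below stores one demonstration as paired lists of states and actions and interleaves them into the flat sequence $(s_{0},a_{0},\ldots,s_{N},a_{N})$. The `Trajectory` class, its `flatten` method, and the tuple-of-floats representation are assumptions made for this example; the quoted passage does not specify how the demonstrations are stored.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation: states and actions as tuples of floats.
State = Tuple[float, ...]
Action = Tuple[float, ...]


@dataclass
class Trajectory:
    """One expert demonstration: states s_0..s_N paired with actions a_0..a_N."""

    states: List[State]
    actions: List[Action]

    def __post_init__(self) -> None:
        # Eq. (1) pairs every state with an action, so lengths must match.
        if len(self.states) != len(self.actions):
            raise ValueError("each state needs a matching action")

    def flatten(self) -> list:
        """Interleave into (s_0, a_0, s_1, a_1, ..., s_N, a_N)."""
        sequence = []
        for state, action in zip(self.states, self.actions):
            sequence.append(state)
            sequence.append(action)
        return sequence


# A toy demonstration with three time steps (values are made up).
demo = Trajectory(
    states=[(0.0,), (1.0,), (2.0,)],
    actions=[(0.5,), (0.5,), (0.0,)],
)
flat = demo.flatten()
```

A dataset $D_{E}$ would then simply be a list of such `Trajectory` objects, one per demonstration.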

The human decision-making process of the expert is unknown and likely non-Markovian [32]; thus, imitation learning performance can deteriorate with human trajectories [6].

III-B Unimodal Decision Transformer

The Behavior Transformer (BeT) processes the trajectory $\tau^{E}$ as a sequence of two types of inputs: states and actions. The original BeT implementation [11] employed a mixture of Gaussians to model a dataset with multimodal behavior. For simplicity and to reduce the computational burden, we instead use a unimodal BeT with a deterministic policy similar to the one originally used by the Decision Transformer [7]. However, since residual policies can be added to black-box policies [14], BeTAIL’s residual policy could easily be added to the k modes present in the original BeT implementation.
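The idea of adding a residual policy to a black-box base policy can be sketched as follows. This is only an illustration under stated assumptions: `add_residual`, the additive combination, and the clipping to `[low, high]` are choices made for this example, and the two toy policies are placeholders, not BeT or BeTAIL's actual networks.

```python
from typing import Callable, List

Vector = List[float]


def add_residual(
    base_policy: Callable[[Vector], Vector],
    residual_policy: Callable[[Vector, Vector], Vector],
    low: float = -1.0,
    high: float = 1.0,
) -> Callable[[Vector], Vector]:
    """Wrap a black-box base policy with an additive residual correction.

    Assumption for this sketch: the final action is the base action plus the
    residual, clipped element-wise to the action bounds [low, high].
    """

    def policy(state: Vector) -> Vector:
        base_action = base_policy(state)  # base policy is only queried, never modified
        correction = residual_policy(state, base_action)
        return [
            min(high, max(low, b + r))
            for b, r in zip(base_action, correction)
        ]

    return policy


# Toy stand-ins (hypothetical): a fixed base action and a small constant correction.
def toy_base(state: Vector) -> Vector:
    return [0.5, 0.95]


def toy_residual(state: Vector, base_action: Vector) -> Vector:
    return [0.1, 0.1]


combined = add_residual(toy_base, toy_residual)
action = combined([0.0, 0.0, 0.0])
```

Because the wrapper only calls the base policy, the same pattern would apply whether the base returns a single deterministic action or an action selected from one of several modes.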

github-actions[bot] commented 3 months ago

Hello @bkpcoding, thanks for the issue report! We are reviewing your report and will address it as soon as possible.

arXiv / html_feedback

black font #1750