arXiv / html_feedback

Supports a student project developing a UI for feedback on arXiv articles rendered as html.
MIT License

black font #1750

Open bkpcoding opened 3 months ago

bkpcoding commented 3 months ago

Description

Black font on a black background makes the text impossible to read.

(Optional:) Please add any files, screenshots, or other information here.

No response

(Required) What is this issue most closely related to? Select one.

Choose One

Internal issue ID

2f14f2ae-697f-4aa2-bb7c-90f4ecbb9ea9

Paper URL

https://arxiv.org/html/2402.14194v1

Browser

Chrome/126.0.0.0

Device Type

Desktop

html-feedback-bot[bot] commented 3 months ago

Location in document: undefined

Selected HTML:

III Preliminaries

III-A Problem Statement

We model the learning task as a Markov Process (MP) defined by $\{\mathcal{S},\mathcal{A},T\}$ of states $s\in\mathcal{S}$, actions $a\in\mathcal{A}$, and the transition probability $T(s_{t},a_{t},s_{t+1}):\mathcal{S}\times\mathcal{A}\times\mathcal{S}\mapsto[0,1]$. Note that, unlike RL, we do not have access to an environment reward. We have access to human expert demonstrations in the training environment consisting of a set of trajectories, $D_{E}=(\tau_{0}^{E},\tau_{1}^{E},\ldots,\tau_{M}^{E})$, of states and actions at every time step $\tau=(s_{t},a_{t},\ldots)$. The underlying expert policy, $\pi_{E}$, is unknown.
The goal is to learn the agent's policy $\pi$, that best approximates the expert policy $\pi_{E}$. Each expert trajectory $\tau^{E}$, consists of states and action pairs:

$$\tau^{E}=(s_{0},a_{0},s_{1},a_{1},\ldots,s_{N},a_{N}). \qquad (1)$$

The human decision-making process of the expert is unknown and likely non-Markovian [32]; thus, imitation learning performance can deteriorate with human trajectories [6].

III-B Unimodal Decision Transformer

The Behavior Transformer (BeT) processes the trajectory $\tau_{E}$ as a sequence of 2 types of inputs: states and actions. The original BeT implementation [11] employed a mixture of Gaussians to model a dataset with multimodal behavior. For simplicity and to reduce the computational burden, we instead use an unimodal BeT that uses a deterministic similar to the one originally used by the Decision Transformer [7]. However, since residual policies can be added to black-box policies [14], BeTAIL's residual policy could be easily added to the k-modes present in the original BeT implementation.

github-actions[bot] commented 3 months ago

Hello @bkpcoding, thanks for the issue report! We are reviewing your report and will address it as soon as possible.