Open Jackeylove1103 opened 1 month ago
Location in document: S3.SS1.p2.6
Selected HTML:
For a given natural language task, we can define a reward function as the task performance feedback for intermediate generation <math alttext="y{t}" class="ltxMath" display="inline" id="S3.SS1.p2.2.m2.1" data-immersive-translate-walked="a3a22be5-a722-4af1-9e2e-c5a40b6ffc9a">
Given the problem formulation above, we successfully transfer the problem of better generation to optimization for higher cumulative reward. In this paper, we focus on how we can optimize it with tree-search algorithms. A specific natural language task typically predefines the state space (with language) and reward function (with task objective/metrics). What remains is the definition of action space, or in the context of tree-search algorithm, the action node.
Hello @Jackeylove1103, thanks for the issue report! We are reviewing your report and will address it as soon as possible.
@Jackeylove1103 could you find out why your translation tool fails here and tell us the technical reason?
As it stands, it is not clear whether arXiv contributed to the failure directly.
Description
No translation appears here.
(Optional:) Please add any files, screenshots, or other information here.
No response
(Required) What is this issue most closely related to? Select one.
Choose One
Internal issue ID
04f74a44-fb07-45ea-8470-f2204e1c9d03
Paper URL
https://arxiv.org/html/2309.17179?_immersive_translate_auto_translate=1
Browser
Chrome/129.0.0.0
Device Type
Desktop