ServiceNow / WorkArena

WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
https://servicenow.github.io/WorkArena/

Maximize investment return tasks not ending on wrong answer #25

Open ThibaultLSDC opened 2 months ago

ThibaultLSDC commented 2 months ago

Max return investment tasks allow wrong answers without ending the episode. Cf. https://github.com/ServiceNow/WorkArena/blob/b11908298f2e870a18e4a25cfa972ccd33cac3de/src/browsergym/workarena/tasks/compositional/maximize_investment_return.py#L240-L245 Is this intended? @jardinetsouffleton

aldro61 commented 2 months ago

I think this is the expected behavior. We only terminate a task with a failure if incorrect information or modifications are pushed to the database (e.g., deleting the wrong record).
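
For illustration, the policy described above could be sketched roughly as follows. This is not the WorkArena code: `StepResult`, `validate_step`, and all three parameters are hypothetical names chosen for the example, and the real check lives in the file linked in the first comment.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    reward: float  # 1.0 on success, 0.0 otherwise
    done: bool     # True ends the episode
    message: str


def validate_step(answer_given: bool, answer_correct: bool,
                  db_corrupted: bool) -> StepResult:
    """Hypothetical validation policy, assumed from the discussion above."""
    # A wrong change pushed to the database (e.g. deleting the wrong
    # record) ends the episode with a failure.
    if db_corrupted:
        return StepResult(0.0, True, "Incorrect modification pushed to the database.")
    # A correct answer ends the episode with a success.
    if answer_given and answer_correct:
        return StepResult(1.0, True, "Task solved.")
    # A wrong or missing answer gives no reward but does not end the episode.
    return StepResult(0.0, False, "Not solved yet; episode continues.")
```

Under this sketch, `validate_step(True, False, False)` yields no reward and `done=False`, which matches the behavior reported in this issue: a wrong answer alone leaves the episode running.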