askforalfred / alfred

ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
MIT License
352 stars 77 forks

how to calculate the task success #143

Closed Deaddawn closed 11 months ago

Deaddawn commented 11 months ago

Hi there. How do I calculate task success or other metrics from results on the test dataset?

Deaddawn commented 11 months ago

Or how do I evaluate on the validation datasets and get metric results (e.g., SR)?