Is this metric intentionally implemented this way?

OSU-NLP-Group / TravelPlanner

[ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"

https://osu-nlp-group.github.io/TravelPlanner/

MIT License


Is this metric intentionally implemented this way? #27

Closed by Remper 1 month ago

Remper commented 1 month ago

The is_not_absent constraint fails if almost any single field is left unfilled. From the code, the intention seems to have been to fail only when less than half of the plan is filled, but that is not how it is implemented. This constraint also affects how the hard constraints are evaluated: if it does not pass, the hard constraints will not pass either. Is this intentional, or is it a bug?
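To make the distinction concrete, here is a minimal sketch of the two policies being contrasted: a threshold check that fails only when less than half of the plan is filled, and a strict check that fails on any unfilled field. It is an illustration only, not the code in evaluation/commonsense_constraint.py; the field names, the sample plan, and the helpers `filled_ratio`, `passes_threshold`, and `passes_strict` are all hypothetical.

```python
# Illustrative sketch only: every name below is hypothetical and this is
# NOT the repository's is_not_absent implementation.

PLACEHOLDERS = {"", "-"}  # values treated here as "not filled"


def filled_ratio(plan):
    """Return the fraction of fields, across all days, holding a real value."""
    values = [value for day in plan for value in day.values()]
    if not values:
        return 0.0
    filled = sum(1 for value in values if value not in PLACEHOLDERS)
    return filled / len(values)


def passes_threshold(plan, minimum=0.5):
    """Lenient policy: pass when at least `minimum` of the fields are filled."""
    return filled_ratio(plan) >= minimum


def passes_strict(plan):
    """Strict policy: fail as soon as any single field is unfilled."""
    return all(value not in PLACEHOLDERS for day in plan for value in day.values())


# A two-day plan with 6 of its 8 fields filled.
plan = [
    {"transportation": "Flight F123", "breakfast": "-",
     "attraction": "Museum", "accommodation": "Hotel A"},
    {"transportation": "-", "breakfast": "Cafe B",
     "attraction": "Park", "accommodation": "Hotel A"},
]

print(passes_threshold(plan))  # True: 6 / 8 = 0.75 >= 0.5
print(passes_strict(plan))     # False: two fields hold "-"
```

The same plan passes under the threshold policy and fails under the strict one, which is the gap the question is about.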

https://github.com/OSU-NLP-Group/TravelPlanner/blob/90a786d4c5a660aa8ec583dfd40b4d6b058755c8/evaluation/commonsense_constraint.py#L480

hsaest commented 1 month ago

Hi,

Thank you for your interest in our work.

We designed this intentionally. A test file can pass as long as it contains the specific fields, such as 'transportation', even if a field is filled with '-', which means there is no exact value. For example, '-' might indicate that no transportation is needed on that day. This design lets us evaluate every test file accurately.
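As a rough sketch of the behavior described in this reply, the check below accepts a day entry whenever every expected field is present, whether it holds a real value or the '-' placeholder, and rejects it only when a field is missing altogether. The list of required fields and the helper `missing_fields` are assumptions made for illustration; they are not taken from the repository.

```python
# Illustrative sketch only: REQUIRED_FIELDS and missing_fields are
# hypothetical and do not come from the TravelPlanner code base.

REQUIRED_FIELDS = ("transportation", "breakfast", "attraction", "accommodation")


def missing_fields(day, required=REQUIRED_FIELDS):
    """Return the required field names that are absent from a day entry.

    A field whose value is "-" counts as present: it states that there is
    no exact value, for example that no transportation is needed that day.
    """
    return [name for name in required if name not in day]


day_with_placeholder = {
    "transportation": "-",  # present, but no exact value
    "breakfast": "Cafe B",
    "attraction": "Park",
    "accommodation": "Hotel A",
}
day_without_field = {
    "breakfast": "Cafe B",
    "attraction": "Park",
    "accommodation": "Hotel A",
}

print(missing_fields(day_with_placeholder))  # []
print(missing_fields(day_without_field))     # ['transportation']
```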

Hope this can address your problem. Feel free to contact us if you have further questions.

Best, Jian