OSU-NLP-Group / TravelPlanner

[ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"
https://osu-nlp-group.github.io/TravelPlanner/
MIT License

[question] testset evaluation submission #20

Closed by yananchen1989 5 months ago

yananchen1989 commented 5 months ago

Hi authors,

I see that the leaderboard at https://huggingface.co/spaces/osunlp/TravelPlannerLeaderboard gives this Format of Submission: {"idx":0,"query":"Natural Language Query","plan":[{"day": 1, "current_city": "from [City A] to [City B]", "transportation": "Flight Number: XXX, from A to B", "breakfast": "Name, City", "attraction": "Name, City;Name, City;...;Name, City;", "lunch": "Name, City", "dinner": "Name, City", "accommodation": "Name, City"}, {"day": 2, "current_city": "City B", "transportation": "-", "breakfast": "Name, City", "attraction": "Name, City;Name, City;", "lunch": "Name, City", "dinner": "Name, City", "accommodation": "Name, City"}, ...]}, in which each plan entry uses the key "day". However, https://github.com/OSU-NLP-Group/TravelPlanner/blob/main/evaluation/eval.py#L88 seems to expect the key "days", and the JSON format in https://github.com/OSU-NLP-Group/TravelPlanner/blob/main/postprocess/openai_request.py also uses "days" instead of "day".

Does this difference affect the evaluation in any way? Please advise. Thanks.
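For reference, here is a minimal sketch of one submission record built with the field names from the leaderboard format quoted above. It assumes the file holds one JSON object per line (JSONL), which the later mention of uploading "lines" suggests. The helper `make_day` is hypothetical and not part of the TravelPlanner repo, and all city, flight, and venue values are placeholders.

```python
import json

def make_day(day, current_city, transportation, city):
    """Hypothetical helper: build one plan entry with the leaderboard's field names."""
    return {
        "day": day,
        "current_city": current_city,
        "transportation": transportation,
        "breakfast": f"Name, {city}",
        "attraction": f"Name, {city};Name, {city};",
        "lunch": f"Name, {city}",
        "dinner": f"Name, {city}",
        "accommodation": f"Name, {city}",
    }

# Placeholder values only: "City A", "City B", "XXX" and "Name" are not real data.
record = {
    "idx": 0,
    "query": "Natural Language Query",
    "plan": [
        make_day(1, "from City A to City B", "Flight Number: XXX, from City A to City B", "City B"),
        make_day(2, "City B", "-", "City B"),
    ],
}

# Assumption: one JSON object per line; ensure_ascii=False keeps non-ASCII names readable.
line = json.dumps(record, ensure_ascii=False)
print(line)
```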

yananchen1989 commented 5 months ago

Also, does the leaderboard support test submissions? That is, uploading just a few lines (rather than all 1,000) as a format check. And does it accept entries in shuffled order?

hsaest commented 5 months ago

Hi Yanan,

It does not affect the evaluation, since the 'day'/'days' key is only used as an index.
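To illustrate why a key used only as an index is harmless, here is a hypothetical sketch. It is not the actual logic in eval.py. It reads the day number from either spelling and shows that both spellings produce the same ordering of plan entries. The function `day_index` and the sample plans are assumptions made for illustration.

```python
def day_index(entry: dict) -> int:
    """Hypothetical helper: return the day number stored under "day" or "days"."""
    for key in ("day", "days"):
        if key in entry:
            return int(entry[key])
    raise KeyError("plan entry has neither 'day' nor 'days'")

# The same two-day plan, once with "day" and once with "days", deliberately out of order.
plan_a = [{"day": 2, "current_city": "City B"},
          {"day": 1, "current_city": "from City A to City B"}]
plan_b = [{"days": 2, "current_city": "City B"},
          {"days": 1, "current_city": "from City A to City B"}]

# If the key only orders the days, both spellings yield the same sequence of cities.
order_a = [e["current_city"] for e in sorted(plan_a, key=day_index)]
order_b = [e["current_city"] for e in sorted(plan_b, key=day_index)]
print(order_a == order_b)  # True
```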

Sorry, we do not yet support partial-set or shuffled-order test submissions. We may add these in the future, but we have too much on our to-do list right now.
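Until the leaderboard supports partial or shuffled submissions, one workaround is to check a few lines locally before uploading the full file. The following is a minimal sketch under several assumptions. The input is assumed to be JSONL. The required keys are copied from the leaderboard format quoted in the question. The function `check_submission` is hypothetical and not part of the TravelPlanner repo. The day-index key is deliberately left unchecked, because the reply above says it only serves as an index.

```python
import json

# Field names copied from the leaderboard's submission format.
REQUIRED_TOP = {"idx", "query", "plan"}
REQUIRED_DAY = {"current_city", "transportation", "breakfast", "attraction",
                "lunch", "dinner", "accommodation"}

def check_submission(lines):
    """Hypothetical local format check for a (possibly partial) JSONL submission.

    Returns the valid records sorted by "idx" and a list of problem messages.
    """
    records, problems = [], []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(rec, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        missing = REQUIRED_TOP - rec.keys()
        if missing:
            problems.append(f"line {lineno}: missing {sorted(missing)}")
            continue
        if not isinstance(rec["idx"], int) or not isinstance(rec["plan"], list):
            problems.append(f"line {lineno}: 'idx' must be an int and 'plan' a list")
            continue
        for pos, day in enumerate(rec["plan"], start=1):
            if not isinstance(day, dict):
                problems.append(f"line {lineno}, plan entry {pos}: expected an object")
                continue
            absent = REQUIRED_DAY - day.keys()
            if absent:
                problems.append(f"line {lineno}, plan entry {pos}: missing {sorted(absent)}")
        records.append(rec)
    idxs = [r["idx"] for r in records]
    if len(idxs) != len(set(idxs)):
        problems.append("duplicate idx values")
    # Sorting by idx restores the expected order if the lines were shuffled.
    records.sort(key=lambda r: r["idx"])
    return records, problems
```

For example, calling `check_submission(open("sample.jsonl"))` on a hypothetical file holding a handful of lines reports malformed entries by line number. The file object works here because iterating it yields one line at a time. The records come back in `idx` order, so a shuffled file can be re-sorted before the full upload.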

yananchen1989 commented 5 months ago

Thanks a lot.