OSU-NLP-Group / TravelPlanner

[ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"
https://osu-nlp-group.github.io/TravelPlanner/
MIT License
215 stars, 27 forks

Test Dataset evaluation error on leaderboard site #23

Closed lihkinVerma closed 2 months ago

lihkinVerma commented 2 months ago

Hi Team, I have been trying to test generated plans for 'sole-planning' on the leaderboard at https://huggingface.co/spaces/osunlp/TravelPlannerLeaderboard

I confirmed that the file's format is correct by first running it through postprocess/format_check.py.

Unfortunately, the evaluation is not working for me. Is there another way to obtain the test results? If you have access to the logs of files evaluated on the Hugging Face platform, the file I tried uploading for sole-planning is "test_Llama-3-8B_direct_sole-planning_parsed_by_gemini-1.5-flash_submission_decoded.jsonl". Alternatively, I can send you the file so you can run the evaluation and share the results.

I hope you can help with this issue asap.

hsaest commented 2 months ago

Hi Nikhil,

According to the logs, it seems that the "flight" in instance 274 has a format error. Could you please check this?
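One way to check this locally is to pull the record out of the submission file and look at every place a flight is mentioned. The snippet below is only a minimal sketch under two assumptions that the thread does not confirm: that the file holds one JSON object per line, and that "instance 274" means the 274th line counted from 1 (the benchmark may instead use an explicit index field). The helper names `load_instance` and `find_mentions` are hypothetical and are not part of the TravelPlanner repository.

```python
import json


def load_instance(path, instance_no):
    """Return the parsed JSON object on the 1-based line `instance_no`."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if line_no == instance_no:
                return json.loads(line)
    raise IndexError(f"{path} has fewer than {instance_no} lines")


def find_mentions(obj, needle, path="$"):
    """Yield (path, value) pairs whose key or string value contains `needle`.

    The search is case-insensitive and makes no assumption about the schema:
    it simply walks nested dicts and lists.
    """
    needle = needle.lower()
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}"
            if needle in str(key).lower() and not isinstance(value, (dict, list)):
                yield child, value
            else:
                yield from find_mentions(value, needle, child)
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            yield from find_mentions(value, needle, f"{path}[{i}]")
    elif isinstance(obj, str) and needle in obj.lower():
        yield path, obj


if __name__ == "__main__":
    import sys

    # Usage: python inspect_instance.py <submission.jsonl> <instance number>
    record = load_instance(sys.argv[1], int(sys.argv[2]))
    for where, value in find_mentions(record, "flight"):
        print(where, "->", repr(value))
```

Printing each match with its path makes it easier to compare the suspicious entry against neighbouring instances that pass the check.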

You can also send your submission file to this email address. I will review it and tell you the detailed errors. This will also let me run it through our format check tool to identify any potential bugs.

Best, Jian

lihkinVerma commented 2 months ago

Thank you, authors, for the help. Getting in touch, and your keen eye on all the test case responses, was really meaningful. I really appreciate it.

Regards, Nikhil Verma