Mismatch features in osunlp/TravelPlanner Dataset

OSU-NLP-Group / TravelPlanner

[ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"

https://osu-nlp-group.github.io/TravelPlanner/

MIT License


Mismatch features in osunlp/TravelPlanner Dataset #11

Closed (lzl65825 closed 6 months ago)

lzl65825 commented 6 months ago

Train dataset features:

['org', 'dest', 'days', 'visiting_city_number', 'date', 'people_number', 'local_constraint', 'budget', 'query', 'level', 'annotated_plan', 'reference_information']

Validation dataset features:

['org', 'dest', 'days', 'visiting_city_number', 'date', 'people_number', 'local_constraint', 'budget', 'query', 'level', 'reference_information']

Test dataset features:

['days', 'level', 'query', 'reference_information']

The features differ across the three splits. The test split is missing several features that the train and validation splits provide, so the greedy search code cannot be run on it directly.
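The mismatch reported above can be made explicit with a short comparison. This is a minimal sketch that uses only the feature lists quoted in this report; the names `SPLIT_FEATURES` and `missing_features` are hypothetical and are not part of the TravelPlanner code.

```python
# Feature lists copied verbatim from the report above.
SPLIT_FEATURES = {
    "train": ['org', 'dest', 'days', 'visiting_city_number', 'date',
              'people_number', 'local_constraint', 'budget', 'query',
              'level', 'annotated_plan', 'reference_information'],
    "validation": ['org', 'dest', 'days', 'visiting_city_number', 'date',
                   'people_number', 'local_constraint', 'budget', 'query',
                   'level', 'reference_information'],
    "test": ['days', 'level', 'query', 'reference_information'],
}


def missing_features(reference, other):
    """Return the features in `reference` that `other` lacks, in reference order."""
    other_set = set(other)
    return [name for name in reference if name not in other_set]


# Compare every split against the train split (hypothetical helper usage).
for split, features in SPLIT_FEATURES.items():
    gap = missing_features(SPLIT_FEATURES["train"], features)
    print(f"{split}: missing {gap}")
```

Comparing against the train split is a choice made here for illustration, since it has the widest schema of the three lists.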

hsaest commented 6 months ago

Hi,

Thank you for your interest in our work. We have updated the test set on Hugging Face along with the related code, so the greedy search code works now. Just fetch the newest code and try again!

The splits expose different features because they serve different purposes. Some features are hidden to avoid data contamination.

Feel free to contact us if you have further questions.

Best, Jian

lzl65825 commented 6 months ago

Hello,

Thank you for the quick update. I noticed that the features of the test dataset are now:

['org', 'dest', 'days', 'date', 'query', 'level', 'reference_information']

It seems that

['visiting_city_number', 'people_number', 'local_constraint', 'budget']

are missing. Is this on purpose? I understand the evaluation code will be uploaded later, but I found that evaluation/eval.py (line 105) uses 'local_constraint' as a key to fetch information for evaluation. Can I therefore assume that this information will be included with the final evaluation code?
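The concern about the 'local_constraint' lookup can be illustrated with a small guard that checks a record for required keys before any evaluation logic reads them. This is only a sketch under stated assumptions: `require_keys` and the placeholder record are hypothetical, and nothing here reproduces the actual contents of evaluation/eval.py.

```python
# Updated test-split features as quoted in this comment.
TEST_FEATURES = ['org', 'dest', 'days', 'date', 'query', 'level',
                 'reference_information']


def require_keys(record, required):
    """Return `record` unchanged, or raise KeyError naming every missing key."""
    missing = [key for key in required if key not in record]
    if missing:
        raise KeyError(f"record is missing required keys: {missing}")
    return record


# Placeholder record with the test-split schema (values are dummies).
record = {name: None for name in TEST_FEATURES}

try:
    # A check like this fails early instead of deep inside the evaluation.
    require_keys(record, ['local_constraint'])
except KeyError as err:
    print(err)
```

Failing before evaluation starts makes the missing field obvious, rather than surfacing it as a bare `KeyError` partway through a run.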

hsaest commented 6 months ago

Hi @lzl65825,

We have updated the evaluation code just now. :)

We conceal these features on purpose, because we want to keep the evaluation fair and free of data contamination or cheating on the test set. For that reason, we support offline evaluation on the validation set and provide online evaluation of the test set through our leaderboard.