Process of collecting the dataset from publicly available information

OSU-NLP-Group / TravelPlanner

[ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"

https://osu-nlp-group.github.io/TravelPlanner/

MIT License


Process of collecting the dataset from publicly available information #29

Closed: Soumyabrata2003 closed this issue 2 weeks ago

Soumyabrata2003 commented 3 weeks ago

Hi, first of all, great work! I wanted to know how you collected data files such as attractions.csv and clean_flights_2022.csv. Our group wants to curate a similar dataset for a different country, so we were wondering which websites you used and how you obtained the data.

hsaest commented 3 weeks ago

Hi,

Thanks for your interest in our work.

You could refer to Appendix A.3 of our paper, where we provide the details.

Feel free to contact us if you have further questions.

Best, Jian

Soumyabrata2003 commented 3 days ago

Hi, just one more thing: how do you build TravelPlanner's train/val/test sets? Do you write the natural language queries manually or get them from some website? And do you generate the columns of the dataset by parsing the natural language query using GPT-4?

For the train set, there is an "annotated_plan" column, which, according to your website, is created manually. What is the "reference_information" column used for, and how do you obtain it?