Cardinal-Operations / ORLM

ORLM: Training Large Language Models for Optimization Modeling
https://arxiv.org/abs/2405.17743
Apache License 2.0

Issues with IndustryOR Benchmark #3

Open AuroraLHL opened 1 week ago

AuroraLHL commented 1 week ago

Hello! I've noticed several issues with the IndustryOR benchmark you released. For example, in some instances, the parameters are not provided. How are the optimal values determined without actual parameters?

Additionally, there are instances with incorrect solutions and unclear problem statements.

Could you please clarify how you collected this benchmark? Was there no manual verification before using it as a benchmark?

AuroraLHL commented 1 week ago

[Image: Screenshot 2024-10-10 21 21 02]

CyrilHuangZ commented 5 days ago

Hi there! Thank you so much for pointing out these issues. We truly appreciate your feedback! We've recognized some problems in this benchmark and are currently working on a thorough review. A new version will be released soon.

The benchmark draws primarily on three sources: part of the content comes from textbook exercises, another part from well-known mathematical modeling competitions, and the rest from real-world operations research challenges faced by Cardinal Operations. We've made modifications to these problems to protect client privacy and to ensure they fit within the context window limits of large language models. Additionally, many of the original problems and datasets were in Chinese, and we used AI translation to make them accessible to a broader research audience. This translation step may have contributed to the issues as well.

Once again, thank you for your valuable attention and feedback! We're committed to refining the English version to improve its accuracy and will release the updated version soon. Stay tuned!