bigcode-project / starcoder

Home of StarCoder: fine-tuning & inference!
Apache License 2.0

Question about training data. #90

Status: Open. suoych opened this issue 1 year ago

suoych commented 1 year ago

Hi, thanks for sharing this great work! May I ask where you got the PDDL (Planning Domain Definition Language) data? I ran the demo on Hugging Face and found that StarCoder is able to write PDDL code. However, I did not find PDDL in the language list of The Stack dataset. Could you shed some light on how to acquire PDDL data? Thank you so much.

ArmelRandy commented 1 year ago

Hi. StarCoder was not explicitly trained on PDDL. However, the model may have encountered PDDL syntax in the Markdown/HTML files that are part of the training dataset.