SkyworkAI / Skywork

Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model, training data, evaluation data, evaluation methods, etc.
Other
1.21k stars 111 forks

Question about the reference set used for leakage detection #61

Closed dongZheX closed 9 months ago

dongZheX commented 9 months ago

Is the reference set used for leakage detection in the paper the mock_gsm8k dataset in data/eval_loss?

Thanks!

TianwenWei commented 9 months ago

The full detection set is available on Hugging Face; see readme.md for details.