Closed by Naman-ntc 9 months ago
Thanks for your interest. The cutoff date of the DeepSeek Coder models is March 2023.
Thanks! Is it for the base models or also for the instruct models?
Both.
Hi, we have found potential data contamination on LeetCode problems released in May-July. Could instruction tuning lead to a later cutoff date? Specifically, we measured DeepSeek's performance on LeetCode problems by release month and observed a sharp dip after July/August. DeepSeek still outperforms various closed models on problems released after August, but I wanted to get some clarity on this behavior.
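For anyone who wants to run the same kind of check, here is a minimal sketch of a per-month measurement like the one described above. It is not the actual evaluation code: the function names, the `(release_date, solved)` record shape, and the sample data are all hypothetical and only illustrate the idea of grouping problems by release month and comparing pass rates on either side of a suspected cutoff.

```python
from collections import defaultdict
from datetime import date


def pass_rate_by_month(results):
    """Return {(year, month): pass rate} from (release_date, solved) pairs."""
    totals = defaultdict(lambda: [0, 0])  # (year, month) -> [solved, attempted]
    for released, solved in results:
        key = (released.year, released.month)
        totals[key][0] += int(solved)
        totals[key][1] += 1
    # Sort by month so the result reads chronologically.
    return {key: s / n for key, (s, n) in sorted(totals.items())}


def split_at_cutoff(rates, cutoff):
    """Average the monthly pass rates up to and including `cutoff`, and after it."""
    before = [r for month, r in rates.items() if month <= cutoff]
    after = [r for month, r in rates.items() if month > cutoff]

    def mean(xs):
        return sum(xs) / len(xs) if xs else None

    return mean(before), mean(after)


# Made-up records for illustration only; not real benchmark results.
sample = [
    (date(2023, 5, 3), True),
    (date(2023, 5, 20), True),
    (date(2023, 6, 11), True),
    (date(2023, 6, 25), False),
    (date(2023, 9, 2), False),
    (date(2023, 9, 17), True),
    (date(2023, 10, 8), False),
    (date(2023, 10, 22), False),
]

rates = pass_rate_by_month(sample)
before, after = split_at_cutoff(rates, cutoff=(2023, 7))
print(rates)
print(before, after)
```

A large gap between the two averages is only a hint of contamination, since problem difficulty can also drift from month to month.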
Hi DeepSeek team, thank you for releasing the amazing DeepSeek models. I am working on LLM evaluations, and in them the DeepSeek models lead the open-source models (and even quite a few closed-source models).
Since I am trying to construct problems from recently released content (LeetCode, GitHub), I wanted to check with you whether there are any official cutoff dates claimed for the models. I also realize that cutoff dates might vary across data sources (competition websites, GitHub), possibly because of a gap between pre-training and instruction tuning, and would love to get some clarity in this regard!
Finally, I also wanted to check if there are any plans for releasing more details about the training dataset and sources at some point in a technical report!