Closed RyanHuangNLP closed 5 months ago
Thank you very much for your excellent open-source work. Could you provide more details about the DMD training?
1. How much training data was used, and how much GPU time was required?
2. Why was DMD not trained on Sigma? Is the 1024 model hard to distill?