-
Hi, I have a few questions about the workflow that combines [llm-recipes](https://github.com/Nicolas-BZRD/llm-recipes) with [llm-distillation](https://github.com/Nicolas-BZRD/llm-distillation) to c…
-
Hi developers,
I'm trying to train Hfnet. During distillation, the exported npz files exceeded 100 GB within 20 minutes. As a result, the process stopped because there was no space left on my device. My data…
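In case it helps anyone hitting the same disk limit: a minimal sketch (the array names are hypothetical, not taken from the Hfnet codebase) that shrinks exported files by casting to float16 and using NumPy's compressed writer:

```python
import numpy as np

# Hypothetical distillation features; substitute the arrays your
# exporter actually produces.
features = {
    "global_descriptor": np.random.rand(4096).astype(np.float32),
    "local_descriptors": np.random.rand(1000, 256).astype(np.float32),
}

# float16 halves the raw size, and savez_compressed typically shaves
# off considerably more, at the cost of some write-time CPU.
compact = {k: v.astype(np.float16) for k, v in features.items()}
np.savez_compressed("sample_0001.npz", **compact)
```

Whether float16 is acceptable depends on how sensitive the student's loss is to feature precision.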
-
Is the training data the same as aesthetic_6plus from Hugging Face (https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6plus)? Also, can it affect the final distilla…
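For anyone who wants to check the overlap themselves, a minimal sketch using the `datasets` library to inspect the linked dataset (print the columns before relying on any particular one):

```python
from datasets import load_dataset

# Pull the referenced dataset and look at its schema; the exact columns
# are whatever the hosted files expose.
ds = load_dataset("ChristophSchuhmann/improved_aesthetics_6plus", split="train")
print(ds.column_names)
print(ds[0])
```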
-
### Feature request
I am currently working on a project that involves sequence-level distillation across multiple domains, which requires handling a separate dataset for each domain within a single …
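Until such a feature exists, a workaround sketch (all dataset names and file paths here are hypothetical) that tags each example with its domain and interleaves the datasets inside one run:

```python
from datasets import load_dataset, interleave_datasets

# Hypothetical per-domain corpora stored as JSONL; swap in the real ones.
medical = load_dataset("json", data_files="medical.jsonl", split="train")
legal = load_dataset("json", data_files="legal.jsonl", split="train")

# Keep a domain tag on every example so per-domain losses or metrics
# can still be separated downstream.
medical = medical.map(lambda ex: {"domain": "medical"})
legal = legal.map(lambda ex: {"domain": "legal"})

# Explicit sampling probabilities control the domain mix within the run.
mixed = interleave_datasets([medical, legal], probabilities=[0.5, 0.5], seed=42)
```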
-
### My Issue:
1. No matter what value I set for the **--save_steps** parameter, the system always saves a checkpoint every 500 steps.
2. No matter what value I set for the **--save_total_l…
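For what it's worth, 500 is the Transformers default for `save_steps`, so this pattern usually means the CLI flags never reach `TrainingArguments` (for example, the launcher script parses its own arguments and silently drops unknown ones). A minimal sketch to confirm the values take effect when set programmatically, assuming the trainer is Hugging Face's:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="checkpoints",
    save_strategy="steps",   # save_steps only applies with this strategy
    save_steps=200,
    save_total_limit=3,      # keep only the 3 most recent checkpoints
)
print(args.save_steps, args.save_total_limit)  # expect: 200 3
```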
-
Hi! Thank you for the amazing work on registers and for releasing their checkpoints. I was trying to reproduce the results from Table 3 of the "Vision Transformers Need Registers" paper: LOST unsu…
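In case the setup is the issue, here is a minimal sketch of loading a register variant, assuming the checkpoints in question are the `*_reg` entries published through the DINOv2 torch.hub endpoint (adjust if they are hosted elsewhere):

```python
import torch

# Assumption: the register checkpoints are the *_reg entries in the DINOv2 hub.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14_reg")
model.eval()

# Dummy forward pass; spatial dims must be multiples of the 14px patch size.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    out = model(x)
print(out.shape)
```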
-
Hello, I saw that when fine-tuning the bge-m3 model I can use distillation. How do I build a distillation dataset?
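If this is about the FlagEmbedding fine-tuning pipeline, its training data is JSONL with `query`/`pos`/`neg` fields, and distillation adds teacher scores per passage; treat the score field names below (`pos_scores`, `neg_scores`) as an assumption and confirm them against the repo's README. A sketch that scores pairs with a cross-encoder teacher:

```python
import json
from sentence_transformers import CrossEncoder

# Hypothetical teacher reranker; use whichever teacher the repo recommends.
teacher = CrossEncoder("BAAI/bge-reranker-base")

samples = [
    {
        "query": "what is knowledge distillation?",
        "pos": ["Knowledge distillation trains a small student to mimic a large teacher."],
        "neg": ["Fractional distillation separates liquid mixtures by boiling point."],
    },
]

with open("distill_train.jsonl", "w") as f:
    for s in samples:
        # Teacher scores for every (query, passage) pair become soft labels.
        s["pos_scores"] = teacher.predict([(s["query"], p) for p in s["pos"]]).tolist()
        s["neg_scores"] = teacher.predict([(s["query"], n) for n in s["neg"]]).tolist()
        f.write(json.dumps(s) + "\n")
```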
-
https://virtual2023.aclweb.org/paper_P5706.html
-
The HPLT dataset includes a fluency score. We should look at filtering our own data by this fluency metric and see whether it improves our results.
https://hplt-project.org/datasets/v1.2
I assume this would be …
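A minimal filtering sketch, assuming one JSON document per line with a numeric fluency field (verify the actual field name against the HPLT dumps before running this at scale):

```python
import json

FLUENCY_THRESHOLD = 0.9  # illustrative cutoff; worth sweeping on held-out data

kept = total = 0
with open("hplt_shard.jsonl") as fin, open("filtered.jsonl", "w") as fout:
    for line in fin:
        doc = json.loads(line)
        total += 1
        # "fluency" is an assumed field name; HPLT's schema may differ.
        if doc.get("fluency", 0.0) >= FLUENCY_THRESHOLD:
            fout.write(line)
            kept += 1
print(f"kept {kept}/{total} documents")
```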
-
Thank you for sharing the excellent code and checkpoints! I have run the code described in `Readme.md` and would like to check whether I have understood it correctly.
The current version of `dist…