The essence of this method is to achieve lossless acceleration by randomly discarding well-learned samples and rescaling the gradients of the remaining samples so that the overall gradient expectation stays unbiased. You can therefore adapt the implementation to your setup based on the ideas presented in the paper. Feel free to reopen the issue.
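For illustration, here is a minimal per-batch sketch of that idea (the function `prune_and_rescale`, its arguments, and the per-batch framing are hypothetical, not the repo's API): samples whose loss falls below a threshold are treated as well learned and dropped with probability `prune_ratio`, and the surviving candidates are up-weighted by `1 / (1 - prune_ratio)` to keep the expected gradient unchanged.

```python
import torch

def prune_and_rescale(losses, threshold, prune_ratio=0.5):
    # Hypothetical sketch of the pruning + rescaling idea, not the repo's API.
    # Samples with loss below the threshold are "well learned" and are
    # dropped with probability `prune_ratio`.
    well_learned = losses < threshold
    dropped = well_learned & (torch.rand_like(losses) < prune_ratio)
    kept = ~dropped
    # Surviving candidates are up-weighted by 1/(1 - r) so the expected
    # gradient matches full-data training.
    weights = torch.ones_like(losses)
    weights[well_learned & kept] = 1.0 / (1.0 - prune_ratio)
    return kept, weights

# Usage: per-sample losses from a criterion with reduction='none'
losses = torch.rand(8)
kept, weights = prune_and_rescale(losses, losses.mean())
loss = (losses[kept] * weights[kept]).mean()
```

Note that the actual implementation makes the pruning decision at the dataset level across epochs (hence the wrapped dataset and sampler), but the rescaling math is the same.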
PS: you can refer to the research/ directory for a more straightforward implementation of the code. If you are using distributed training, there are a few additional points to take care of. If you have further questions, feel free to post them in detail.
Hi, InfoBatch wraps the dataset and generates a sampler for the dataloader. What if I have a pre-defined sampler? How can I migrate my sampler to InfoBatch's sampler?