NUS-HPC-AI-Lab / InfoBatch

Lossless Training Speed Up by Unbiased Dynamic Data Pruning

user defined sampler #15

Closed pengzhangzhi closed 10 months ago

pengzhangzhi commented 10 months ago

Hi, InfoBatch wraps the dataset and generates a sampler for the dataloader. What if I already have a pre-defined sampler? How can I migrate my sampler to InfoBatch's sampler?

tiandunx commented 10 months ago

The essence of this method is to achieve lossless acceleration by randomly discarding well-learned samples and rescaling the gradients of the remaining prunable samples to keep the expected gradient unbiased. Therefore, you can customize your own sampler based on the ideas presented in the paper. Feel free to reopen the issue.
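
The idea above can be sketched as a standalone sampler. Note this is a minimal illustration of the prune-and-rescale principle, not InfoBatch's actual API; all class and attribute names here (`PruningSampler`, `update_scores`, `weights`) are hypothetical, and a real integration would subclass `torch.utils.data.Sampler` and share state across workers:

```python
import random


class PruningSampler:
    """Illustrative InfoBatch-style sampler (not the library's real API):
    randomly skip "well-learned" samples (loss below the running mean) and
    rescale the loss weight of the kept prunable samples so the expected
    gradient matches full-data training."""

    def __init__(self, num_samples, prune_prob=0.5, seed=0):
        self.num_samples = num_samples
        self.prune_prob = prune_prob                    # drop probability for well-learned samples
        self.scores = [float("inf")] * num_samples      # last-seen loss per sample (inf = unseen)
        self.weights = [1.0] * num_samples              # per-sample loss rescaling factors
        self.rng = random.Random(seed)

    def update_scores(self, indices, losses):
        """Record the latest per-sample losses after a training step."""
        for i, loss in zip(indices, losses):
            self.scores[i] = loss

    def __iter__(self):
        """Yield a pruned, shuffled index stream for one epoch."""
        finite = [s for s in self.scores if s != float("inf")]
        mean = sum(finite) / len(finite) if finite else float("inf")
        kept = []
        for i in range(self.num_samples):
            if self.scores[i] < mean:                   # well-learned sample
                if self.rng.random() < self.prune_prob:
                    continue                            # prune it this epoch
                # survivor of pruning: upweight so the gradient stays unbiased
                self.weights[i] = 1.0 / (1.0 - self.prune_prob)
            else:
                self.weights[i] = 1.0                   # hard sample: always kept, no rescaling
            kept.append(i)
        self.rng.shuffle(kept)
        return iter(kept)
```

In the training loop you would multiply each sample's loss (computed with `reduction='none'`) by `sampler.weights[i]` before averaging and backprop. To reuse a pre-defined sampler, one option is to apply this prune-and-rescale filter to the index stream your own sampler yields instead of `range(num_samples)`.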

henryqin1997 commented 10 months ago

PS: you can refer to the research/ directory for a more straightforward implementation of the code. If you are using distributed training, there are a few additional points to take care of. If you have further questions, please post them in detail.