[KTLO-5] batch size larger than data size

UKPLab / gpl

Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577

Apache License 2.0


[KTLO-5] batch size larger than data size #28

Closed by kwang2049 2 years ago

kwang2049 commented 2 years ago

Previously, PseudoLabeler.run did not check whether the batch size was larger than the number of data points (or the number of generated queries).

pl/toolkit/pl.py: Added a check at the beginning of run that compares the batch size against the data size.
tests/unit/test_pl.py: Added a test for this check.
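
A minimal sketch of what such a fail-fast check could look like, for illustration only. It is not the code from this change: `check_batch_size` and `ToyPseudoLabeler` are hypothetical names, and the choice of `ValueError` and the message wording are assumptions.

```python
from typing import List, Sequence


def check_batch_size(batch_size: int, data: Sequence) -> None:
    """Raise ValueError if batch_size exceeds the number of items in data.

    Hypothetical helper; the real check in the repository may differ.
    """
    data_size = len(data)
    if batch_size > data_size:
        raise ValueError(
            f"Batch size ({batch_size}) is larger than the number of "
            f"data points ({data_size}). Use a smaller batch size or "
            f"provide more data."
        )


class ToyPseudoLabeler:
    """Hypothetical stand-in for a labeler with a run() method."""

    def __init__(self, examples: Sequence[str], batch_size: int) -> None:
        self.examples = list(examples)
        self.batch_size = batch_size

    def run(self) -> List[List[str]]:
        # Validate first, so a misconfiguration fails before any work starts.
        check_batch_size(self.batch_size, self.examples)
        # Placeholder for the real labeling work: split the data into batches.
        return [
            self.examples[i : i + self.batch_size]
            for i in range(0, len(self.examples), self.batch_size)
        ]
```

Running the check at the top of `run` means a bad configuration surfaces as an explicit error rather than as a run that quietly processes nothing. A unit test for it would likely assert both paths: a valid batch size returns batches, and an oversized one raises.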