UKPLab / gpl

Powerful unsupervised domain adaptation method for dense retrieval. Requires only an unlabeled corpus and yields massive improvements: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
Apache License 2.0

Error while running the training script #4

Closed: kingafy closed this issue 2 years ago

kingafy commented 2 years ago

[2022-04-14 06:00:25] INFO [gpl.toolkit.pl.run:60] Begin pseudo labeling
  0%|          | 0/140000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/gpl/gpl/toolkit/pl.py", line 63, in run
    batch = next(hard_negative_iterator)
  File "/home/ec2-user/SageMaker/kernels/gpl_env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/home/ec2-user/SageMaker/kernels/gpl_env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 569, in _next_data
    index = self._next_index()  # may raise StopIteration
  File "/home/ec2-user/SageMaker/kernels/gpl_env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
StopIteration

kwang2049 commented 2 years ago

Hi @kingafy, could you please try running this sample script: https://github.com/UKPLab/gpl/blob/main/sample-data/sample-data.sh and tell me whether you still have the error?

kwang2049 commented 2 years ago

I think this error is caused by the corpus containing fewer entries than batch_size_gpl. Could you please check whether that is the case? I will also add an assertion for this in the next version of the code.
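For anyone hitting the same traceback, here is a minimal sketch of why a corpus smaller than the batch size can end in `StopIteration` on the very first `next(...)` call. It uses plain Python rather than PyTorch and assumes the loader drops an incomplete final batch (the behaviour of `DataLoader` with `drop_last=True`); whether GPL configures its loader that way is an assumption here, not something confirmed in this thread. `iter_batches` and `check_corpus_size` are hypothetical helpers for illustration, not part of GPL.

```python
from typing import Iterator, List, Sequence


def iter_batches(
    items: Sequence[str], batch_size: int, drop_last: bool = True
) -> Iterator[List[str]]:
    """Yield consecutive batches; optionally drop a final incomplete batch."""
    for start in range(0, len(items), batch_size):
        batch = list(items[start:start + batch_size])
        if drop_last and len(batch) < batch_size:
            return  # incomplete batch is discarded, so nothing is yielded
        yield batch


def check_corpus_size(corpus_size: int, batch_size_gpl: int) -> None:
    """Hypothetical pre-flight check (not the assertion actually added to GPL)."""
    if corpus_size < batch_size_gpl:
        raise ValueError(
            f"corpus size ({corpus_size}) is smaller than "
            f"batch_size_gpl ({batch_size_gpl}); no full batch can be formed"
        )


corpus = ["doc-a", "doc-b", "doc-c"]  # toy corpus with 3 entries
batch_size_gpl = 32                   # larger than the corpus

batches = iter_batches(corpus, batch_size_gpl)
try:
    next(batches)
    first_batch_failed = False
except StopIteration:
    # Zero full batches exist, so the iterator is exhausted immediately.
    first_batch_failed = True
```

Under that assumption, either lowering `batch_size_gpl` below the corpus size or using a larger corpus avoids the empty iterator, and a check like `check_corpus_size(len(corpus), batch_size_gpl)` before training would surface the problem with a clearer message.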

kwang2049 commented 2 years ago

Feel free to reopen the issue if you still encounter this error.