HaoDot closed this issue 2 months ago
We use the `break_counter` variable to ensure there are no duplicates within a batch. This is necessary because of how our `sim_dict` is constructed: for each training sample it contains a list of ids that are considered hard negatives. These lists are not mutually exclusive, so we must ensure we do not sample the same id twice.
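To make the idea concrete, here is a minimal sketch of such a sampling loop. It is not the repository's actual code: the function name `make_batches`, the `max_breaks` parameter, and the exact deferral logic are assumptions for illustration. Only `sim_dict` and `break_counter` come from the discussion above.

```python
import random


def make_batches(sim_dict, batch_size, max_breaks=16, seed=0):
    """Hypothetical sketch: build batches that contain no duplicate ids.

    sim_dict maps each training id to a list of hard-negative ids.
    max_breaks (assumed name) caps consecutive failed draws.
    """
    rng = random.Random(seed)
    pool = list(sim_dict)        # every id is drawn once as an anchor
    rng.shuffle(pool)
    batches = []
    batch, in_batch = [], set()
    break_counter = 0            # consecutive draws that had to be deferred

    while pool and break_counter < max_breaks:
        idx = pool.pop(0)
        if idx in in_batch:
            # Already in this batch as someone's hard negative:
            # put it back and count the failed attempt.
            pool.append(idx)
            break_counter += 1
            continue
        break_counter = 0
        # Add the anchor and any hard negatives not yet in the batch.
        for cand in [idx, *sim_dict[idx]]:
            if cand not in in_batch and len(batch) < batch_size:
                batch.append(cand)
                in_batch.add(cand)
        if len(batch) == batch_size:
            batches.append(batch)
            batch, in_batch = [], set()
    return batches               # an unfinished final batch is dropped
```

In this sketch the counter only guards against looping forever: if every remaining id is already in the current batch and the batch cannot be filled, the loop stops after `max_breaks` consecutive failed draws. The limit is a tunable value here, not a number taken from the repository.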
Thanks for your implementation. I would like to ask about `break_counter`. What does this variable mean? And why do different datasets use different values for ending the shuffle loop that avoids having the same id twice in one batch? Thanks for your time.