Skyy93 / Sample4Geo


About break_counter in dataset shuffle #18

HaoDot closed this issue 2 months ago.

HaoDot commented 2 months ago

Thanks for your implementation. I would like to ask about break_counter: what does this variable mean? And why do different datasets use a different number for ending the shuffle loop that keeps the same ID from appearing twice in one batch? Thanks for your time.

Skyy93 commented 2 months ago

We use the break_counter variable to ensure that no duplicates end up within a batch. This is necessary because of how our sim_dict is constructed: for each training sample it contains a list of IDs that are considered hard negatives. These lists are not mutually exclusive, so we must make sure we do not sample the same ID twice.
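To make the role of such a counter concrete, here is a minimal Python sketch. It is not the Sample4Geo code: the function name shuffle_unique_batches, the parameters neighbours and max_breaks, and the "push the duplicate back to the end of the pool and retry" strategy are all assumptions for illustration. It assumes the pool of IDs may contain repeats and that sim_dict maps each ID to a list of hard-negative IDs whose lists may overlap, so every candidate has to be checked against the current batch.

```python
import random
from typing import Dict, List, Optional


def shuffle_unique_batches(
    ids: List[int],
    batch_size: int,
    sim_dict: Optional[Dict[int, List[int]]] = None,
    neighbours: int = 1,
    max_breaks: int = 64,
    seed: Optional[int] = None,
) -> List[List[int]]:
    """Hypothetical sketch: group IDs into batches that never contain the same ID twice.

    `ids` may contain repeats. `sim_dict` (assumed layout) maps an ID to its
    hard-negative IDs; these lists may overlap, so each candidate is checked
    against the current batch before it is added.
    """
    rng = random.Random(seed)
    pool = list(ids)
    rng.shuffle(pool)

    batches: List[List[int]] = []
    current: List[int] = []
    break_counter = 0  # consecutive pops rejected because the ID is already in the batch

    while pool:
        idx = pool.pop(0)
        if idx in current:
            # Duplicate within this batch: retry it later and count the failure.
            pool.append(idx)
            break_counter += 1
            if break_counter >= max_breaks:
                break  # too many consecutive duplicates; stop instead of looping forever
            continue

        current.append(idx)
        break_counter = 0

        # Pull in hard negatives of idx, skipping any ID already in the batch.
        if sim_dict is not None:
            for neg in sim_dict.get(idx, [])[:neighbours]:
                if len(current) >= batch_size:
                    break
                if neg not in current and neg in pool:
                    pool.remove(neg)
                    current.append(neg)

        if len(current) >= batch_size:
            batches.append(current)
            current = []

    # An incomplete final batch and any leftover pool entries are dropped.
    return batches
```

For example, shuffle_unique_batches([1, 1, 1, 2], batch_size=2) yields a single batch containing 1 and 2. The two leftover copies of 1 can never share a batch, so break_counter reaches max_breaks and the loop ends instead of spinning. In this sketch a larger max_breaks simply tolerates more consecutive retries before giving up, trading extra iterations for fewer dropped samples.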