Walleclipse / Deep_Speaker-speaker_recognition_system

Keras implementation of "Deep Speaker: an End-to-End Neural Speaker Embedding System" (speaker recognition)

best_batch selection from sims #8

Closed by mangushev 5 years ago

mangushev commented 5 years ago

Hi, I have a question about sap = sims[ii][pinds] (line 176 in select_batch.py). ii is an index into the speakers, and the rows of sims correspond to the embeddings of the selected speakers. It seems to me that the row selected from sims does not correspond to anchor_index (the specific embedding chosen for that speaker). Could you clarify? Thanks!

Walleclipse commented 5 years ago

Hi, according to lines 161-176 in select_batch.py:

  1. In line 163, I choose a specific speaker: speaker = anh_speakers[ii], where ii is the index into the anchor speakers (e.g. ii = 1).
  2. In line 167, I get all history-embedding indices for that speaker: inds = anchs_index_dict[speaker]. Here anchs_index_dict maps speaker_id (key) to all indices of that speaker's utterance embeddings in the history embedding table (value).
  3. In lines 168-171, I select the embedding indices of utterances from the same speaker as candidate positive samples, denoted pinds, via pinds.append(inds[jj]). inds holds the utterance-embedding indices for the anchor's speaker; I skip the anchor's own utterance, so a positive is the same speaker but a different utterance.
  4. In line 176, I compute the similarity between anchor speaker ii and the positive samples pinds (same speaker): sap = sims[ii][pinds].
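
The four steps above can be sketched roughly as follows. This is only an illustration under assumptions, not the code in select_batch.py: the toy history table (hist_speakers, hist_embeds), the anchor_indices list, and the helper positives_for are hypothetical, and the sketch assumes one anchor utterance per speaker is fixed before sims is computed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical history table: one speaker id and one unit-norm embedding per utterance.
hist_speakers = ["spk_a", "spk_b", "spk_a", "spk_c", "spk_b", "spk_a"]
hist_embeds = rng.normal(size=(len(hist_speakers), 4))
hist_embeds /= np.linalg.norm(hist_embeds, axis=1, keepdims=True)

# speaker_id -> indices of that speaker's utterances in the history table (step 2).
anchs_index_dict = {}
for idx, spk in enumerate(hist_speakers):
    anchs_index_dict.setdefault(spk, []).append(idx)

# Anchor speakers; ii indexes into this list (step 1).
anh_speakers = ["spk_a", "spk_b"]

# Assumption: the anchor utterance for each speaker is fixed here, so that
# row ii of sims is computed from exactly that utterance's embedding.
anchor_indices = [anchs_index_dict[spk][0] for spk in anh_speakers]
sims = hist_embeds[anchor_indices] @ hist_embeds.T  # shape: (n_anchors, n_history)


def positives_for(ii):
    """Return (pinds, sap) for anchor ii: same speaker, different utterance."""
    speaker = anh_speakers[ii]
    inds = anchs_index_dict[speaker]
    anchor_index = anchor_indices[ii]
    # Step 3: keep same-speaker utterances, skipping the anchor itself.
    pinds = [j for j in inds if j != anchor_index]
    # Step 4: similarities between anchor ii and its candidate positives.
    sap = sims[ii][pinds]
    return pinds, sap


pinds, sap = positives_for(0)
```

In this sketch sims has one row per anchor speaker, so the row index is ii (a position in anh_speakers), while the column indices in pinds are positions in the history table.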

mangushev commented 5 years ago

As for point 4: my understanding is that we select a specific row of sims along the first dimension and use the positives and negatives along the second dimension. Does ii actually correspond to the selected anchor_index? It seems that anchor_index needs to be mapped to an index into speaker_embeds and used to position into the first dimension of sims.

Walleclipse commented 5 years ago

Yes, you are right; please check lines 149-163. ii is the anchor index in the anh_speakers list, and the first dimension of sims[ii][pinds] refers to that same anchor index, because sims was computed between the anchor candidates and all utterances. So, for a specific ii, sims[ii] corresponds to anh_speakers[ii].
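
One way to test the alignment discussed here is to recompute the expected rows and compare them with sims. The helper below is a hypothetical sketch, not part of the repository; it assumes sims is a plain dot product between the anchor embeddings and all history embeddings, and rows_match_anchors and the toy data are made up for illustration.

```python
import numpy as np


def rows_match_anchors(sims, embeds, anchor_indices, atol=1e-6):
    """True if sims[ii] equals embeds[anchor_indices[ii]] @ embeds.T for every ii."""
    expected = embeds[np.asarray(anchor_indices)] @ embeds.T
    return sims.shape == expected.shape and bool(np.allclose(sims, expected, atol=atol))


# Toy unit-norm embeddings standing in for the history table (assumption).
rng = np.random.default_rng(1)
embeds = rng.normal(size=(6, 4))
embeds /= np.linalg.norm(embeds, axis=1, keepdims=True)

# sims is built from anchors 0 and 3, so row 0 belongs to utterance 0.
anchor_indices = [0, 3]
sims = embeds[anchor_indices] @ embeds.T

# Same anchors as used to build sims: rows line up.
aligned = rows_match_anchors(sims, embeds, anchor_indices)

# If the anchor for row 0 were re-drawn afterwards (utterance 2 instead of 0),
# sims[0] would no longer describe the chosen anchor.
resampled = rows_match_anchors(sims, embeds, [2, 3])
```

If a check like this fails on real data, the anchor utterance was probably chosen after sims was built, which is the mismatch the question describes.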