How did you extract pitch information to perform normalized cross-correlation evaluation in the paper?

I know that you've used librosa's piptrack to extract pitch, but I'm confused about how to turn its output into a pitch estimate:

pitches, magnitudes = librosa.piptrack(y=y, sr=sr, fmin=0, fmax=800)

This function returns two arrays, pitches and magnitudes, both indexed as [frequency bin, frame]. magnitudes[f, t] contains the magnitude of bin f at time t, and pitches[f, t] contains the instantaneous frequency of bin f at time t. Is it right to take the frequency of the maximum-magnitude bin in each frame as the pitch? That is:

bins = np.argmax(magnitudes, axis=0)  # strongest bin in each frame
p = pitches[bins, np.arange(pitches.shape[1])]  # frequency of that bin, per frame

Or should I take the lowest nonzero frequency in each frame as the pitch (F0)?

masked = np.where(pitches > 0, pitches, np.inf)  # ignore empty bins without modifying pitches
p = masked.min(axis=0)
p[np.isinf(p)] = 0  # frames with no detected pitch
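
To make the difference between the two options concrete, here is a minimal sketch on toy arrays laid out like piptrack's output. The values are made up for illustration, the helper names are hypothetical, and this is not necessarily what the paper did:

```python
import numpy as np

# Toy stand-ins for the arrays returned by librosa.piptrack (made-up values).
# Rows are frequency bins, columns are frames; zeros mark bins with no peak.
pitches = np.array([
    [0.0,   110.0, 0.0],
    [220.0, 0.0,   0.0],
    [0.0,   330.0, 0.0],
])
magnitudes = np.array([
    [0.0, 0.2, 0.0],
    [0.9, 0.0, 0.0],
    [0.0, 0.7, 0.0],
])

def pitch_by_max_magnitude(pitches, magnitudes):
    """Option 1: per frame, the frequency of the strongest bin."""
    bins = np.argmax(magnitudes, axis=0)
    return pitches[bins, np.arange(pitches.shape[1])]

def pitch_by_lowest_frequency(pitches):
    """Option 2: per frame, the lowest nonzero frequency (0 if none)."""
    masked = np.where(pitches > 0, pitches, np.inf)
    lowest = masked.min(axis=0)
    return np.where(np.isinf(lowest), 0.0, lowest)

p_max = pitch_by_max_magnitude(pitches, magnitudes)
p_low = pitch_by_lowest_frequency(pitches)
print(p_max)
print(p_low)
```

In the second frame the two options disagree: the strongest peak is at 330 Hz while the lowest peak is at 110 Hz, which is exactly the case I'm unsure about. Both give 0 for the last frame, where no peak was detected.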
