python issue setting good_units_only = False

charlesdgburns commented 9 hours ago

Was curious to see how unit match would run across all units. I'm running code as is set up in the jupyter notebook.

#read in data and select the good units and exact metadata
waveform, session_id, session_switch, within_session, good_units, param = util.load_good_waveforms(wave_paths, unit_label_paths, param, good_units_only = False)

and I run into a slightly obscure error at the stage of

extracted_wave_properties = ov.extract_parameters(waveform, channel_pos, clus_info, param)

error msg:

File /.conda/envs/maze_ephys/lib/python3.12/site-packages/UnitMatchPy/metric_functions.py:399, in get_threshold(total_score, within_session, euclid_dist, param, is_first_pass)
    [396]([...]/UnitMatchPy/metric_functions.py:396) hp, __ = np.histogram(tmp, Bins)
    [397](.../UnitMatchPy/metric_functions.py:397) hp = hp / np.nansum(tmp)
--> [399]../UnitMatchPy/metric_functions.py:399) thrs_opt = score_vector[np.argwhere( (pf.smooth(hd,3) > pf.smooth(hnd,3) ) * (score_vector > 0.6) == True)][0]
    [400].../UnitMatchPy/metric_functions.py:400) # if ThrsOpt.size == 0:
    [401](.../UnitMatchPy/metric_functions.py:401) #     ThrsOpt = 0.6 # give default threshold if above doestn return value
    [402](.../UnitMatchPy/metric_functions.py:402) # fit tmp to a normal ditn
    [403](.../UnitMatchPy/metric_functions.py:403) fit = tmp[ ~np.isnan(tmp) * (tmp < thrs_opt)]
IndexError: index 0 is out of bounds for axis 0 with size 0

So far I've identified a problem with the output of uitl.load_good_waveforms when good_units_only = False.

It's clear for average waveforms after running ov.extract_parameters(), as I only get two different waveforms across all clusters (when good_units_only = False):

avg_wave = extracted_wave_properties['avg_waveform'] 
print(avg_wave.shape)
plt.plot(avg_wave[:,:,0])

but when good_units_only = True, I get a different average waveform for each cluster:

EnnyvanBeest commented 8 hours ago

@Sam-Dodgson I think there may be a bug here: it seems it still loads in only the good units despite bool=false? Can you double check please?

To OP: It will most likely not work well including all units including noise units, as the data within day will be too unreliable to find a proper threshold for reliable matches. So even when the waveform bug is fixed, you may find that same error message. So using UM this way is not recommended.

charlesdgburns commented 8 hours ago

Thanks for the quick response!

Regarding the inclusion of noise units - I also realised the point you've made after hacking the kilosort labels to all be 'good'.

Is there any way for unitmatch to label these as noise itself? Or do you generally find that the kilosort 'good' label always excludes such noisy units?

EnnyvanBeest commented 7 hours ago

Using the Kilosort label is a good start, at least to throw out the noise. However Kilosort only decides on 'good' versus 'mua' based on the ISI distribution. We personally use bombcell but there are other quality metrics softwares as well. Manual curation with Phy is another option, but with a lot of data may become too labour intensive.

EnnyvanBeest / UnitMatch

python issue setting good_units_only = False #83