Make synthetic data even more realistic

With respect to these lines

    # Define NeuroPixel-like values for sampling rates and conversion factors
    duration_in_s = 3.0
    number_of_units = 50
    number_of_channels = 385  # Have to include 'sync' channel to be proper SpikeGLX. TODO: artificiate sync pulses
    ap_conversion_factor_to_uV = 2.34375
    ap_sampling_frequency = 30_000.0
    lf_sampling_frequency = 2_500.0
    downsample_factor = int(ap_sampling_frequency / lf_sampling_frequency)

    # Generate synthetic spiking and voltage traces with waveforms around them
    artificial_ap_band, spiking = spikeinterface.generate_ground_truth_recording(
        durations=[duration_in_s],
        sampling_frequency=ap_sampling_frequency,
        num_channels=number_of_channels,
        dtype="float32",
        num_units=number_of_units,
        seed=0,  # Fixed seed for reproducibility
    )
    artificial_ap_band.set_channel_gains(gains=ap_conversion_factor_to_uV)
    waveform_extractor = spikeinterface.extract_waveforms(recording=artificial_ap_band, sorting=spiking, mode="memory")
    int16_artificial_ap_band = artificial_ap_band.astype(dtype="int16")

    # Approximate behavior of LF band with filter and downsampling
    # TODO: currently looks a little out of scale?
    artificial_lf_filter = spikeinterface.preprocessing.bandpass_filter(
        recording=artificial_ap_band, freq_min=10, freq_max=300
    )
    int16_artificial_lf_band = NumpyRecording(
        traces_list=artificial_lf_filter.get_traces()[::downsample_factor],
        sampling_frequency=lf_sampling_frequency,
    )

Moving discussion from Slack and looking for advice from @alejoe91

We're trying to approximate synthetic AP/LF bands and their associated spiking activity + waveforms (which get written as Phy).

(i) What filter should we use to approximate LF? We're using a bandpass at the moment; if that's the right choice, what frequency range would you recommend? Also, how would you recommend downsampling?

(ii) For recording.astype("int16"), do you think I should do that before or after extract_waveforms?

_Originally posted by @CodyCBakerPhD in https://github.com/NeurodataWithoutBorders/nwb-guide/pull/530#discussion_r1477102364_

Almost there!

The generate_* functions generate data in uV, so if you set additional gains on top of that you're applying the conversion twice. To match the range and scaling of NP1.0, you should do the following:

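# Divide by the gain first so the int16 values are raw counts rather than uV
artificial_ap_band = spikeinterface.preprocessing.scale(recording=artificial_ap_band, gain=1 / ap_conversion_factor_to_uV)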
int16_artificial_ap_band = artificial_ap_band.astype(dtype="int16")
int16_artificial_ap_band.set_channel_gains(ap_conversion_factor_to_uV)
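
To make the double-conversion point concrete, here is a minimal sketch in plain Python (no SpikeInterface calls). Only the conversion factor comes from the thread; the -150 uV amplitude and the helper name `to_counts` are assumptions chosen for illustration.

```python
# Conversion factor taken from the thread (uV per int16 count for the AP band).
ap_conversion_factor_to_uV = 2.34375


def to_counts(value_in_uV, gain):
    """Convert a value in uV to integer counts.

    int() truncates toward zero, which is also what a NumPy float-to-int
    cast does for in-range values.
    """
    return int(value_in_uV / gain)


# Assumed example amplitude, not a value from the thread.
spike_peak_in_uV = -150.0

# Correct order: divide by the gain, store as counts, re-apply the gain on read.
counts = to_counts(spike_peak_in_uV, ap_conversion_factor_to_uV)
recovered_in_uV = counts * ap_conversion_factor_to_uV

# Double conversion: the data is already in uV, but the gain is applied anyway.
double_converted_in_uV = spike_peak_in_uV * ap_conversion_factor_to_uV

print(counts, recovered_in_uV, double_converted_in_uV)
```

With the rescaling step the stored counts map back onto the original amplitude; without it, every value comes out inflated by the conversion factor.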

For generating the LF stream, you can use the resample function. Note that NP has a hardware filter at 1000 Hz for LF and it's sampled at 2.5 kHz:

artificial_lf_filter = spikeinterface.preprocessing.bandpass_filter(
    recording=artificial_ap_band, freq_min=0.5, freq_max=1000
)
artificial_lf_band = spikeinterface.preprocessing.resample(
    recording=artificial_lf_filter, resample_rate=2500
)
int16_artificial_lf_band = artificial_lf_band.astype(dtype="int16")
int16_artificial_lf_band.set_channel_gains(ap_conversion_factor_to_uV)

Note that the simulated recording will not have low-frequency components.
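
As a rough illustration of why the LF stream should be low-pass filtered before its rate is reduced, the sketch below uses only the Python standard library. The sampling rates come from the thread; the 2000 Hz test tone and the helper name `aliased_frequency` are assumptions made for this example. It shows that taking every 12th sample of a tone above the LF Nyquist frequency (1250 Hz) produces samples indistinguishable from a lower-frequency tone.

```python
import math

# Sampling rates from the thread.
ap_sampling_frequency = 30_000.0
lf_sampling_frequency = 2_500.0
downsample_factor = int(ap_sampling_frequency / lf_sampling_frequency)
lf_nyquist = lf_sampling_frequency / 2


def aliased_frequency(frequency, sampling_frequency):
    """Apparent frequency of a pure tone once sampled at sampling_frequency."""
    folded = frequency % sampling_frequency
    return min(folded, sampling_frequency - folded)


# Assumed test tone above the LF Nyquist frequency.
tone_frequency = 2_000.0
apparent_frequency = aliased_frequency(tone_frequency, lf_sampling_frequency)

# Naive decimation: keep every `downsample_factor`-th sample of the AP-rate tone.
number_of_lf_samples = 50
decimated = [
    math.sin(2 * math.pi * tone_frequency * (k * downsample_factor) / ap_sampling_frequency)
    for k in range(number_of_lf_samples)
]

# A (sign-flipped) tone at the aliased frequency, sampled directly at the LF rate.
alias = [
    -math.sin(2 * math.pi * apparent_frequency * k / lf_sampling_frequency)
    for k in range(number_of_lf_samples)
]

max_difference = max(abs(a - b) for a, b in zip(decimated, alias))
print(downsample_factor, lf_nyquist, apparent_frequency, max_difference)
```

The 1000 Hz cutoff mentioned above sits below the 1250 Hz Nyquist limit, so content that would otherwise fold back into the LF band is attenuated before the rate is reduced.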

Make synthetic data even more realistic #583