Closed: magland closed this issue 2 months ago
I see. Each `make` call runs over all of the electrode entries in the NWB file:
https://github.com/LorenFrankLab/spyglass/blob/094e002aa4be29fe40c3847e31a169e98617a7ac/src/spyglass/common/common_ephys.py#L129
However, the table's key source is specific to the electrode group, so the `make` function gets called once per group by the `single_transaction_make` function here:
https://github.com/LorenFrankLab/spyglass/blob/094e002aa4be29fe40c3847e31a169e98617a7ac/src/spyglass/common/populate_all_common.py#L75
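To make the pattern concrete, here's a minimal DataJoint sketch (not the actual Spyglass tables): `make()` runs once per key in the table's key source, so if it rebuilds every electrode row from the file each time, the same 128 rows get re-inserted once per group, with `skip_duplicates=True` hiding the redundancy in the final result.

```python
import datajoint as dj

schema = dj.schema("electrode_demo")  # hypothetical demo schema


@schema
class ElectrodeGroup(dj.Manual):
    definition = """
    electrode_group_name : varchar(32)
    """


@schema
class Electrode(dj.Imported):
    definition = """
    -> ElectrodeGroup
    electrode_id : int
    """

    def make(self, key):
        # Called once per ElectrodeGroup key (32 times here), but builds rows
        # for *every* electrode in the file, not just the group in `key`.
        all_rows = [
            {"electrode_group_name": str(g), "electrode_id": g * 4 + i}
            for g in range(32)
            for i in range(4)
        ]
        # Each call re-inserts all 128 rows; skip_duplicates masks the repeats.
        self.insert(all_rows, skip_duplicates=True)


# Usage sketch:
#   ElectrodeGroup.insert({"electrode_group_name": str(g)} for g in range(32))
#   Electrode.populate()  # make() runs 32 times; 128 x 32 insert attempts total
```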
I'm a little surprised that each call is taking so long. Do you know which part of the `Electrode` make function is taking the time?
(disregard my deleted comment... investigating some more to answer your question)
I think it might be this:

```python
if elect_id in electrode_config_dicts:
    # check whether the Probe.Electrode being referenced exists
    query = Probe.Electrode & electrode_config_dicts[elect_id]
    if len(query) == 0:
        warnings.warn(
            "No Probe.Electrode exists that matches the data: "
            + f"{electrode_config_dicts[elect_id]}. "
            "The config YAML for Electrode with electrode_id "
            + f"{elect_id} will be ignored."
        )
    else:
        key.update(electrode_config_dicts[elect_id])
```
because it's making a query to the database at each iteration. Would that make sense? Maybe this check is not needed?
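If this check were the culprit, one way to avoid the per-electrode round trip might be a single batched existence check up front. A rough sketch, not Spyglass code:

```python
# Rough sketch: one OR-restriction over all configs instead of one query per
# electrode. In DataJoint, restricting by a list of dicts matches any of them.
configs = list(electrode_config_dicts.values())
matching = (Probe.Electrode & configs).fetch("KEY") if configs else []
# `matching` could then be checked in Python inside the loop instead of
# issuing a new query for each electrode_id.
```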
actually it doesn't seem to be that. Investigating further....
Okay, I'm pretty sure the slow part is the `BrainRegion.fetch_add(...)` call in the following snippet:
```python
key.update(
    {
        "electrode_id": elect_id,
        "name": str(elect_id),
        "electrode_group_name": elect_data.group_name,
        "region_id": BrainRegion.fetch_add(
            region_name=elect_data.group.location
        ),
        "x": elect_data.get("x"),
        "y": elect_data.get("y"),
        "z": elect_data.get("z"),
        "filtering": elect_data.get("filtering", "unfiltered"),
        "impedance": elect_data.get("imp"),
        **electrode_constants,
    }
)
```
Can anything be done to avoid that?
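Could the lookup be memoized per location string, for example? A rough sketch (hypothetical helper, not existing Spyglass code):

```python
# Hypothetical memoization of the region lookup: BrainRegion.fetch_add would
# only be hit once per distinct location string instead of once per electrode
# per group.
region_id_cache = {}


def cached_region_id(region_name):
    if region_name not in region_id_cache:
        region_id_cache[region_name] = BrainRegion.fetch_add(region_name=region_name)
    return region_id_cache[region_name]
```

Then the snippet above could use `"region_id": cached_region_id(elect_data.group.location)` inside the loop.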
It looks like `fetch_add` is just doing a query to check the database once the entry is inserted, which shouldn't be slow on its own.
We could try to reduce the number of calls here, but in general Spyglass makes many queries to the database. If the latency is high enough that this is slowing you down, you'll likely notice it in a bunch of other spots too. It might be more productive to figure out the source of the slowdown in your database queries.
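If it helps, a quick way to ballpark the round-trip latency is to time a trivial query against the server. A sketch using `dj.conn().query` from DataJoint, where `SELECT 1` just exercises the round trip:

```python
import time

import datajoint as dj

conn = dj.conn()  # uses your configured database host/credentials
samples = []
for _ in range(10):
    t0 = time.perf_counter()
    conn.query("SELECT 1").fetchall()  # trivial query; measures the round trip
    samples.append(time.perf_counter() - t0)
print(f"median round trip: {sorted(samples)[len(samples) // 2] * 1e3:.1f} ms")
```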
Makes sense. Okay, I'll investigate what our expected latency is.
I think we must be experiencing something like 200 ms round-trip latency, because that would give
0.2 sec × 128 × 32 ≈ 14 minutes.
But even at an extremely good 30 ms round trip this would be about 2 minutes, whereas without the 32x redundancy it would be about 4 seconds.
So, given that it makes a network request at every iteration, do you think this could be brought down to 128 requests instead of 128 x 32?
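For example, something like this inside `make()` (just a sketch mirroring the variable names above, not actual Spyglass code) would build each electrode row exactly once across all the per-group calls:

```python
# Sketch only: skip electrodes from other groups so each row is inserted by
# exactly one of the 32 per-group make() calls. `electrodes` stands in for the
# NWB electrodes table rows iterated in the real make().
for elect_id, elect_data in electrodes.iterrows():
    if elect_data.group_name != key["electrode_group_name"]:
        continue  # this electrode is handled by its own group's make() call
    # ... build and insert the row for this electrode as before ...
```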
Working on this in #1125
I am finding that the most time-consuming part of importing a new session is importing all the electrodes. In my NWB file there are 32 electrode groups, each with 4 electrodes, for a total of 128 electrodes. However, what seems to be happening is that the 128 electrodes get inserted many times (maybe 32 times, once for each group?). I see `skip_duplicates=True`, so in the end only 128 electrodes are inserted, but this ends up taking > 15 minutes on my system, and it seems to be ~32 times the amount of work it should be. Here's the code that seems to be getting executed once for every electrode group:
https://github.com/LorenFrankLab/spyglass/blob/094e002aa4be29fe40c3847e31a169e98617a7ac/src/spyglass/common/common_ephys.py#L130-L202