LorenFrankLab / spyglass

Neuroscience data analysis framework for reproducible research built by Loren Frank Lab at UCSF
https://lorenfranklab.github.io/spyglass/
MIT License
90 stars 41 forks source link

PositionIntervalMap population fails when called within populate transaction #626

Closed dpeg22 closed 1 year ago

dpeg22 commented 1 year ago

Describe the bug When populating spyglass.position.v1.DLCPoseEstimation there is an error thrown here, which in turn tries to populate PositionIntervalMap within the DLCPoseEstimation population transaction. This breaks, causing the bug.

To Reproduce Steps to reproduce the behavior:

  1. This error is on file position_dlc_pose_estimation.py#L240 at file path /spyglass/src/spyglass/position/v1
  2. Populate DLCPoseEstimation with an NWB file that does not have an entry in PositionIntervalMap
  3. see error
Error ``` python Cell In[2], line 3, in run_pose_estimation() 1 @dask.delayed 2 def run_pose_estimation(key, epoch): ----> 3 return sgp.DLCPoseEstimation().populate({**key, "epoch": epoch}) File ~/Src2/datajoint-python/datajoint/autopopulate.py:241, in populate() 237 if processes == 1: 238 for key in ( 239 tqdm(keys, desc=self.__class__.__name__) if display_progress else keys 240 ): --> 241 error = self._populate1(key, jobs, **populate_kwargs) 242 if error is not None: 243 error_list.append(error) File ~/Src2/datajoint-python/datajoint/autopopulate.py:292, in _populate1() 290 self.__class__._allow_insert = True 291 try: --> 292 make(dict(key), **(make_kwargs or {})) 293 except (KeyboardInterrupt, SystemExit, Exception) as error: 294 try: File ~/Src2/spyglass/src/spyglass/position/v1/position_dlc_pose_estimation.py:240, in make() 234 creation_time = datetime.fromtimestamp( 235 dlc_result.creation_time 236 ).strftime("%Y-%m-%d %H:%M:%S") 238 logger.logger.info("getting raw position") 239 interval_list_name = ( --> 240 convert_epoch_interval_name_to_position_interval_name( 241 { 242 "nwb_file_name": key["nwb_file_name"], 243 "epoch": key["epoch"], 244 } 245 ) 246 ) 247 spatial_series = ( 248 RawPosition() 249 & {**key, "interval_list_name": interval_list_name} 250 ).fetch_nwb()[0]["raw_position"] 251 _, _, _, video_time = get_video_path(key) File ~/Src2/spyglass/src/spyglass/common/common_behav.py:419, in convert_epoch_interval_name_to_position_interval_name() 415 pos_interval_names = (PositionIntervalMap & key).fetch( 416 "position_interval_name" 417 ) 418 if len(pos_interval_names) == 0: --> 419 PositionIntervalMap.populate(key) 420 pos_interval_names = (PositionIntervalMap & key).fetch( 421 "position_interval_name" 422 ) 423 if len(pos_interval_names) == 0: File ~/Src2/datajoint-python/datajoint/autopopulate.py:185, in populate() 165 """ 166 ``table.populate()`` calls ``table.make(key)`` for every primary key in 167 ``self.key_source`` for which there is not already a tuple in table. (...) 182 :type make_kwargs: dict, optional 183 """ 184 if self.connection.in_transaction: --> 185 raise DataJointError("Populate cannot be called during a transaction.") 187 valid_order = ["original", "reverse", "random"] 188 if order not in valid_order: DataJointError: Populate cannot be called during a transaction. ```

Expected behavior PositionIntervalMap population occurs outside of the transaction

samuelbray32 commented 1 year ago

Sorry, I didn't realize this restriction on populating during a populate call in datajoint. I don't think can overcome the datajoint level of things, so need to ensure it's populated before this point. The options I see are:

Let me know if you have a thoughts/preference and I'll implement

dpeg22 commented 1 year ago

I think the last option to populate during insertion into spyglass is probably the best. I also think adding an exception to everywhere that calls the table if it isn't populated is a worthy endeavor.

Let me know if you want me to help :-) Forgot I'm leaving...

edeno commented 1 year ago

Agree with populating the table upon insertion.

CBroz1 commented 1 year ago

Sorry I'm late to take a look at this. @samuelbray32 - your approach uses a flag to avoid and embedded populate, throwing an err and asking the user to do the populating first. Is that right?

Maybe the following would give us a route to the populate without embedded transaction, because make is the protected func name that auto-encapsulates in a transaction

class PositionMapInterval:
    ...
    def make(self, key):
        _no_transaction_make(self,key)
    def _no_transaction_make(self,key):
        if not self.connection.in_transaction:
            raise SomeError('This method should only be run via another `make` method')
        ... # current make func, only to be used in another populate

def convert_epoch_interval_name_to_position_interval_name(key):
    if len(pos_interval_names) == 0:
        PositionIntervalMap._no_transaction_make(key)
samuelbray32 commented 1 year ago

@CBroz1 correct, the flag is there to try and give the user a more specific error if embedded population was happening. Does this solution just provide a better way to raise that error, or does it avoid that Datajoint problem entirely?

Also for context, I added code to populate the table during spyglass insertion, so the issue should be practically fixed as well. If your solution covers it better though that would be great. Thanks!

CBroz1 commented 1 year ago

I think my solve would avoid embedded population entirely, by only calling make when populating the table itself, and doing the same thing without the transaction in other cases

samuelbray32 commented 1 year ago

@CBroz1 could you look at this commit and let me know if you think it violates datajoints rules? I tried to set it up so it could insert into the PositionIntervalMap table both within or outside of another populate call. It works for my test cases, but I may be missing something or doing something unsafe.