AllenInstitute / AllenSDK

code for reading and processing Allen Institute for Brain Science data
https://allensdk.readthedocs.io/en/latest/

add image novelty column to VBN session stimulus_presentations table #2515

Closed corbennett closed 1 year ago

corbennett commented 2 years ago

Describe the use case that is addressed by this feature.
The VBN dataset has two image sets (G and H), each with 8 natural images, but two of these images are shared across both image sets. Therefore, whether a particular image shown during the active behavior (and passive replay) is novel to the mouse depends on:

1) what image set they were trained on
2) what image set is being shown during the session
3) whether the image was shared across the image sets (these two images will ALWAYS be familiar to the mouse)

I think it would be very helpful to just have the SDK return a column indicating whether a particular image is novel for the given session. We don't need to regenerate the NWBs, just add some code to the SDK to make this column when the stimulus_presentations table is made.
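
The three conditions above reduce to a set-membership test: an image is novel when it belongs to the image set shown in the session but not to the set the mouse was trained on, so the two shared images come out familiar automatically. A minimal sketch, assuming each image set is available as a plain Python set of image names; `classify_image_novelty`, `SET_ONE`, and `SET_TWO` are hypothetical names for illustration and are not part of the AllenSDK:

```python
import math

# Image names as listed in this issue; the two shared images are in both sets.
SHARED = {"im083_r", "im111_r"}
SET_ONE = {"im012_r", "im036_r", "im044_r", "im047_r", "im078_r", "im115_r"} | SHARED
SET_TWO = {"im005_r", "im024_r", "im034_r", "im087_r", "im104_r", "im114_r"} | SHARED


def classify_image_novelty(image_name, trained_images, shown_images):
    """Return True (novel), False (familiar), or NaN (not a behavior image)."""
    if image_name not in shown_images:
        # Omitted stimuli and anything outside the session's image set.
        return math.nan
    # Familiar if the mouse saw this image during training, novel otherwise.
    return image_name not in trained_images


# Mouse trained on SET_ONE, session shows SET_TWO.
unique_result = classify_image_novelty("im005_r", SET_ONE, SET_TWO)
shared_result = classify_image_novelty("im083_r", SET_ONE, SET_TWO)
omitted_result = classify_image_novelty("omitted", SET_ONE, SET_TWO)
```

Expressing the rule on sets of names rather than on set labels means the shared images need no special case.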

Describe the solution you'd like
Here's the breakdown of which images belong to which set (6 images are unique to G, 6 are unique to H, and 2 are shared across them):

Unique to G: im036_r, im012_r, im115_r, im044_r, im078_r, im047_r

Unique to H: im104_r, im114_r, im005_r, im087_r, im024_r, im034_r

Shared: im083_r, im111_r

Ideally, we would make a new column called 'is_novel_image' in the stimulus_presentations table with the following behavior:
True for images that are novel during the session in question
False for images that are familiar
NaN for omitted images or stimuli that weren't part of the behavior image sets

A function like the following would produce this behavior:

import numpy as np


def getImageNovelty(image_name, session_id, ecephys_sessions_table):
    '''
    Function to help annotate the stimulus_presentations table
    to indicate whether the image was novel to the mouse.      

    INPUT:
        image_name: str indicating which image to check for novelty (ie 'im024_r')
        session_id: the ecephys_session_id for session during which image was presented
        ecephys_sessions_table: the ecephys_sessions metadata table from the VBN cache

    OUTPUT:
        Returns one of the following:
        True: indicating that this image was novel for this session
        False: indicating that this image was familiar for this session
        np.nan: indicating that this stimulus wasn't one of the natural images (including omitted stimuli)

    '''
    is_novel_image_set = ecephys_sessions_table.loc[session_id]['experience_level'] == 'Novel'

    IMAGE_SET_KEY={
                'G' : ['im012_r', 'im036_r', 'im044_r', 
                    'im047_r', 'im078_r', 'im115_r'],
                'H' : ['im005_r', 'im024_r', 'im034_r', 
                    'im087_r', 'im104_r', 'im114_r'],
                'shared' : ['im083_r', 'im111_r'],
                'omitted' : ['omitted']  # list, so matching is exact, not substring
                }

    # First check that this image is one of the Natural Images used
    image_in_image_set = any([np.isin(image_name, imset) \
                              for _,imset in IMAGE_SET_KEY.items()]) 
    if not image_in_image_set:
        return np.nan

    #Get the image set for this image
    image_set_for_this_image = [name for name, image_set in IMAGE_SET_KEY.items()\
                                if image_name in image_set][0]

    #Get the image novelty for this image
    if image_set_for_this_image == 'omitted':
        novelty_for_this_image = np.nan
    else:
        novelty_for_this_image = is_novel_image_set and \
                            bool(np.isin(image_set_for_this_image, ['G', 'H']))

    return novelty_for_this_image
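
One way to turn this per-image logic into the proposed 'is_novel_image' column is to build a lookup once per session and map the image names through it. The following is a minimal sketch, assuming the stimulus_presentations table is a pandas DataFrame with an `image_name` column and that the session's experience level has already been read from the metadata table; `add_is_novel_image` is a hypothetical helper, not an AllenSDK function, and the toy table stands in for real session data:

```python
import pandas as pd

# Images unique to one image set, and images shared by both (from this issue).
UNIQUE_IMAGES = ["im012_r", "im036_r", "im044_r", "im047_r", "im078_r", "im115_r",
                 "im005_r", "im024_r", "im034_r", "im087_r", "im104_r", "im114_r"]
SHARED_IMAGES = ["im083_r", "im111_r"]


def add_is_novel_image(stimulus_presentations, experience_level):
    """Return a copy of the table with an 'is_novel_image' column added."""
    session_is_novel = experience_level == "Novel"
    # Unique images are novel only in a novel session; shared ones never are.
    lookup = {name: session_is_novel for name in UNIQUE_IMAGES}
    lookup.update({name: False for name in SHARED_IMAGES})
    out = stimulus_presentations.copy()
    # Names missing from the lookup (for example 'omitted') become NaN.
    out["is_novel_image"] = out["image_name"].map(lookup)
    return out


# Toy stand-in for a session's stimulus_presentations table.
toy = pd.DataFrame({"image_name": ["im005_r", "im083_r", "omitted", None]})
annotated = add_is_novel_image(toy, "Novel")
```

Mapping through a dictionary avoids a Python-level function call per row, which may matter for sessions with many stimulus presentations.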

Describe alternatives you've considered
We have provided the names of the images as above in the documentation, so users could compute this themselves. But since this is a pretty fundamental part of analyzing this data, it would be nice to provide it for them.

Do you want to work on this issue?
Happy to review/test the implementation.

aamster commented 2 years ago

@corbennett when we load the stimulus presentations table for a given session, we don't have access to data from prior sessions without loading the NWB file for the prior session. Your prototype uses the project-level metadata table, but this isn't available when loading an NWB file.

We could either:
a) Load this data when generating the NWB file. Note that this would require regenerating all NWB files, which is risky.
b) Point the students to the function you wrote, which combines information from previous sessions with the current session.

I considered hardcoding this, but that is rarely a good solution, and also I don't think it's possible since we don't know whether the image set is novel and also whether they were previously shown the images that are shared between image sets.

morriscb commented 1 year ago

Should already be in release 2.14.1

corbennett commented 1 year ago

@morriscb Should we go ahead and close this one? Looks like this column appears with 2.15.1.

morriscb commented 1 year ago

Done.