Unexpected session_types in behavior_sessions table

matchings commented 3 years ago

Describe the bug

The behavior sessions table from the BehaviorProjectCache (from_lims) includes cases where session_type is unexpected (needs to be renamed) or not defined (NaN).

In the above example, release_expts is the list of ophys experiments that are slated for release. The intention is to release all behavior only sessions for the mice that have ophys data in the release, which is why I filtered the behavior_sessions table accordingly.

There are two issues with the session_type values in the resulting list of behavior only sessions. First, 129 of them are NaNs. Second, some of them need to be renamed. Specifically,

'0_gratings_autorewards_15min', '1_gratings', '2_gratings_flashed', '3_images_a_10uL_reward', '4_images_a_handoff_lapsed', '4_images_a_handoff_ready', '4_images_a_training',

should map to:

'TRAINING_0_gratings_autorewards_15min', 'TRAINING_1_gratings', 'TRAINING_2_gratings_flashed', 'TRAINING_3_images_A_10uL_reward', 'TRAINING_4_images_A_handoff_lapsed', 'TRAINING_4_images_A_handoff_ready', 'TRAINING_4_images_A_training',
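
As a minimal sketch of the renaming described above, the seven legacy names could be translated with an explicit lookup table. The function name `canonical_session_type` and the decision to return `None` for missing (NaN) values are assumptions made for illustration; they are not part of the AllenSDK.

```python
import math

# Legacy stage names reported in this issue, mapped to the expected names.
LEGACY_TO_CANONICAL = {
    "0_gratings_autorewards_15min": "TRAINING_0_gratings_autorewards_15min",
    "1_gratings": "TRAINING_1_gratings",
    "2_gratings_flashed": "TRAINING_2_gratings_flashed",
    "3_images_a_10uL_reward": "TRAINING_3_images_A_10uL_reward",
    "4_images_a_handoff_lapsed": "TRAINING_4_images_A_handoff_lapsed",
    "4_images_a_handoff_ready": "TRAINING_4_images_A_handoff_ready",
    "4_images_a_training": "TRAINING_4_images_A_training",
}


def canonical_session_type(session_type):
    """Return the expected name for a legacy session_type.

    Hypothetical helper: NaN/None is returned as None so that missing
    values stay visible instead of being silently renamed.
    """
    if session_type is None:
        return None
    if isinstance(session_type, float) and math.isnan(session_type):
        return None
    # Names that are not in the table are passed through unchanged.
    return LEGACY_TO_CANONICAL.get(session_type, session_type)
```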

Let me know if you would like a list of behavior_session_ids that are NaNs or have one of the unexpected session_types

AllenSDK version 2.6.0

wbwakeman commented 3 years ago

@matchings Who are you volunteering to update copies of the affected .pkl files so that these can be corrected?

These are not like in LIMS where I can just do a database update to change them.

matchings commented 3 years ago

@wbwakeman lims does not know the session_type for behavior sessions? ok, well if we need to go all the way back to the pkl file, I suppose that @dougollerenshaw might be the best person to do this. I will discuss with him when he gets back from vacation.

wbwakeman commented 3 years ago

Correct. For Behavior-only sessions, it is pulled from the pkl file

matchings commented 3 years ago

dougollerenshaw commented 3 years ago

@wbwakeman can you assign this issue to me?

dougollerenshaw commented 3 years ago

@wbwakeman just to make sure I understand this correctly:

I need to iterate over ALL behavior-only sessions in the BehaviorProjectCache
For every session, I need to open the session object using BehaviorSession.from_lims and check the session.metadata['session_type'] key.
For any session that does not conform to the TRAINING_{N}_{DESCRIPTION_STRING} format, I need to:
- make a copy of the existing PKL file in the filesystem
- open the existing PKL file
- change the session_type key inside the PKL file
- re-save the PKL file using the original filename

Is that correct?
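
The steps listed above could be sketched roughly as follows. This is only an illustration: the helper name `rewrite_stage`, the `.bak` suffix, and the choice of pickle protocol 2 are assumptions, and the key path `data['items']['behavior']['params']['stage']` is the one read back later in this thread. Files originally written by Python 2 may additionally need `encoding='latin1'` when loaded with Python 3.

```python
import pickle
import shutil


def rewrite_stage(pkl_path, new_stage, backup_suffix=".bak"):
    """Back up a PKL file, then overwrite its stage name in place.

    Hypothetical helper; assumes the file holds a nested dict with the
    stage stored at data['items']['behavior']['params']['stage'].
    """
    backup_path = pkl_path + backup_suffix
    # 1. Copy the existing file so the original can be restored.
    shutil.copy2(pkl_path, backup_path)
    # 2. Open the existing file (binary mode is required for pickles).
    with open(pkl_path, "rb") as f:
        data = pickle.load(f)
    # 3. Change the stage name.
    data["items"]["behavior"]["params"]["stage"] = new_stage
    # 4. Re-save under the original filename. Protocol 2 is assumed here
    #    so that Python 2 readers can still load the file.
    with open(pkl_path, "wb") as f:
        pickle.dump(data, f, protocol=2)
    return backup_path
```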

wbwakeman commented 3 years ago

I don't know that I have internalized all of the information in this issue and comments, but I will add what I can. As described in a comment in #1772 we discovered that from_lims() queries the mtrain database for the stage, and the NWB creation will look inside the pkl files. This means that we need the pkl files to have the correct information, which I think is what Marina is trying to say with the creation of this issue.

So work for this issue would be to:

gather all sessions for release that do not have ophys data (I can help with that)
Discover if each has the expected stimulus information
For those that don't, create a copy of the file in a writable location with the corrected information
Inform me of the location of the updated files
I will move the files where they need to be.

dougollerenshaw commented 3 years ago

Thanks @wbwakeman. If you can give me step 1, I'll get going. If there's one thing I'm good at, it's writing extremely inefficient for-loops. This sounds like a good opportunity to apply that skillset.

wbwakeman commented 3 years ago

I'm not 100% confident in this list, but I think it is very, very close to correct and inclusive:

/allen/aibs/technology/waynew/ophys/pkl_session_names/20210219_release_behavior_pkl_files.txt

matchings commented 3 years ago

The way I have been getting this list is by getting the full list of behavior sessions from the SDK, then getting the list of mouse IDs in the ophys release list (which is far more vetted), and then filtering the behavior sessions table by those mouse IDs. I will check in a bit whether that lines up with your list in that .txt file.

dougollerenshaw commented 3 years ago

Inefficient loop exhibit 1 (running now):

import pandas as pd
from allensdk.brain_observatory.behavior.behavior_session import BehaviorSession
from multiprocessing import Pool

behavior_session_list_path = '/allen/aibs/technology/waynew/ophys/pkl_session_names/20210219_release_behavior_pkl_files.txt'
with open(behavior_session_list_path, 'r') as f:
    behavior_session_list = [f.readline().strip('\n') for line in f]

def get_session_metadata(pkl_path):
    behavior_session_id = pkl_path.split('behavior_session_')[1].split('/')[0]
    session = BehaviorSession.from_lims(behavior_session_id)
    metadata = session.metadata
    metadata.update({
        'behavior_session_id':behavior_session_id,
        'pkl_path':pkl_path,
    })
    return metadata

with Pool(32) as pool:
    session_df = pd.DataFrame(pool.map(get_session_metadata, behavior_session_list))
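
A note on the file-reading line in the block above: calling `f.readline()` inside `for line in f` consumes two lines per iteration, so only every second path ends up in the list. A minimal sketch of the difference, using in-memory text instead of the real file (the function names are illustrative only):

```python
import io


def read_paths_skipping(handle):
    # Same pattern as above: the loop reads one line, readline() reads the next.
    return [handle.readline().strip("\n") for line in handle]


def read_paths(handle):
    # Use the loop variable itself and drop blank lines.
    return [line.strip() for line in handle if line.strip()]


text = "a.pkl\nb.pkl\nc.pkl\nd.pkl\n"
skipped = read_paths_skipping(io.StringIO(text))   # ['b.pkl', 'd.pkl']
complete = read_paths(io.StringIO(text))           # all four paths
```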

dougollerenshaw commented 3 years ago

Here're the value_counts for the session_types in the list that @wbwakeman provided:

I'll rename all of the session types that don't start with "TRAINING" or "OPHYS"

Why is there one OPHYS_7 in there? That looks like a mistake.

wbwakeman commented 3 years ago

Are you able to provide the id for the one that was OPHYS_7 ?

dougollerenshaw commented 3 years ago

Here're the details on that one OPHYS_7 session. It was run on a behavior box. That makes no sense to me:

{'rig_name': 'BEH.G-Box5',
 'sex': 'M',
 'age': '21 wks',
 'stimulus_frame_rate': 60.0,
 'session_type': 'OPHYS_7_receptive_field_mapping',
 'experiment_datetime': Timestamp('2019-05-06 19:49:50.929000+0000', tz='UTC'),
 'reporter_line': ['Ai148(TIT2L-GC6f-ICL-tTA2)'],
 'driver_line': ['Vip-IRES-Cre'],
 'LabTracks_ID': 449653.0,
 'full_genotype': 'Vip-IRES-Cre/wt;Ai148(TIT2L-GC6f-ICL-tTA2)/wt',
 'behavior_session_uuid': UUID('42f33366-96aa-43a8-83dc-c45ed5ce7baf'),
 'foraging_id': '42f33366-96aa-43a8-83dc-c45ed5ce7baf',
 'behavior_session_id': '863571054',
 'pkl_path': '/allen/programs/braintv/production/neuralcoding/prod0/specimen_837628436/behavior_session_863571054/190506124944_449653_42f33366-96aa-43a8-83dc-c45ed5ce7baf.pkl'}

dougollerenshaw commented 3 years ago

@wbwakeman I'm confused about this list you gave me. It contains ophys sessions (as shown above), but when I filter on a given mouse, it doesn't necessarily contain ALL ophys sessions. Here's an example for labtracks_id == 484627. Note that the only ophys session is OPHYS_0_images_A_habituation.

Are we releasing data for mice with incomplete ophys data? Or is the list you provided incomplete?

dougollerenshaw commented 3 years ago

Also, here's the progression of sessions that I see for the mouse that has the ophys_7 stage. Looks like it was interleaved amongst other training sessions before the ophys handoff. Must have been an operator mistake somehow? Not sure how that could have happened since I thought that mtrain was making all of the stage progression decisions:

I'm assuming that we can't build an NWB file out of that session since the format is different. But a user might wish to know that the animal was run on a rig that day and that it apparently saw incorrect stimuli (receptive field mapping movies: that must have been a surprise!)

dougollerenshaw commented 3 years ago

Ugh, more confusion. Looking closer at the above list:

there is no TRAINING_2 session
there's a gap in training between 4/25/19 and 4/30/19.

That makes me think that the provided list of behavior-only sessions is incomplete.

wbwakeman commented 3 years ago

Could be. I would like to spend more time on it. Will try to when I can

wbwakeman commented 3 years ago

mtrain confirms the behavior session 863571054 with stimulus OPHYS_7 for mouse 449653 http://mtrain/behavior_session/42f33366-96aa-43a8-83dc-c45ed5ce7baf/details

(which is not to say that a user did not make a mistake - I believe they always have/had the ability to override what mtrain said was the next stimulus to run)

wbwakeman commented 3 years ago

So, I specifically excluded all sessions that did have releasable ophys. We are not expecting to release "behavior-only" NWB files for those. Maybe something got funky in my query while attempting to do that. Here is a list where I don't attempt to filter those out:

/allen/aibs/technology/waynew/ophys/pkl_session_names/20210219_release_behavior_pkl_files.txt

I also don't understand how you are getting the lists that you are displaying, like the incomplete list for mouse 484627. I think all sessions are in the file.

wbwakeman commented 3 years ago

For 449653, there were behavior_sessions on 4/25, 4/26, 4/29, 4/30 (focusing in on the missing area you identified). These are listed in the first file that I provided (respectively):

/allen/programs/braintv/production/neuralcoding/prod0/specimen_837628436/behavior_session_857555456/190425133618_449653_a6104dd6-dc1d-4726-8fc5-6c1b1950bcf5.pkl
/allen/programs/braintv/production/neuralcoding/prod0/specimen_837628436/behavior_session_858240891/190426124905_449653_8f3d07a7-c69e-4ff9-b932-68f3f193ef1f.pkl
/allen/programs/braintv/production/neuralcoding/prod0/specimen_837628436/behavior_session_859032026/190429122746_449653_138fcb1b-c7ec-48c9-96a0-5d1f29f8ba04.pkl
/allen/programs/braintv/production/neuralcoding/prod0/specimen_837628436/behavior_session_859791558/190430133729_449653_3469e7b2-cb53-4d04-9ef2-376790df14a1.pkl
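
To cross-reference paths like these against the behavior sessions table, the session ID can be pulled out of the directory name. A small sketch, assuming every path contains a `behavior_session_<digits>` directory (the helper name is hypothetical):

```python
import re

_SESSION_DIR = re.compile(r"behavior_session_(\d+)")


def behavior_session_id_from_path(pkl_path):
    """Return the integer behavior_session_id embedded in a PKL path, or None."""
    match = _SESSION_DIR.search(pkl_path)
    return int(match.group(1)) if match else None


path = (
    "/allen/programs/braintv/production/neuralcoding/prod0/"
    "specimen_837628436/behavior_session_857555456/"
    "190425133618_449653_a6104dd6-dc1d-4726-8fc5-6c1b1950bcf5.pkl"
)
session_id = behavior_session_id_from_path(path)  # 857555456
```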

matchings commented 3 years ago

@wbwakeman if we don't release behavior only NWB files for ophys sessions, does that mean that the behavior data from ophys sessions won't be loadable via BehaviorSessions? I was under the impression that anything with a behavior_session_id should be loadable with BehaviorSessions.

As it is now (using the LIMS API), it's possible to load both behavior-only and behavior-ophys sessions via BehaviorSessions; it just won't have any ophys data for the sessions with ophys. I think that is desirable for users who are only analyzing behavior and want to include behavior during ophys sessions (but don't care about the ophys data), so that they only need to use one access point.

I guess another way to frame this question is - does the NWB API for BehaviorSession and BehaviorOphysSession require that the NWB file is exactly matched to the methods in the API (no more, no less), or can the API load parts of an NWB file (i.e. only the behavior parts)?

If the way I've described this is too confusing, let's try a Teams call about it.

wbwakeman commented 3 years ago

@matchings Yes, we need to work this out. Your request is clear. We just need to do it

matchings commented 3 years ago

@wbwakeman ok thank you, sorry if I am repeating myself, I am forgetting which issues I've brought up already and which I have just kept in my head.

matchings commented 3 years ago

@dougollerenshaw not sure if this is a useful hint or not, but i just came across a behavior session with a session_type = NaN and an attempt to load the metadata for that session gave the below error. This is for AllenSDK 2.7.0. Looks like the pkl file does not conform to the expected structure? Maybe it is a foraging session? It is from the list of behavior sessions belonging to mice going in the data release though.

EDIT: This behavior_session_id (823998659) corresponds to an OPHYS_7_receptive_field_mapping session, so it is not surprising that the pkl file doesn't conform. @wbwakeman I believe we need to make sure that the BehaviorProjectCache get_behavior_session_table method does not return any receptive field mapping sessions, or else users (like me) will try to load them with BehaviorSessions and get this error. They simply shouldn't have a behavior_session_id if you ask me, but the easiest thing is probably to just filter them out of the list.

matchings commented 3 years ago

A different behavior session with a NaN session type gives this different error (note that this is a behavior only session and the previous case was behavior from a 2P5 experiment):

EDIT: There are a lot of behavior-only sessions with NaN as session type. I just checked the behavior session report for one of them on mouse-seeks (http://mouse-seeks/qc/behavior/950119814), and it does indeed appear to be a regular change detection session (not foraging or some other mysterious thing), so I am guessing that the issue is that the stage_name is missing in mTrain. Hopefully it is in the pkl file.

dougollerenshaw commented 3 years ago

@wbwakeman in the text block I shared above to read in your text file, something was going wrong that was leading to some rows getting missed. That explains the missing sessions. I ended up using the following block of code from @matchings to get 3674 behavior session IDs:

import visual_behavior.data_access.loading as loading
from allensdk.brain_observatory.behavior.behavior_project_cache import BehaviorProjectCache
# get list of experiments in the release
release_expts = loading.get_filtered_ophys_experiment_table(release_data_only=True)
# get list of all behavior sessions
cache = BehaviorProjectCache.from_lims(manifest=loading.get_manifest_path())
behavior_sessions = cache.get_behavior_session_table()
# only mice that have ophys experiments in the release
behavior_sessions['mouse_id'] = [int(mouse_id) for mouse_id in behavior_sessions.mouse_id.values]
behavior_sessions = behavior_sessions[behavior_sessions.mouse_id.isin(release_expts.mouse_id.unique())]

For all sessions with non-conforming session names, I'm saving new PKL files with conforming names here: /allen/programs/braintv/workgroups/nc-ophys/visual_behavior/updated_pkl_files

wbwakeman commented 3 years ago

Thank you! These have been copied out to their production locations (and the originals renamed to ".bak").

wbwakeman commented 3 years ago

Hi @dougollerenshaw 64 of the behavior sessions still have an unexpected session name. The one that I dived in to was in the list and did get an updated pkl file in your directory. The updated file did get copied out to the production location and the name matched the previous file. Not sure if there was a problem with the pkl update for these 64?

A file is at /allen/aibs/technology/waynew/behavior/behavior_only_nwb/20210220_behavior_only_unexpected_session_name.txt that has the behavior_session id and the 'offending' stimulus name as well as a list of the expected session names.

matchings commented 3 years ago

@wbwakeman could you share that information as a .csv instead of .txt? Might help to avoid similar issues as Doug had last time reading the other text file.

wbwakeman commented 3 years ago

Yes, the csv version is in the same directory now: /allen/aibs/technology/waynew/behavior/behavior_only_nwb/20210220_behavior_only_unexpected_session_name.csv

The regex pattern templates are: 'OPHYS_0_images', 'OPHYS_[1|3]_images', 'OPHYS_2_images', 'OPHYS_[4|6]_images', 'OPHYS_5_images', 'TRAINING_0_gratings', 'TRAINING_1_gratings', 'TRAINING_2_gratings', 'TRAINING_3_images', 'TRAINING_4_images', 'TRAINING_5_images'
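
For checking a stage name against templates like these, a sketch along the following lines may help. The pattern list is reconstructed from this comment, with the `\A` anchor assumed from the error message quoted later in the thread; it is not copied from the AllenSDK source. Note that `[1|3]` is a character class, so it also matches a literal `|`; `[13]` would be the stricter spelling.

```python
import re

# Assumed templates; \A anchors each pattern at the start of the string.
PATTERN_TEMPLATES = [
    r"\AOPHYS_0_images",
    r"\AOPHYS_[1|3]_images",
    r"\AOPHYS_2_images",
    r"\AOPHYS_[4|6]_images",
    r"\AOPHYS_5_images",
    r"\ATRAINING_0_gratings",
    r"\ATRAINING_1_gratings",
    r"\ATRAINING_2_gratings",
    r"\ATRAINING_3_images",
    r"\ATRAINING_4_images",
    r"\ATRAINING_5_images",
]


def matching_templates(session_type):
    """Return every template that matches the given session_type."""
    return [p for p in PATTERN_TEMPLATES if re.search(p, session_type)]


def is_expected(session_type):
    # A name is treated as valid when exactly one template matches.
    return len(matching_templates(session_type)) == 1
```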

wbwakeman commented 3 years ago

I'd also be interested in your opinion on this one: /allen/programs/braintv/production/neuralcoding/prod0/specimen_840544752/behavior_session_928942892/928783067.pkl Ophys session is "20190821_453990_4x2_test2" I think this may just be a test session that has no value for your analyses.

wbwakeman commented 3 years ago

And one more set that is causing trouble. It is a different error, but it appears to track back to a missing stimulus name:

 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_773585522/773079706.pkl
 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774106776/773939589.pkl
 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774937388/774747386.pkl
 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_782380778/181119152241_403491_0991ea0b-9e8a-47cc-b74c-c34f092e2da7.pkl
 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_786920476/181128094417_403491_ced6aa12-5730-4f90-a0f7-7ea9f40b4963.pkl
 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_791749028/791539630.pkl

dougollerenshaw commented 3 years ago

On this now. Will update shortly.

dougollerenshaw commented 3 years ago

@wbwakeman I made a bit of a mess of this on my first pass. In addition to missing some files (as you noted above), I also misnamed some sessions in the new file.

So I tried to redo a bit more systematically.

Everything is now in the following directory: /allen/programs/braintv/workgroups/nc-ophys/visual_behavior/updated_pkl_files_2021.02.21

Here's what you'll find:

- session_summary.csv: a CSV file containing a summary of all 3744 files that I examined in this process (using a combination of Marina's code above and your lists)
- a new PKL file for every session in that list for which the stage name did not conform to the "OPHYS_{N}_{DESC}" or "TRAINING_{N}_{DESC}" pattern. These are denoted as 'resave_pkl == True' in the above CSV. There are 182 of these.
- sessions for which I could not extract the PKL stage name are excluded. These don't conform to the foraging2 file format and are almost certainly OPHYS_7 sessions. There are 102 of these.

For any PKL files that are in this directory and for which you already copied over a new file, please use the file in this location to replace the previous re-written PKL.

dougollerenshaw commented 3 years ago

Showing my work: https://gist.github.com/dougollerenshaw/ff6be6499bc56b86c33375a5bfcc8b50

wbwakeman commented 3 years ago

Cool. Thanks @dougollerenshaw . Only one question. It looks like there are only 112 pkl files in the /allen/programs/braintv/workgroups/nc-ophys/visual_behavior/updated_pkl_files_2021.02.21 directory. From your comment, and from the spreadsheet, should I not expect 182?

Oh, I got it. There are duplicates. e.g.

180827141745_403491_d744c587-0130-45cb-92d5-398ae2b6fab8.pkl
180827141745_403491_d744c587-0130-45cb-92d5-398ae2b6fab8.pkl

dougollerenshaw commented 3 years ago

@wbwakeman good catch. The duplicates must have come from reading in sessions both from Marina's code block and from your lists. I didn't think to check for duplicates. But after dropping duplicates, I was able to confirm that there indeed should have been only 112 PKL files in that directory.
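
Duplicates of this kind can be removed before counting or copying files. A minimal sketch, assuming that two paths with the same file name refer to the same session (the function name is illustrative):

```python
import os


def unique_by_filename(pkl_paths):
    """Keep the first path seen for each file name, preserving order."""
    seen = set()
    unique = []
    for path in pkl_paths:
        name = os.path.basename(path)
        if name not in seen:
            seen.add(name)
            unique.append(path)
    return unique
```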

wbwakeman commented 3 years ago

Hi @dougollerenshaw I'm still getting an error with a lot of these. I tried to extract the value manually for one just to check it and got an unsupported pickle protocol error. Here is what I did:

>>> import cPickle as pickle
>>> i = '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_742008131/180824145120_403491_dcc5955c-2ea7-4408-997f-bb4b48c47e9b.pkl'
>>> file = open(i)
>>> data = pickle.load(file)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unsupported pickle protocol: 4

Are you able to extract a stage name from that file? (Any recommendation on a better snippet to check these?)

FWIW, this is the new file that was copied out yesterday. The old file is the .bak. I CAN extract a stage name from the .bak file ('0_gratings_autorewards_15min')

-rwxrwxr-x 1 nobody 301 2439574 Feb 21 15:12 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_742008131/180824145120_403491_dcc5955c-2ea7-4408-997f-bb4b48c47e9b.pkl
-rwxrwxrwx 1 nobody 301 6018645 Aug 24  2018 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_742008131/180824145120_403491_dcc5955c-2ea7-4408-997f-bb4b48c47e9b.pkl.bak

dougollerenshaw commented 3 years ago

Sorry @wbwakeman! One more reason that we shouldn't be relying on PKL files for data storage! I saved with the pandas v1.2.2 to_pickle method. This works for me for unpickling:

import pandas as pd
fn = '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_742008131/180824145120_403491_dcc5955c-2ea7-4408-997f-bb4b48c47e9b.pkl'

data = pd.read_pickle(fn)
data['items']['behavior']['params']['stage']

'TRAINING_0_gratings_autorewards_15min'

I don't have cPickle installed, but when I try loading with pickle, I also get an error:

import pickle
file = open(fn)
data = pickle.load(file)

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-5-1db6b2b3868b> in <module>
      1 import pickle
      2 file = open(fn)
----> 3 data = pickle.load(file)

~/.conda/envs/vba/lib/python3.7/codecs.py in decode(self, input, final)
    320         # decode input (taking the buffer into account)
    321         data = self.buffer + input
--> 322         (result, consumed) = self._buffer_decode(data, self.errors, final)
    323         # keep undecoded input until the next call
    324         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Is it possible for you to load these with Pandas in pipeline? If not, should I try opening in Pandas and resaving with Pickle?

After this, let's never use PKL format again!!!!!!
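
Both errors above are consistent with how pickle files are laid out: pickles written with protocol 2 or higher start with the byte 0x80 followed by the protocol number, so they must be opened in binary mode, and Python 2 cannot read protocol 4. The sketch below, using only the standard library, shows one way to check a file's protocol and to write a file that Python 2 can still load. The helper names are hypothetical, and whether the pipeline needs protocol 2 is an assumption.

```python
import pickle


def pickle_protocol(path):
    """Return the protocol number declared in the file header.

    Protocols 0 and 1 have no header, so None is returned for them.
    """
    with open(path, "rb") as f:
        header = f.read(2)
    if len(header) == 2 and header[0] == 0x80:
        return header[1]
    return None


def save_py2_compatible(obj, path):
    # Protocol 2 is the highest protocol that Python 2 understands.
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=2)


def load_pickle(path):
    # Binary mode avoids the UnicodeDecodeError seen with open(fn).
    with open(path, "rb") as f:
        return pickle.load(f)
```

pandas' `to_pickle` also accepts a `protocol` argument, which may be the simplest option if the files are re-saved with pandas.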

dougollerenshaw commented 3 years ago

@azcolin and @rhytnen: see above. These problems should be kept in mind as future effort is put into stimulus control software.

wbwakeman commented 3 years ago

Thanks @dougollerenshaw . I need to track down something on our side. No need to resave those files.

I am interested in your opinion on these two comments: https://github.com/AllenInstitute/AllenSDK/issues/1903#issuecomment-782882964 https://github.com/AllenInstitute/AllenSDK/issues/1903#issuecomment-782883386

dougollerenshaw commented 3 years ago

@wbwakeman:

I'd also be interested in your opinion on this one: /allen/programs/braintv/production/neuralcoding/prod0/specimen_840544752/behavior_session_928942892/928783067.pkl Ophys session is "20190821_453990_4x2_test2" I think this may just be a test session that has no value for your analyses.

Since it's both a 4x2 session, which isn't part of this release, and since it's labeled 'test', I think we can safely ignore this one. But just out of curiosity, I opened up the PKL file and it turns out that the format is different than any I've previously seen. Exploring a bit, it looks like @samiamseid was the operator.

fn = '/allen/programs/braintv/production/neuralcoding/prod0/specimen_840544752/behavior_session_928942892/928783067.pkl'
data = pd.read_pickle(fn)
data['items']['foraging']['cl_params']['user_id']

'sams'

@samiamseid Any idea what this was? Is this a file format we should be expecting to see more of?

wbwakeman commented 3 years ago

Digging into one of the jobs that is still failing, I see this error:

'0_gratings_autorewards_15min' matched 0 pattern templates.the regex pattern templates are ['\\ATRAINING_0_gratings', .....

Which leads me to see that:

>>> data['items']['behavior']['cl_params']['stage']
'0_gratings_autorewards_15min'

>>> data['items']['behavior']['params']['stage']
OPHYS_0_grating_autorewards_15_min

This is for file '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_742008131/180824145120_403491_dcc5955c-2ea7-4408-997f-bb4b48c47e9b.pkl'. I believe this is affecting a set of about 64 experiments; I can provide that list if needed.

samiamseid commented 3 years ago

@wbwakeman:

I'd also be interested in your opinion on this one: /allen/programs/braintv/production/neuralcoding/prod0/specimen_840544752/behavior_session_928942892/928783067.pkl Ophys session is "20190821_453990_4x2_test2" I think this may just be a test session that has no value for your analyses.

Since it's both a 4x2 session, which isn't part of this release, and since it's labeled 'test', I think we can safely ignore this one. But just out of curiosity, I opened up the PKL file and it turns out that the format is different than any I've previously seen. Exploring a bit, it looks like @samiamseid was the operator.

@samiamseid Any idea what this was? Is this a file format we should be expecting to see more of?

We were testing WSE updates. Targeted_X and Targeted_Y variables were not getting set correctly in the platform_json on this session, which was part of a miscommunication about how those variables needed to be set. It was subsequently resolved. This is a test session that was never successfully uploaded due to the WSE errors.

I can also tell this is a Receptive Field Mapping stimulus session. All pkl files for Receptive Field Mapping sessions are different from the rest, since it's a completely different type of stimulus than the other visual behavior scripts. Is it possible this looks like a different pkl structure because you're used to looking at the normal behavior sessions and not the RF mapping sessions?

dougollerenshaw commented 3 years ago

Thanks Sam! Actually, yes, this does look like a standard ophys_7 session. I think I may have been looking at a deeper level in the dict earlier when I said it looked unfamiliar. Sorry for the false alarm. Regardless, @wbwakeman, we should ignore this file for release.

dougollerenshaw commented 3 years ago

@wbwakeman, for the 6 files you linked above:

And one more set that is causing trouble. It is a different error, but it appears to track back to a missing stimulus name:

 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_773585522/773079706.pkl
 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774106776/773939589.pkl
 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774937388/774747386.pkl
 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_782380778/181119152241_403491_0991ea0b-9e8a-47cc-b74c-c34f092e2da7.pkl
 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_786920476/181128094417_403491_ced6aa12-5730-4f90-a0f7-7ea9f40b4963.pkl
 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_791749028/791539630.pkl

They are included in the re-saved session PKL files I linked yesterday. All 6 are behavior/ophys sessions. I ran this code block to confirm:

import os
import pandas as pd

pkl_list = [ 
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_773585522/773079706.pkl',
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774106776/773939589.pkl',
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774937388/774747386.pkl',
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_782380778/181119152241_403491_0991ea0b-9e8a-47cc-b74c-c34f092e2da7.pkl',
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_786920476/181128094417_403491_ced6aa12-5730-4f90-a0f7-7ea9f40b4963.pkl',
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_791749028/791539630.pkl',
]
new_pkl_path = '/allen/programs/braintv/workgroups/nc-ophys/visual_behavior/updated_pkl_files_2021.02.21'

for ii, pkl_path in enumerate(pkl_list):
    fn = os.path.split(pkl_path)[1]
    data = pd.read_pickle(os.path.join(new_pkl_path, fn))
    s1 = data['items']['behavior']['params']['stage']
    s2 = data['items']['behavior']['cl_params']['stage']
    print("item {} in list\n\tfn = {}\n\tdata['items']['behavior']['params']['stage'] = {}\n\tdata['items']['behavior']['cl_params']['stage'] = {}".format(ii, fn, s1, s2))

Which gave me this:

item 0 in list
    fn = 773079706.pkl
    data['items']['behavior']['params']['stage'] = OPHYS_0_images_A_habituation
    data['items']['behavior']['cl_params']['stage'] = OPHYS_0_images_A_habituation
item 1 in list
    fn = 773939589.pkl
    data['items']['behavior']['params']['stage'] = OPHYS_0_images_A_habituation
    data['items']['behavior']['cl_params']['stage'] = OPHYS_0_images_A_habituation
item 2 in list
    fn = 774747386.pkl
    data['items']['behavior']['params']['stage'] = OPHYS_1_images_A
    data['items']['behavior']['cl_params']['stage'] = OPHYS_1_images_A
item 3 in list
    fn = 181119152241_403491_0991ea0b-9e8a-47cc-b74c-c34f092e2da7.pkl
    data['items']['behavior']['params']['stage'] = OPHYS_4_images_B
    data['items']['behavior']['cl_params']['stage'] = OPHYS_4_images_B
item 4 in list
    fn = 181128094417_403491_ced6aa12-5730-4f90-a0f7-7ea9f40b4963.pkl
    data['items']['behavior']['params']['stage'] = OPHYS_6_images_B
    data['items']['behavior']['cl_params']['stage'] = OPHYS_6_images_B
item 5 in list
    fn = 791539630.pkl
    data['items']['behavior']['params']['stage'] = OPHYS_4_images_B
    data['items']['behavior']['cl_params']['stage'] = OPHYS_4_images_B

dougollerenshaw commented 3 years ago

Digging into one of the jobs that is still failing, I see this error:
'0_gratings_autorewards_15min' matched 0 pattern templates.the regex pattern templates are ['\\ATRAINING_0_gratings', .....
Which leads me to see that:
>>> data['items']['behavior']['cl_params']['stage']
'0_gratings_autorewards_15min'

>>> data['items']['behavior']['params']['stage']
OPHYS_0_grating_autorewards_15_min
For file '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_742008131/180824145120_403491_dcc5955c-2ea7-4408-997f-bb4b48c47e9b.pkl' I believe this is affecting a set of about 64 experiments, can provide that list if needed

@wbwakeman I iterated through all PKL files in that directory and changed the data['items']['behavior']['cl_params']['stage'] key for every file in which it didn't match the data['items']['behavior']['params']['stage'] key using the following:

import os
import pandas as pd

saveloc = '/allen/programs/braintv/workgroups/nc-ophys/visual_behavior/updated_pkl_files_2021.02.21'
pkls = [fn for fn in os.listdir(saveloc) if fn.endswith('pkl')]
for ii, pkl in enumerate(pkls):
    data = pd.read_pickle(os.path.join(saveloc, pkl))
    s1 = data['items']['behavior']['params']['stage']
    s2 = data['items']['behavior']['cl_params']['stage']
    if s1 != s2:
        print("saving new PKL for item {} in list\n\tfn = {}\n\tdata['items']['behavior']['params']['stage'] = {}\n\tdata['items']['behavior']['cl_params']['stage'] = {}".format(ii, pkl, s1, s2))
        data['items']['behavior']['cl_params']['stage'] = s1
        pd.to_pickle(data, os.path.join(saveloc, pkl))

They should all be good to go now. Let me know if there are any more issues!

wbwakeman commented 3 years ago

Thanks @dougollerenshaw I will check those out.

For the set of six, I believe the issue is that they do not have 'ophys_experiment' records in LIMS. So processing was interrupted in such a way that the data never even made it to LIMS. Makes me wonder if there was any behavior worth recording for those. Are you able to tell from the pkl file whether there was any stimulus and response? If so, I'll figure out how to save this, but if they are just junk, then I'll figure out how to exclude them

    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_773585522/773079706.pkl',
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774106776/773939589.pkl',
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774937388/774747386.pkl',
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_782380778/181119152241_403491_0991ea0b-9e8a-47cc-b74c-c34f092e2da7.pkl',
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_786920476/181128094417_403491_ced6aa12-5730-4f90-a0f7-7ea9f40b4963.pkl',
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_791749028/791539630.pkl',

dougollerenshaw commented 3 years ago

@wbwakeman for these 6, four have a substantial number of licks, so I'd assume they are reasonable sessions. One has only 1 lick and the other 0, so those two could have been failed for behavior reasons.

import pandas as pd
from visual_behavior.translator.foraging2 import data_to_change_detection_core

# pkl_list is the list of six PKL paths defined in the earlier code block

for pkl_file in pkl_list:
    data = pd.read_pickle(pkl_file)
    core_data = data_to_change_detection_core(data)
    print(pkl_file)
    print('stage: {}'.format(core_data['metadata']['stage']))
    print('number of licks: {}'.format(len(core_data['licks'])))
    print('')

/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_773585522/773079706.pkl
stage: OPHYS_0_images_A_habituation
number of licks: 3450

/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774106776/773939589.pkl
stage: OPHYS_0_images_A_habituation
number of licks: 4757

/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774937388/774747386.pkl
stage: OPHYS_1_images_A
number of licks: 5969

/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_782380778/181119152241_403491_0991ea0b-9e8a-47cc-b74c-c34f092e2da7.pkl
stage: OPHYS_4_images_B
number of licks: 1

/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_786920476/181128094417_403491_ced6aa12-5730-4f90-a0f7-7ea9f40b4963.pkl
stage: OPHYS_6_images_B
number of licks: 0

/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_791749028/791539630.pkl
stage: OPHYS_4_images_B
number of licks: 3683

AllenInstitute / AllenSDK

Unexpected session_types in behavior_sessions table #1903