Closed matchings closed 3 years ago
@matchings Who are you volunteering to update copies of the affected .pkl files so that these can be corrected?
These are not like in LIMS where I can just do a database update to change them.
@wbwakeman lims does not know the session_type for behavior sessions? ok, well if we need to go all the way back to the pkl file, I suppose that @dougollerenshaw might be the best person to do this. I will discuss with him when he gets back from vacation.
Correct. For Behavior-only sessions, it is pulled from the pkl file
@wbwakeman can you assign this issue to me?
@wbwakeman just to make sure I understand this correctly:
BehaviorSession.from_lims
and check the session.metadata['session_type']
key.TRAINING_{N}_{DESCRIPTION_STRING}
format, I need to:
session_type
key inside the PKL file
Is that correct?
I don't know that I have internalized all of the information in this issue and comments, but I will add what I can. As described in a comment in #1772 we discovered that from_lims() queries the mtrain database for the stage, and the NWB creation will look inside the pkl files. This means that we need the pkl files to have the correct information, which I think is what Marina is trying to say with the creation of this issue.
So work for this issue would be to:
Thanks @wbwakeman. If you can give me step 1, I'll get going. If there's one thing I'm good at, it's writing extremely inefficient for-loops. This sounds like a good opportunity to apply that skillset.
I'm not 100% confident in this list, but I think it is very, very close to correct and inclusive:
/allen/aibs/technology/waynew/ophys/pkl_session_names/20210219_release_behavior_pkl_files.txt
The way I have been getting this list is by getting the full list of behavior sessions from the SDK, then getting the list of mouse IDs in the ophys release list (which is far more vetted), then filtering the behavior sessions table by those mouse IDs. I will check in a bit whether that lines up with your list in that .txt file.
Inefficient loop exhibit 1 (running now):
import pandas as pd
from allensdk.brain_observatory.behavior.behavior_session import BehaviorSession
from multiprocessing import Pool

behavior_session_list_path = '/allen/aibs/technology/waynew/ophys/pkl_session_names/20210219_release_behavior_pkl_files.txt'
with open(behavior_session_list_path, 'r') as f:
    behavior_session_list = [f.readline().strip('\n') for line in f]

def get_session_metadata(pkl_path):
    behavior_session_id = pkl_path.split('behavior_session_')[1].split('/')[0]
    session = BehaviorSession.from_lims(behavior_session_id)
    metadata = session.metadata
    metadata.update({
        'behavior_session_id':behavior_session_id,
        'pkl_path':pkl_path,
    })
    return metadata

with Pool(32) as pool:
    session_df = pd.DataFrame(pool.map(get_session_metadata, behavior_session_list))
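A side note on the file-reading line in the snippet above, offered as a sketch rather than a diagnosis of that particular run: in Python 3, calling `f.readline()` inside `for line in f` advances the file twice per iteration, so roughly every other path is dropped. The file contents and the function names below are made up for illustration.

```python
import os
import tempfile

def read_paths_skipping(path):
    # Same pattern as above: the for-loop consumes one line,
    # then readline() consumes (and keeps) the next one.
    with open(path, 'r') as f:
        return [f.readline().strip('\n') for line in f]

def read_paths(path):
    # Iterate once and use the loop variable; ignore blank lines.
    with open(path, 'r') as f:
        return [line.strip() for line in f if line.strip()]

# Hypothetical five-line list standing in for the real .txt file.
tmp = tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False)
tmp.write('a.pkl\nb.pkl\nc.pkl\nd.pkl\ne.pkl\n')
tmp.close()

skipped = read_paths_skipping(tmp.name)
complete = read_paths(tmp.name)
os.remove(tmp.name)

print(len(skipped), 'entries with the mixed pattern,', len(complete), 'with plain iteration')
```

Comparing `len(behavior_session_list)` against the line count of the text file is a cheap check before launching the pool.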
Here're the value_counts for the session_types in the list that @wbwakeman provided:
I'll rename all of the session types that don't start with "TRAINING" or "OPHYS"
Why is there one OPHYS_7 in there? That looks like a mistake.
Are you able to provide the id for the one that was OPHYS_7 ?
Here're the details on that one OPHYS_7 session. It was run on a behavior box. That makes no sense to me:
{'rig_name': 'BEH.G-Box5',
'sex': 'M',
'age': '21 wks',
'stimulus_frame_rate': 60.0,
'session_type': 'OPHYS_7_receptive_field_mapping',
'experiment_datetime': Timestamp('2019-05-06 19:49:50.929000+0000', tz='UTC'),
'reporter_line': ['Ai148(TIT2L-GC6f-ICL-tTA2)'],
'driver_line': ['Vip-IRES-Cre'],
'LabTracks_ID': 449653.0,
'full_genotype': 'Vip-IRES-Cre/wt;Ai148(TIT2L-GC6f-ICL-tTA2)/wt',
'behavior_session_uuid': UUID('42f33366-96aa-43a8-83dc-c45ed5ce7baf'),
'foraging_id': '42f33366-96aa-43a8-83dc-c45ed5ce7baf',
'behavior_session_id': '863571054',
'pkl_path': '/allen/programs/braintv/production/neuralcoding/prod0/specimen_837628436/behavior_session_863571054/190506124944_449653_42f33366-96aa-43a8-83dc-c45ed5ce7baf.pkl'}
@wbwakeman I'm confused about this list you gave me. It contains ophys sessions (as shown above), but when I filter on a given mouse, it doesn't necessarily contain ALL ophys sessions. Here's an example for labtracks_id == 484627. Note that the only ophys session is OPHYS_0_images_A_habituation.
Are we releasing data for mice with incomplete ophys data? Or is the list you provided incomplete?
Also, here's the progression of sessions that I see for the mouse that has the ophys_7 stage. Looks like it was interleaved amongst other training sessions before the ophys handoff. Must have been an operator mistake somehow? Not sure how that could have happened since I thought that mtrain was making all of the stage progression decisions:
I'm assuming that we can't build an NWB file out of that session since the format is different. But a user might wish to know that the animal was run on a rig that day and that it apparently saw incorrect stimuli (receptive field mapping movies: that must have been a surprise!)
Ugh, more confusion. Looking closer at the above list:
That makes me think that the provided list of behavior-only sessions is incomplete.
Could be. I would like to spend more time on it. Will try to when I can
mtrain confirms the behavior session 863571054 with stimulus OPHYS_7 for mouse 449653 http://mtrain/behavior_session/42f33366-96aa-43a8-83dc-c45ed5ce7baf/details
(which is not to say that a user did not make a mistake - I believe they always have/had the ability to override what mtrain said was the next stimulus to run)
So, I specifically excluded all sessions that did have releasable ophys. We are not expecting to release "behavior-only" NWB files for those. Maybe something got funky in my query while attempting to do that. Here is a list where I don't attempt to filter those out:
/allen/aibs/technology/waynew/ophys/pkl_session_names/20210219_release_behavior_pkl_files.txt
I also don't understand how you are getting the lists that you are displaying, like the incomplete list for mouse 484627. I think all sessions are in the file.
For 449653, there were behavior_sessions on 4/25, 4/26, 4/29, 4/30 (focusing in on the missing area you identified). These are listed in the first file that I provided (respectively):
/allen/programs/braintv/production/neuralcoding/prod0/specimen_837628436/behavior_session_857555456/190425133618_449653_a6104dd6-dc1d-4726-8fc5-6c1b1950bcf5.pkl
/allen/programs/braintv/production/neuralcoding/prod0/specimen_837628436/behavior_session_858240891/190426124905_449653_8f3d07a7-c69e-4ff9-b932-68f3f193ef1f.pkl
/allen/programs/braintv/production/neuralcoding/prod0/specimen_837628436/behavior_session_859032026/190429122746_449653_138fcb1b-c7ec-48c9-96a0-5d1f29f8ba04.pkl
/allen/programs/braintv/production/neuralcoding/prod0/specimen_837628436/behavior_session_859791558/190430133729_449653_3469e7b2-cb53-4d04-9ef2-376790df14a1.pkl
@wbwakeman if we don't release behavior only NWB files for ophys sessions, does that mean that the behavior data from ophys sessions won't be loadable via BehaviorSessions? I was under the impression that anything with a behavior_session_id should be loadable with BehaviorSessions.
As it is now (using the LIMS API), it's possible to load both behavior-only and behavior-ophys sessions via BehaviorSessions; it just won't have any ophys data for the sessions with ophys. I think that is desirable for users who are only analyzing behavior and want to include behavior during ophys sessions (but don't care about the ophys data), so that they only need to use one access point.
I guess another way to frame this question is - does the NWB API for BehaviorSession and BehaviorOphysSession require that the NWB file is exactly matched to the methods in the API (no more, no less), or can the API load parts of an NWB file (i.e. only the behavior parts)?
If this is too confusing as I've described it, let's try for a Teams call about it.
@matchings Yes, we need to work this out. Your request is clear. We just need to do it
@wbwakeman ok thank you, sorry if I am repeating myself, I am forgetting what issues I've brought up already or have just kept in my head.
@dougollerenshaw not sure if this is a useful hint or not, but I just came across a behavior session with session_type = NaN, and an attempt to load the metadata for that session gave the below error. This is for AllenSDK 2.7.0. Looks like the pkl file does not conform to the expected structure? Maybe it is a foraging session? It is from the list of behavior sessions belonging to mice going in the data release though.
EDIT: This behavior_session_id (823998659) corresponds to an OPHYS_7_receptive_field_mapping_session, so it is not surprising that the pkl file doesn't conform. @wbwakeman I believe we need to make sure that the BehaviorProjectCache get_behavior_session_table method does not return any receptive field mapping sessions, or else users (like me) will try to load them with BehaviorSessions and get this error. They simply shouldn't have a behavior_session_id if you ask me, but the easiest thing is probably to just filter them out of the list.
A different behavior session with a NaN session type gives this different error (note that this is a behavior only session and the previous case was behavior from a 2P5 experiment):
EDIT: There are a lot of behavior-only sessions with NaN as session type. I just checked the behavior session report for one of them on mouse-seeks (http://mouse-seeks/qc/behavior/950119814), and it does indeed appear to be a regular change detection session (not foraging or some other mysterious thing), so I am guessing that the issue is that the stage_name is missing in mTrain. Hopefully it is in the pkl file.
@wbwakeman in the text block I shared above to read in your text file, something was going wrong that was leading to some rows getting missed. That explains the missing sessions. I ended up using the following block of code from @matchings to get 3674 behavior session IDs:
import visual_behavior.data_access.loading as loading
from allensdk.brain_observatory.behavior.behavior_project_cache import BehaviorProjectCache
# get list of experiments in the release
release_expts = loading.get_filtered_ophys_experiment_table(release_data_only=True)
# get list of all behavior sessions
cache = BehaviorProjectCache.from_lims(manifest=loading.get_manifest_path())
behavior_sessions = cache.get_behavior_session_table()
# only mice that have ophys experiments in the release
behavior_sessions['mouse_id'] = [int(mouse_id) for mouse_id in behavior_sessions.mouse_id.values]
behavior_sessions = behavior_sessions[behavior_sessions.mouse_id.isin(release_expts.mouse_id.unique())]
For all sessions with non-conforming session names, I'm saving new PKL files with conforming names here:
/allen/programs/braintv/workgroups/nc-ophys/visual_behavior/updated_pkl_files
Thank you! These have been copied out to their production locations (and the originals renamed to ".bak").
Hi @dougollerenshaw, 64 of the behavior sessions still have an unexpected session name. The one that I dug into was in the list and did get an updated pkl file in your directory. The updated file did get copied out to the production location and the name matched the previous file. Not sure if there was a problem with the pkl update for these 64?
A file is at /allen/aibs/technology/waynew/behavior/behavior_only_nwb/20210220_behavior_only_unexpected_session_name.txt that has the behavior_session id and the 'offending' stimulus name as well as a list of the expected session names.
@wbwakeman could you share that information as a .csv instead of .txt? Might help to avoid similar issues as Doug had last time reading the other text file.
Yes, the csv version is in the same directory now: /allen/aibs/technology/waynew/behavior/behavior_only_nwb/20210220_behavior_only_unexpected_session_name.csv
The regex pattern templates are: 'OPHYS_0_images', 'OPHYS_[1|3]_images', 'OPHYS_2_images', 'OPHYS_[4|6]_images', 'OPHYS_5_images', 'TRAINING_0_gratings', 'TRAINING_1_gratings', 'TRAINING_2_gratings', 'TRAINING_3_images', 'TRAINING_4_images', 'TRAINING_5_images'
I'd also be interested in your opinion on this one: /allen/programs/braintv/production/neuralcoding/prod0/specimen_840544752/behavior_session_928942892/928783067.pkl Ophys session is "20190821_453990_4x2_test2" I think this may just be a test session that has no value for your analyses.
And one more set that is causing trouble. A different error, but it appears to track back to a missing stimulus name:
/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_773585522/773079706.pkl
/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774106776/773939589.pkl
/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774937388/774747386.pkl
/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_782380778/181119152241_403491_0991ea0b-9e8a-47cc-b74c-c34f092e2da7.pkl
/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_786920476/181128094417_403491_ced6aa12-5730-4f90-a0f7-7ea9f40b4963.pkl
/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_791749028/791539630.pkl
On this now. Will update shortly.
@wbwakeman I made a bit of a mess of this on my first pass. In addition to missing some files (as you noted above), I also misnamed some sessions in the new file.
So I tried to redo a bit more systematically.
Everything is now in the following directory: /allen/programs/braintv/workgroups/nc-ophys/visual_behavior/updated_pkl_files_2021.02.21
Here's what you'll find:
For any PKL files that are in this directory and for which you already copied over a new file, please use the file in this location to replace the previous re-written PKL.
Cool. Thanks @dougollerenshaw . Only one question. It looks like there are only 112 pkl files in the /allen/programs/braintv/workgroups/nc-ophys/visual_behavior/updated_pkl_files_2021.02.21 directory. From your comment, and from the spreadsheet, should I not expect 182?
Oh, I got it. There are duplicates. e.g.
180827141745_403491_d744c587-0130-45cb-92d5-398ae2b6fab8.pkl
180827141745_403491_d744c587-0130-45cb-92d5-398ae2b6fab8.pkl
@wbwakeman good catch. The duplicates must have come from reading in sessions both from Marina's code block and from your lists. I didn't think to check for duplicates. But after dropping duplicates, I was able to confirm that there indeed should have been only 112 PKL files in that directory.
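For reference, a minimal sketch of how duplicates like these could be dropped when session lists from two sources are merged. It assumes the PKL file name alone identifies a session; the directory names and the helper `unique_by_filename` are hypothetical.

```python
import os

def unique_by_filename(paths):
    # Keep the first path seen for each file name, preserving input order.
    seen = {}
    for p in paths:
        seen.setdefault(os.path.basename(p), p)
    return list(seen.values())

# Hypothetical merged list: the same PKL arrives once from each source.
paths = [
    '/hypothetical/list_a/180827141745_403491_d744c587-0130-45cb-92d5-398ae2b6fab8.pkl',
    '/hypothetical/list_b/180827141745_403491_d744c587-0130-45cb-92d5-398ae2b6fab8.pkl',
    '/hypothetical/list_a/other_session.pkl',
]
unique = unique_by_filename(paths)
print(len(paths), '->', len(unique))
```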
Hi @dougollerenshaw I'm still getting an error with a lot of these. I tried to extract the value manually for one just to check it and got an "unsupported pickle protocol" error. Here is what I did:
>>> import cPickle as pickle
>>> i = '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_742008131/180824145120_403491_dcc5955c-2ea7-4408-997f-bb4b48c47e9b.pkl'
>>> file = open(i)
>>> data = pickle.load(file)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: unsupported pickle protocol: 4
Are you able to extract a stage name from that file? (Any recommendation on a better snippet to check these?)
FWIW, this is the new file that was copied out yesterday. The old file is the .bak. I CAN extract a stage name from the .bak file ('0_gratings_autorewards_15min')
-rwxrwxr-x 1 nobody 301 2439574 Feb 21 15:12 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_742008131/180824145120_403491_dcc5955c-2ea7-4408-997f-bb4b48c47e9b.pkl
-rwxrwxrwx 1 nobody 301 6018645 Aug 24 2018 /allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_742008131/180824145120_403491_dcc5955c-2ea7-4408-997f-bb4b48c47e9b.pkl.bak
Sorry @wbwakeman! One more reason that we shouldn't be relying on PKL files for data storage! I saved with the pandas v1.2.2 to_pickle method. This works for me for unpickling:
import pandas as pd
fn = '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_742008131/180824145120_403491_dcc5955c-2ea7-4408-997f-bb4b48c47e9b.pkl'
data = pd.read_pickle(fn)
data['items']['behavior']['params']['stage']
'TRAINING_0_gratings_autorewards_15min'
I don't have cPickle installed, but when I try loading with pickle, I also get an error:
import pickle
file = open(fn)
data = pickle.load(file)
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-5-1db6b2b3868b> in <module>
1 import pickle
2 file = open(fn)
----> 3 data = pickle.load(file)
~/.conda/envs/vba/lib/python3.7/codecs.py in decode(self, input, final)
320 # decode input (taking the buffer into account)
321 data = self.buffer + input
--> 322 (result, consumed) = self._buffer_decode(data, self.errors, final)
323 # keep undecoded input until the next call
324 self.buffer = data[consumed:]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
Is it possible for you to load these with Pandas in pipeline? If not, should I try opening in Pandas and resaving with Pickle?
After this, let's never use PKL format again!!!!!!
@azcolin and @rhytnen: see above. These problems should be kept in mind as future effort is put into stimulus control software.
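A short sketch of the two failure modes above, using only the standard library and a toy dict in place of a real session file. Two facts it relies on: pickle protocol 4 requires Python 3.4 or later (Python 2's cPickle stops at protocol 2), and `pickle.load` needs a file opened in binary mode. Whether re-saving with protocol 2 would be enough for the pipeline is an assumption, since objects nested in the real files may not round-trip between Python versions.

```python
import os
import pickle
import tempfile

# Toy stand-in for a session PKL; real files hold much larger nested dicts.
data = {'items': {'behavior': {'params': {'stage': 'TRAINING_0_gratings_autorewards_15min'}}}}

tmp_dir = tempfile.mkdtemp()
fn = os.path.join(tmp_dir, 'example.pkl')

# Protocol 4 can only be read by Python >= 3.4.
with open(fn, 'wb') as f:
    pickle.dump(data, f, protocol=4)

# Text mode tries to decode the bytes as UTF-8 and fails on the 0x80 header byte.
try:
    with open(fn, encoding='utf-8') as f:
        pickle.load(f)
    text_mode_error = None
except UnicodeDecodeError as err:
    text_mode_error = err

# Binary mode works.
with open(fn, 'rb') as f:
    stage = pickle.load(f)['items']['behavior']['params']['stage']

# Re-saving with protocol 2 produces a file format Python 2 can parse.
fn_py2 = os.path.join(tmp_dir, 'example_protocol2.pkl')
with open(fn_py2, 'wb') as f:
    pickle.dump(data, f, protocol=2)
with open(fn_py2, 'rb') as f:
    header = f.read(2)  # protocol >= 2 pickles start with b'\x80' + protocol number

print(type(text_mode_error).__name__, stage, header)
```

`pandas.to_pickle` also accepts a `protocol` argument, so the same choice is available without leaving pandas.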
Thanks @dougollerenshaw . I need to track down something on our side. No need to resave those files.
I am interested in your opinion on these two comments: https://github.com/AllenInstitute/AllenSDK/issues/1903#issuecomment-782882964 https://github.com/AllenInstitute/AllenSDK/issues/1903#issuecomment-782883386
@wbwakeman:
I'd also be interested in your opinion on this one: /allen/programs/braintv/production/neuralcoding/prod0/specimen_840544752/behavior_session_928942892/928783067.pkl Ophys session is "20190821_453990_4x2_test2" I think this may just be a test session that has no value for your analyses.
Since it's both a 4x2 session, which isn't part of this release, and since it's labeled 'test', I think we can safely ignore this one. But just out of curiosity, I opened up the PKL file and it turns out that the format is different than any I've previously seen. Exploring a bit, it looks like @samiamseid was the operator.
fn = '/allen/programs/braintv/production/neuralcoding/prod0/specimen_840544752/behavior_session_928942892/928783067.pkl'
data = pd.read_pickle(fn)
data['items']['foraging']['cl_params']['user_id']
'sams'
@samiamseid Any idea what this was? Is this a file format we should be expecting to see more of?
Digging into one of the jobs that is still failing, I see this error:
'0_gratings_autorewards_15min' matched 0 pattern templates.the regex pattern templates are ['\\ATRAINING_0_gratings', .....
Which leads me to see that:
>>> data['items']['behavior']['cl_params']['stage']
'0_gratings_autorewards_15min'
>>> data['items']['behavior']['params']['stage']
OPHYS_0_grating_autorewards_15_min
For file '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_742008131/180824145120_403491_dcc5955c-2ea7-4408-997f-bb4b48c47e9b.pkl' I believe this is affecting a set of about 64 experiments, can provide that list if needed
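To make the error message above concrete, here is a hedged sketch of a template check of this kind. It is not the pipeline's actual code: the `\A` anchor is taken from the error message, the template list is only a subset of the names quoted earlier in this thread, and `matching_templates` and `stage_fields` are hypothetical helpers.

```python
import re

# Subset of the templates quoted in this thread, anchored with \A as in the error message.
PATTERN_TEMPLATES = [
    r'\ATRAINING_0_gratings',
    r'\ATRAINING_1_gratings',
    r'\ATRAINING_2_gratings',
    r'\ATRAINING_3_images',
    r'\ATRAINING_4_images',
    r'\ATRAINING_5_images',
    r'\AOPHYS_5_images',
]

def matching_templates(stage):
    # Return every template that matches at the start of the stage name.
    return [p for p in PATTERN_TEMPLATES if re.match(p, stage)]

def stage_fields(data):
    # Both places in the PKL dict where a stage name is stored.
    behavior = data['items']['behavior']
    return {
        'params': behavior['params']['stage'],
        'cl_params': behavior['cl_params']['stage'],
    }

# Toy dict with the same nesting as the PKL contents discussed here.
data = {'items': {'behavior': {
    'params': {'stage': 'TRAINING_0_gratings_autorewards_15min'},
    'cl_params': {'stage': '0_gratings_autorewards_15min'},
}}}

report = {key: len(matching_templates(stage)) for key, stage in stage_fields(data).items()}
print(report)
```

A file passes only if every stage field that the reader consults matches exactly one template, which is why updating `params` alone was not enough.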
@wbwakeman:
I'd also be interested in your opinion on this one: /allen/programs/braintv/production/neuralcoding/prod0/specimen_840544752/behavior_session_928942892/928783067.pkl Ophys session is "20190821_453990_4x2_test2" I think this may just be a test session that has no value for your analyses.
Since it's both a 4x2 session, which isn't part of this release, and since it's labeled 'test', I think we can safely ignore this one. But just out of curiosity, I opened up the PKL file and it turns out that the format is different than any I've previously seen. Exploring a bit, it looks like @samiamseid was the operator.
@samiamseid Any idea what this was? Is this a file format we should be expecting to see more of?
We were testing WSE updates. Targeted_X and Targeted_Y variables were not getting set correctly in the platform_json on this session, which was part of a miscommunication about how those variables needed to be set. It was subsequently resolved. This is a test session that was never successfully uploaded due to the WSE errors.
I can also tell this is a Receptive Field Mapping stimulus session. All pkl files for Receptive Field Mapping sessions are different from the rest, since it's a completely different type of stimulus than the other visual behavior scripts. Is it possible this looks like a different pkl structure because you're used to looking at the normal behavior sessions and not the RF mapping sessions?
Thanks Sam! Actually, yes, this does look like a standard ophys_7 session. I think I may have been looking at a deeper level in the dict earlier when I said it looked unfamiliar. Sorry for the false alarm. Regardless, @wbwakeman, we should ignore this file for release.
@wbwakeman, for the 6 files you linked above:
They are included in the re-saved session PKL files I linked yesterday. All 6 are behavior/ophys sessions. I ran this code block to confirm:
import os
import pandas as pd

pkl_list = [
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_773585522/773079706.pkl',
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774106776/773939589.pkl',
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774937388/774747386.pkl',
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_782380778/181119152241_403491_0991ea0b-9e8a-47cc-b74c-c34f092e2da7.pkl',
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_786920476/181128094417_403491_ced6aa12-5730-4f90-a0f7-7ea9f40b4963.pkl',
    '/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_791749028/791539630.pkl',
]
new_pkl_path = '/allen/programs/braintv/workgroups/nc-ophys/visual_behavior/updated_pkl_files_2021.02.21'
for ii, pkl_path in enumerate(pkl_list):
    fn = os.path.split(pkl_path)[1]
    data = pd.read_pickle(os.path.join(new_pkl_path, fn))
    s1 = data['items']['behavior']['params']['stage']
    s2 = data['items']['behavior']['cl_params']['stage']
    print("item {} in list\n\tfn = {}\n\tdata['items']['behavior']['params']['stage'] = {}\n\tdata['items']['behavior']['cl_params']['stage'] = {}".format(ii, fn, s1, s2))
Which gave me this:
item 0 in list
fn = 773079706.pkl
data['items']['behavior']['params']['stage'] = OPHYS_0_images_A_habituation
data['items']['behavior']['cl_params']['stage'] = OPHYS_0_images_A_habituation
item 1 in list
fn = 773939589.pkl
data['items']['behavior']['params']['stage'] = OPHYS_0_images_A_habituation
data['items']['behavior']['cl_params']['stage'] = OPHYS_0_images_A_habituation
item 2 in list
fn = 774747386.pkl
data['items']['behavior']['params']['stage'] = OPHYS_1_images_A
data['items']['behavior']['cl_params']['stage'] = OPHYS_1_images_A
item 3 in list
fn = 181119152241_403491_0991ea0b-9e8a-47cc-b74c-c34f092e2da7.pkl
data['items']['behavior']['params']['stage'] = OPHYS_4_images_B
data['items']['behavior']['cl_params']['stage'] = OPHYS_4_images_B
item 4 in list
fn = 181128094417_403491_ced6aa12-5730-4f90-a0f7-7ea9f40b4963.pkl
data['items']['behavior']['params']['stage'] = OPHYS_6_images_B
data['items']['behavior']['cl_params']['stage'] = OPHYS_6_images_B
item 5 in list
fn = 791539630.pkl
data['items']['behavior']['params']['stage'] = OPHYS_4_images_B
data['items']['behavior']['cl_params']['stage'] = OPHYS_4_images_B
@wbwakeman I iterated through all PKL files in that directory and changed the data['items']['behavior']['cl_params']['stage'] key for every file in which it didn't match the data['items']['behavior']['params']['stage'] key using the following:
import os
import pandas as pd

saveloc = '/allen/programs/braintv/workgroups/nc-ophys/visual_behavior/updated_pkl_files_2021.02.21'
pkls = [fn for fn in os.listdir(saveloc) if fn.endswith('pkl')]
for ii, pkl in enumerate(pkls):
    data = pd.read_pickle(os.path.join(saveloc, pkl))
    s1 = data['items']['behavior']['params']['stage']
    s2 = data['items']['behavior']['cl_params']['stage']
    if s1 != s2:
        print("saving new PKL for item {} in list\n\tfn = {}\n\tdata['items']['behavior']['params']['stage'] = {}\n\tdata['items']['behavior']['cl_params']['stage'] = {}".format(ii, pkl, s1, s2))
        # copy the params stage into cl_params and overwrite the file in place
        data['items']['behavior']['cl_params']['stage'] = s1
        pd.to_pickle(data, os.path.join(saveloc, pkl))
They should all be good to go now. Let me know if there are any more issues!
Thanks @dougollerenshaw I will check those out.
For the set of six, I believe the issue is that they do not have 'ophys_experiment' records in LIMS. So processing was interrupted in such a way that the data never even made it to LIMS. Makes me wonder if there was any behavior worth recording for those. Are you able to tell from the pkl file whether there was any stimulus and response? If so, I'll figure out how to save this, but if they are just junk, then I'll figure out how to exclude them
'/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_773585522/773079706.pkl',
'/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774106776/773939589.pkl',
'/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774937388/774747386.pkl',
'/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_782380778/181119152241_403491_0991ea0b-9e8a-47cc-b74c-c34f092e2da7.pkl',
'/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_786920476/181128094417_403491_ced6aa12-5730-4f90-a0f7-7ea9f40b4963.pkl',
'/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_791749028/791539630.pkl',
@wbwakeman for these 6, four have a substantial number of licks, so I'd assume they are reasonable sessions. One has only 1 lick and the other 0, so those two could have been failed for behavior reasons.
from visual_behavior.translator.foraging2 import data_to_change_detection_core

for pkl_file in pkl_list:
    data = pd.read_pickle(pkl_file)
    core_data = data_to_change_detection_core(data)
    print(pkl_file)
    print('stage: {}'.format(core_data['metadata']['stage']))
    print('number of licks: {}'.format(len(core_data['licks'])))
    print('')
/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_773585522/773079706.pkl
stage: OPHYS_0_images_A_habituation
number of licks: 3450
/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774106776/773939589.pkl
stage: OPHYS_0_images_A_habituation
number of licks: 4757
/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_774937388/774747386.pkl
stage: OPHYS_1_images_A
number of licks: 5969
/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_782380778/181119152241_403491_0991ea0b-9e8a-47cc-b74c-c34f092e2da7.pkl
stage: OPHYS_4_images_B
number of licks: 1
/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_786920476/181128094417_403491_ced6aa12-5730-4f90-a0f7-7ea9f40b4963.pkl
stage: OPHYS_6_images_B
number of licks: 0
/allen/programs/braintv/production/visualbehavior/prod0/specimen_722884882/behavior_session_791749028/791539630.pkl
stage: OPHYS_4_images_B
number of licks: 3683
Describe the bug
The behavior sessions table from the BehaviorProjectCache (from_lims) includes cases where session_type is unexpected (needs to be renamed) or not defined (NaN).
In the above example, release_expts is the list of ophys experiments that are slated for release. The intention is to release all behavior only sessions for the mice that have ophys data in the release, which is why I filtered the behavior_sessions table accordingly.
There are two issues with the session_type values in the resulting list of behavior only sessions. First, 129 of them are NaNs. Second, some of them need to be renamed. Specifically,
'0_gratings_autorewards_15min', '1_gratings', '2_gratings_flashed', '3_images_a_10uL_reward', '4_images_a_handoff_lapsed', '4_images_a_handoff_ready', '4_images_a_training',
should map to:
'TRAINING_0_gratings_autorewards_15min', 'TRAINING_1_gratings', 'TRAINING_2_gratings_flashed', 'TRAINING_3_images_A_10uL_reward', 'TRAINING_4_images_A_handoff_lapsed', 'TRAINING_4_images_A_handoff_ready', 'TRAINING_4_images_A_training',
Let me know if you would like a list of behavior_session_ids that are NaNs or have one of the unexpected session_types
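The renaming described above can be written down as a lookup table. This is only a sketch of the mapping itself, using the names listed in this issue; `conform_session_type` is a hypothetical helper, and returning None for NaN entries is an assumption about how a caller would want to flag them.

```python
# Old name -> conforming name, exactly as listed in this issue.
SESSION_TYPE_RENAMES = {
    '0_gratings_autorewards_15min': 'TRAINING_0_gratings_autorewards_15min',
    '1_gratings': 'TRAINING_1_gratings',
    '2_gratings_flashed': 'TRAINING_2_gratings_flashed',
    '3_images_a_10uL_reward': 'TRAINING_3_images_A_10uL_reward',
    '4_images_a_handoff_lapsed': 'TRAINING_4_images_A_handoff_lapsed',
    '4_images_a_handoff_ready': 'TRAINING_4_images_A_handoff_ready',
    '4_images_a_training': 'TRAINING_4_images_A_training',
}

def conform_session_type(session_type):
    # NaN arrives as a float, so anything that is not a string is treated as
    # missing and returned as None for the caller to resolve separately.
    if not isinstance(session_type, str):
        return None
    # Names that already conform pass through unchanged.
    return SESSION_TYPE_RENAMES.get(session_type, session_type)

for name in ['3_images_a_10uL_reward', 'OPHYS_1_images_A', float('nan')]:
    print(repr(name), '->', repr(conform_session_type(name)))
```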