AllenInstitute / AllenSDK

code for reading and processing Allen Institute for Brain Science data
https://allensdk.readthedocs.io/en/latest/
Other
343 stars 149 forks source link

Add project_code to a couple new locations #2541

Open DowntonCrabby opened 2 years ago

DowntonCrabby commented 2 years ago

Describe the use case that is addressed by this feature. The 'project_code' provides valuable information about experimental variations such as which image sets were shown, which visual areas were targeted, etc. This information currently only exists in some of the cache summary tables but should be included in several other locations.

Describe the solution you'd like add 'project_code' to the following locations:

Describe alternatives you've considered potentially adding it to the metadata attribute for the neuropixels ecephys dataset objects as well (depending on what @corbennett ) thinks, but this would better coordinate the metadata attribute across the two separate platforms

Additional context None

morriscb commented 2 years ago

Hey @DowntonCrabby and @matchings, I'm running into a issue here with adding the project code. The sessions associated with a ecephys or ophys session id are easy enough to grab as they are in those respective tables. However, I haven't be able to find a consistent way linking a behavior_session_id directly to project_id. The foreign ids within the behavior_sessions don't seem to give me the linkage that I need. I attempted to use donor_id to join with the project, but this gave me non-unique matches where I ended up with project_ids that disagreed with those in the ophys/ecephys tables.

Is there a clean way to associate a behavior_session_id on LIMS with a project_id? If not, is there data somewhere that contains information about the behavior session and includes the project_id/code?

matchings commented 2 years ago

a given mouse_id / donor_id should always be part of the same project_code... Are you saying there are behavior_sessions from mice with the same donor_id that are associated with multiple project_codes (thus causing non-unique matches)? If so, if you could share the mouse_ids for those, i might be able to track down why that has happened (perhaps an ophys operator assigned some sessions to the wrong project_code). But it really shouldn't happen.

morriscb commented 2 years ago

I'm more assuming that something is up as the project_ids don't line up when using the donor_id vs using the ophys_session table. My assumption was that this was how this was happening as the different project codes looked to be from a Development version of, what I assume, is the same project. :et me see if I can get a mouse id for you.

morriscb commented 2 years ago

Here's the query I ran to get the attached data. My copy of pgAdmin is acting up and I don't have a query off the top of my head to change donor_id into mouse_id.

select bs.id as behavior_session_id, 
       bs.donor_id,
       sp.project_id as specimen_project_id,
       os.project_id as ophys_sess_project_id
from behavior_sessions as bs
join ophys_sessions as os on (bs.ophys_session_id = os.id)
left join specimens as sp on (bs.donor_id = sp.donor_id)
limit 100;

Here's the result: project_id_comparison.csv

For the ids returned: 589723633's project code is VisualBehaviorDevelopment, and 767756009 has the project code VisualBehavior.

DowntonCrabby commented 2 years ago

Taking a look at this and I think a lot of these might be old. I remember at the beginning of the brain observatory we would take mice that weren't 'ideal' for imaging and switch them to be for developing visual behavior.

I'll do a little more digging tomorrow on this to see if these are all old and/or if they are from passes & published sessions or if they're failed.

I agree that this shouldn't be something that's an option in the future- but that would probably need to be implemented on the MPE / workflow sequencing engine end of things. It would have to ensure that the session created is aligned with the mouse's project code in lims.

morriscb commented 1 year ago

Hi @DowntonCrabby, @matchings, and @corbennett. I've got a bit of an update on this issue.

The LIMS team is working on adding the project_id to the behavior_sessions table. I think we understand the issue now a bit better. Related to what @DowntonCrabby said above, the issue seems to be that mapping the donor_id into the specimens table produces results for project_id that are different from the project_id that is stored in the ophys_sessions table. When I checked behavior sessions against ecephys sessions this also can also sometimes happen. What I think is happening here is that the project_id stored ophys/ecephys sessions is set separately from the when the project_id is associated with the specimens table and donor_id. Hence the discrepancy between the two projects when, for example, what @DowntonCrabby described happens.

To resolve this I need to ask which project_id should we consider correct? I'll list some cases below and a question regarding which action we should take.

Please, let me know what y'all would like to implement. Thanks!

matchings commented 1 year ago

@morriscb I think we might need a bit more insight from @samiamseid or @shiellac aboit how mice / sessions are assigned to project codes as part of the mouse-seeks / lims workflows for moving mice through the pipeline. They would have a better understanding of the types of situations that @DowntonCrabby described above and what should be considered the “correct” project code.

For example, when a mouse is first “created” in lims, I assume it is assigned a project code at that time. Perhaps this is the one that goes into the specimens table? Then when a mouse enters ophys/ephys, perhaps it can be moved into a different project code because we changed what project the mouse would be used for.

If that is the case, then I think we should go with the second option @morriscb described. The project_code associated with the ophys/ephys sessions should be the “final” designation of project_code for that mouse and override any discrepancies with the specimens table, because that is the project_code that represents the final experimental protocol that the mouse was exposed to (project_code is essentially shorthand for what experimental procedure a mouse went through).

I also agree with @DowntonCrabby that we should put some process into place that prevents a mouse from being associated with more than one project code. If the project code for a mouse changes at some point in its lifetime, the records in lims should be updated so that only the final project code is listed, and that it is unified across tables.

morriscb commented 1 year ago

Okay, I'll see if I can't test some of the ophys_sessions for VBO and see if for a given donor_id, they map to a unique project.

shiellac commented 1 year ago

@matchings

1/ ophys WSE assigns an ophys session projectID based off of the ISI projectID associated with the animal. 2/ The ISI projectID can be different from the specimen projectID if the animal switches taskflow/project. Moving forward, we can make sure that the ISI projectID is consistent with the specimen projectID. 3/ if an ophyssession is for dev/testing, we will change just the projectID for the specific ophys session to dev/testing without having to change the ISI projectID. this is for instances where an animal has a full production container but we run a session or two after as testing (not necessarily part of the production container). 4/ ophyssessions that are simultaneous behavior sessions should use the projectID assigned to said ophys session projectID

morriscb commented 1 year ago

Hey folks, looks like the latest LIMS update has added project_id to the behavior_sessions table so we can start working on adding this. One thing I was just told however, was that when this was added to the OphysExperimentMetadata, Wayne was very adamant that the project code should not be exposed to outside users, only internal. It's not clear to me why this was the case, does the project code link back to any sensitive information perhaps?

I'm hesitant add this data in since Wayne against it so firmly. Would an acceptable comprise be adding the project_id to the various tables and internal users can than change that into a project code by querying LIMS?

matchings commented 1 year ago

@morriscb im confused, the project_code has always been in the externally facing ophys_experiment_table. Users rely on it to distinguish different datasets. I’m pretty sure Wayne knew that. We definitely need it to continue to exist there, and to add it to additional places.

It’s unfortunate that there isn’t a more user friendly naming scheme to identify different datasets, since the project code does originate as a mostly internal thing, but it’s already baked in to the releases data and is a meaningful and important piece of metadata.

DowntonCrabby commented 1 year ago

So I think there's a few things happening here and I think there is a way to compromise. I think this is related to issue #2591

Project code itself is not meaningful to external users, HOWEVER, the project code is currently a proxy for other information that IS useful to external users- like the number of different cortical areas associated with an imaging session, and the number of depths per cortical area associated with an imaging session. There might also be other things for neuropixels that are essentially imbedded in the project code that I dont know about that maybe @corbennett could give some feedback on.

For ophys, the number of cortical areas and depths per area is really what the external users need/want to know and they currently don't have access to that information independent of the project code.

Here's the information that know of for ophys that is important: image

I think if we could add that information to the cache tables, and also to the metadata attribute, that would probably be sufficient, and actually more straightforward/explicit, but maybe @matchings has some more thoughts on that.

DowntonCrabby commented 1 year ago

I think long term there's some better ways that we can imbed this information into the pkl file or something like that, that isn't as opaque as project code, but I do think that will require some changes from other teams as well.

morriscb commented 1 year ago

I was initially confused as you were @matchings. I'm agnostic to including the code and, as you point out Marina, it's already public in the ophys_experiments_table. The comments from Wayne previously are what gave me pause. Here's an example from an issue from last year https://github.com/AllenInstitute/AllenSDK/issues/1889#issuecomment-785265909 There are a few others on our slack where he is even more firm about not including the code (which, again, ended up being released anyway).

@DowntonCrabby I currently have started on a version that is storing the project_id which would provide ability to differentiate projects from one another. I can switch this to project code if we end up deciding it is "safe" to do so. I mostly just want to make sure that we aren't exposing something sensitive. The only thing I can think of is that maybe Wayne was worried about exposing a billing code associated with the project? I'm not that familiar with what I could be unless the two of you have any ideas or remember something brought up in the issue thread I posted above?

matchings commented 1 year ago

@morriscb just so I understand, are the values of project_code and project_idessentially the same, it’s just a different name for the column? For example, does VisualBehaviorMultiscope4areasx2d appear in both?

If it’s just a difference in column name, I think it should be fine to use project_id. If the values are fundamentally different, I need to know what the difference is before signing off on anything.

morriscb commented 1 year ago

The value is changed to the integer id representing the project, as in the id column of the projects table. You can then look up the project name/code from the projects table in LIMS.

matchings commented 1 year ago

@morriscb ok I think we might have an issue then. For a few reasons -

1) if we only provide users an integer ID, then we need to provide another lookup table that maps that ID to a meaningful project description, which we can do but adds a semi annoying step for users, and people would end up wanting to create something more descriptive like the existing project_code anyway. It’s better to be explicit in naming via strings rather than using integers which are hard to mentally keep track of and parse.

2) the project_code with a string value describing the project has already been included in released datasets and we can’t take it away. People’s code depends on this, all our tutorials depend on this. For example, it would break the analysis code for the platform paper we are about to submit, which would force us to tie the paper to a previous version of the dataset and SDK because we don’t have the resources to refactor all our code to work with project_id.

Wayne may have had concerns about project_code in the past, but honestly we are way beyond that, it’s already released and being used. Given that Wayne oversaw those releases and was aware that the project_code was in the metadata tables, I think we can take that as a tacit approval / change of heart in his opinion on this.

If we really want to get away from project_codein the future, we can switch to a situation using project_id and some lookup table to map to a meaningful project description for future projects. As a compromise, we can put both the project_id and project_code in this release to start moving towards that system. But I’m not ok with taking away the project_code information that’s already available in the existing VBO datasets because it is already in use and is a very disruptive change to users.

morriscb commented 1 year ago

Hey Marina, okay, if project_id doesn't satisfy your needs we can change it to project_code. Looking over the release fix notes I sent in the email, it looks like no one batted an eye on it's inclusion in the previous release. If you can't think of a reason it shouldn't be used and since that no one seems to know the specific reason for Wayne not wanting it included, I'm happy to put the project identification in as project_code and call it done. Apologies for the back and forth, wanted to be careful given the what was previously asked of the team by Wayne. Gimmie a few days to get back around to it and we should have project codes on the 2.15.0 branch.

matchings commented 1 year ago

@morriscb No worries about the back and forth, I respect and appreciate your consideration of past team members concerns! If anything, this situation just highlights the importance of getting things right the first time - the cost of future changes can sometimes be very high, so its important to be careful with these things :)