Closed dylanmcreynolds closed 3 years ago
@dylanmcreynolds I thought we had decided to call this "Data Session"
@stuartcampbell hmmm, I didn't take notes and it came up as auth_session in subsequent conversations. I'm game to make it whatever, though this is the right time, since I have it in four repos right now (two in splash, this one and suitcase-mongo.) You want I should change it?
For what it’s worth “data session” is what I remember us converging on too. There was something about maybe wanting “auth” for other uses, and this being particularly about who is authorized to see this data? But I have no strong views and will happily click the green button once a choice is made.
OK, I'll make this change tomorrow: data_session
Is the order in the list meant to be meaningful? It may be better to spell this as a dictionary of the form {'naming_authority': "token"} so we do not have to search through a list asking "does this look like something I would have handed out to identify a data session?"
No, I was not envisioning order being significant. Yes, if that's important, we'd have to go to a dictionary.
I was going for simplicity and easy indexing in Mongo. I know that there are other serializations possible, but the use case I am concerned about is something along the lines of "search for runs that a currently logged-in user would be able to view." An example might look like:
data_session: ["als", "bl832", "proposal1234"]
A query based on user access would return the run if the user had access to any of those data sessions: for example, someone with privileges to see all data at ALS, someone with privileges to see all data from that beamline, and so on.
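A minimal sketch of that "any of these sessions" check, written in plain Python so it runs without a database. The function names `can_view` and `build_access_query` are hypothetical and not part of this PR. The filter it builds uses MongoDB's standard `$in` operator, which matches an array field when any of its elements equals any of the listed values.

```python
def can_view(run_start, user_sessions):
    """Return True if the user holds at least one of the run's data sessions."""
    run_sessions = run_start.get("data_session", [])
    return bool(set(run_sessions) & set(user_sessions))


def build_access_query(user_sessions):
    """Build a Mongo-style filter for runs visible to this user (hypothetical helper)."""
    # Sorted only to make the output deterministic; order does not matter to $in.
    return {"data_session": {"$in": sorted(user_sessions)}}


run = {"uid": "abc", "data_session": ["als", "bl832", "proposal1234"]}
facility_staff = {"als"}           # may see everything tagged "als"
outside_user = {"proposal9999"}    # holds no session attached to this run
```

With these values, `can_view(run, facility_staff)` is true and `can_view(run, outside_user)` is false. A run with no `data_session` field is treated as not visible here, which is an assumption of this sketch rather than a policy stated in the thread.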
If we go with a dictionary, what would that look like?
data_session: {
"als": "foo",
"bl832": "bar",
"proposal": "foo2"
}
Is it possible to index Mongo on the keys in a dict?
The trouble is then, what is the key in the dictionary for the data session itself?
If you want to specify a 'group' of people that are not associated with a facility, beamline, proposal or (e)SAF what would go in this list or dictionary?
@stuartcampbell In this PR, I am not proposing to enforce any governance on what can go into the list or dict. If a beamline wanted to give access to their cats, it would be on them to decide that "catz" is a valid group to add. Or am I seriously misunderstanding your question?
I posted a summary of a conversation with @stuartcampbell and @cryos before I learned that, after that conversation, @dylanmcreynolds and @stuartcampbell came to an understanding. I have deleted my comment to avoid muddying the waters. I don't have strongly held views or deep experience in this area. I'm just aiming to unblock progress. :+1:
So, following up: @dylanmcreynolds and I had a chat this afternoon and cleared up some confusion on my part. I had thought there was implied metadata storage in the items in the list, but now that I know it's just a list of 'groups' that are allowed access to this data, I am happy. I have no strong preference for a string or a list, so if @dylanmcreynolds has use cases for a list, I am happy with that.
I committed the change to "data_session" from "auth_session". As far as I know, this PR is good to go.
Forgive me for one last round of questions here. Changes to this schema are hard to back out of and I want to make sure we get this right on the first try.
When we chose the name "data session" I think we had in mind a unique ID (something like a "visit" identifier, not a globally unique ID) as the value. The unique ID would have meaning to some external system that would map it to proposals / access groups / users.
This proposal writes the groups directly into the document, effectively removing a layer. Both approaches seem valid to me, and it may make sense to choose one or even both depending on the use case. Storing a list of groups is more direct and simpler in the case where the documents remain under the management of a system that understands those groups. Storing a unique ID better enables the use case where documents may be accessed or moved between uncoordinated systems. For example, proposal12345 may mean something different at NSLS-II than at ALS.
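To make the indirection concrete, here is a small sketch of how an external system might resolve a session ID to access groups. Everything in it is hypothetical: the `SESSION_GROUPS` table, the group names, and `resolve_groups` stand in for whatever service each facility would actually run.

```python
# Hypothetical lookup table keyed by (facility, session ID). In practice this
# mapping would live in an external system owned by each facility.
SESSION_GROUPS = {
    ("nsls2", "proposal12345"): ["nsls2-staff", "team-a"],
    ("als", "proposal12345"): ["als-staff", "team-b"],
}


def resolve_groups(facility, data_session):
    """Map a facility-scoped session ID to the groups allowed to see the data."""
    return SESSION_GROUPS.get((facility, data_session), [])
```

The same ID resolves to different groups depending on the facility, which is why the ID alone is only meaningful to a system that knows where it was issued.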
I propose that we support both in the document model: an optional unique ID for the session and an optional hard-coded list of groups that can access it. We could worry about the unique ID field in a future PR, but I bring it up now because I think the name "data session" fits better for that, that is, for our original concept of data_session: unique ID. This PR has evolved into something that I think we might better call "data groups" or "access groups".
Edited: We had in mind a unique ID, not a UUID.
As discussed with @danielballan, I separated data_session (a string) and data_groups (list of strings).
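A rough illustration of the resulting shape, assuming both fields are optional as proposed above. This is a plain-Python check, not the JSON schema from this PR, and `validate_access_fields` is a hypothetical name.

```python
def validate_access_fields(start_doc):
    """Return a list of problems with the optional access fields (empty if valid)."""
    problems = []

    # data_session: optional, a single string identifier.
    session = start_doc.get("data_session")
    if session is not None and not isinstance(session, str):
        problems.append("data_session must be a string")

    # data_groups: optional, a list of group names.
    groups = start_doc.get("data_groups")
    if groups is not None:
        is_str_list = isinstance(groups, list) and all(
            isinstance(g, str) for g in groups
        )
        if not is_str_list:
            problems.append("data_groups must be a list of strings")

    return problems


good = {"uid": "abc", "data_session": "visit-42", "data_groups": ["als", "bl832"]}
bad = {"uid": "def", "data_session": ["visit-42"], "data_groups": "als"}
```

Here `validate_access_fields(good)` returns an empty list, while `bad` yields one problem per field, since it swaps the two types.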
For the record: this is exactly in line with the discussion that took place during the DAMA group meeting on Monday involving @stuartcampbell, @cryos, @tacaswell, and others present, so I will take the liberty of merging it. Thanks for your patient attention to detail and group consensus here, @dylanmcreynolds.
Adds auth_session field to start document schema
Description
The optional auth_session field is intended to be a place to store filtering information that can be used by downstream systems when making decisions about authorization / privileges regarding the run.
Motivation and Context
While this could be done by individual beamlines, it is desirable to make this part of the standard event model. The next step after this will be to go to the suitcase repos and index this field so that downstream processes can query it efficiently.
I made this a list of strings because the same run might have multiple known contexts. I had in mind "beamline A" and "proposal B" as distinct auth filtering contexts that could benefit.
How Has This Been Tested?
New module test_auth.py includes tests for valid and invalid auth_sessions.