linking DRS with Passport Visa

briandoconnor commented 3 years ago

The following writeup is from @mattions, sent to FASP on 11/19:

Dear all,

Kurt and I have been talking about a possible way to link DRS with DURI to solve the authorization part from the DRS point of view.

This is clearly a FASP-world type of collaboration, but it is heavily invested in both the DURI and the Cloud Workstreams.

So here is the problem in a nutshell:

From a client perspective:

If you get a DRS ID, you have no idea how to get authorized for it. You know you need to send an authorization token, but you cannot determine programmatically which one.

From a DRS server developer's perspective:

If you stand up a DRS server, you will also need to create a mapping between the DRS ID and some sort of authorization. The DRS ID is connected to the authorization via a clearinghouse, which is usually done in an ad hoc way.

The things that have changed:
1) Passport is now an approved standard.
2) Passport adoption seems to be very well received.
3) DRS adoption is on the rise as well.

We can now try to link the two, using the problems above as an opportunity to bring them together.

Here is a budding proposal:

We add another metadata field to the response of the first DRS API call, GET /objects/{object_id}. The field is a dictionary that could look like this:

{
  ...
  source_of_authorization: {
    visa_name: "sadasdsadas",
    visa_emitter: "https://example.org/ga4gh/visa_emitted/visa_managed"
  }
  ...
}

The two entries in the dictionary are:

visa_name: a name for the visa (for example, in the dbGaP world this could be the phs.xxxxx)
visa_emitter: the URL of the "authority" that emitted the visa for this data; it can also expose specific information about the visa and other useful details. Honestly, I am very happy to pass the ball here to @Craig Voisin and @Rodarmer, Kurt Sr. (NIH/NLM/NCBI) [C], given that they are working on something along these lines.
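To make the proposal concrete, here is a minimal Python sketch of how a client might read the proposed field from a GET /objects/{object_id} response. The field name and its two keys come from the strawman above and are not part of DRS 1.1; `SourceOfAuthorization` and `parse_source_of_authorization` are hypothetical names used only for illustration.

```python
from collections import abc
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class SourceOfAuthorization:
    """Hypothetical holder for the proposed source_of_authorization field."""
    visa_name: str
    visa_emitter: str


def parse_source_of_authorization(
    drs_object: Mapping[str, Any],
) -> Optional[SourceOfAuthorization]:
    """Return the proposed entry from a GET /objects/{object_id} response.

    The field is only a proposal, so objects from current DRS servers will
    not carry it; absence is reported as None rather than as an error.
    """
    raw = drs_object.get("source_of_authorization")
    if raw is None:
        return None
    if not isinstance(raw, abc.Mapping):
        raise ValueError("source_of_authorization must be an object")
    visa_name = raw.get("visa_name")
    visa_emitter = raw.get("visa_emitter")
    if not isinstance(visa_name, str) or not isinstance(visa_emitter, str):
        raise ValueError("visa_name and visa_emitter must be strings")
    # Design choice in this sketch: refuse non-HTTPS emitters, because the
    # client will later exchange credentials with this URL.
    if not visa_emitter.startswith("https://"):
        raise ValueError("visa_emitter must be an https:// URL")
    return SourceOfAuthorization(visa_name=visa_name, visa_emitter=visa_emitter)
```

Treating a missing field as "unknown" rather than as an error keeps the sketch usable against DRS servers that never adopt the extension.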

OK. If you are still reading (thank you, and well done), we now have a good starting point:

Server developer (DRS + authorization): stands up the DRS server with an attached visa emitter server that is paired with it.

Client developer: gets the DRS ID, then checks whether the user's passport contains the required visa(s) (this information is now readily available from the DRS response).

If so:

Sends all the DRS IDs to the visa emitter (whose URL is given in the visa_emitter field of the DRS response). The emitter sends back a scoped token, which can then be sent in the second call, GET /objects/{object_id}/access/{access_id}.
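The client-side flow just described could be sketched as follows, assuming the client has already fetched the object metadata for each DRS ID and has extracted the visa names present in the user's passport (how those names are derived from a passport is left open here). `plan_visa_requests` is a hypothetical helper; it only groups IDs and does not contact any emitter.

```python
from collections import defaultdict
from typing import Any, Dict, List, Mapping, Set, Tuple


def plan_visa_requests(
    drs_objects: Mapping[str, Mapping[str, Any]],
    passport_visa_names: Set[str],
) -> Tuple[Dict[str, List[str]], List[Tuple[str, str]], List[str]]:
    """Group DRS IDs by the visa emitter that should issue their scoped token.

    Returns (ids_by_emitter, missing, unknown):
      ids_by_emitter maps an emitter URL to the DRS IDs to send to it,
      missing lists (drs_id, visa_name) pairs the passport does not cover,
      unknown lists DRS IDs whose metadata has no source_of_authorization.
    """
    ids_by_emitter: Dict[str, List[str]] = defaultdict(list)
    missing: List[Tuple[str, str]] = []
    unknown: List[str] = []
    for drs_id, obj in drs_objects.items():
        soa = obj.get("source_of_authorization")
        if soa is None:
            unknown.append(drs_id)
        elif soa["visa_name"] in passport_visa_names:
            # The user holds the visa: batch this ID for its emitter.
            ids_by_emitter[soa["visa_emitter"]].append(drs_id)
        else:
            missing.append((drs_id, soa["visa_name"]))
    return dict(ids_by_emitter), missing, unknown
```

Batching by emitter means each emitter is asked once for a token covering all of its IDs, which matches "sends all the DRS IDs to the visa emitter".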

I would like to start a conversation on this and understand what you think about it.

Cheers, Michele

craig commented 3 years ago

Why do you pass the ball to me? I do not play.

jb-adams commented 3 years ago

Thanks to @mattions and @briandoconnor for posting this issue.

At the FASP call today (Jan 4, 2021), we discussed that this would be a good topic to tackle during the January hackathon. The output of this could be a design doc that addresses how to resolve passport-based authorization from a DRS id / DRS metadata.

Looping in @jiaqi216 @ianfore @mbarkley @kwrodarmer @cdvoisin for their work on DRS and/or Passports. Please add any other DPs or implementers that may be interested in this hackathon topic and have valuable insights to contribute.

While the focus here is on Passports + DRS, whatever solution we propose will likely have implications across the GA4GH, especially regarding other API specifications (e.g. Passports + htsget). Ideally, we can design something that is readily adoptable by other API types (though this is not the focus of this item).

jb-adams commented 3 years ago

The above strawman proposal looks good to me so far. However, is there an assumption that the DRS object metadata endpoint will be unprotected? In the example:

GET /objects/{object_id}
{
  ...
  source_of_authorization: {
    visa_name: "sadasdsadas",
    visa_emitter: "https://example.org/ga4gh/visa_emitted/visa_managed"
  }
  ...
}

To me the ... indicates that full DRS object metadata is being returned, even prior to the client passing the correct visa token. Is this correct? If so, some DRS implementers may not want to yield object metadata without evaluating the correct token.

Would it be acceptable to return the source_of_authorization object in the body of a 403 Forbidden response? I.e., the client, not knowing which visa token to provide, first hits the endpoint. Then, having received the correct visa_name and visa_emitter, it can retry the request, or give up if that visa name isn't in the user's passport.
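A minimal sketch of the client logic under this 403 variant, assuming the server would put the proposed source_of_authorization object in the 403 body. Nothing in DRS 1.1 defines this behaviour; the function name and the action strings are hypothetical.

```python
from typing import Any, Mapping, Optional, Set, Tuple


def next_step_after_get_object(
    status: int,
    body: Optional[Mapping[str, Any]],
    passport_visa_names: Set[str],
) -> Tuple[str, Optional[str]]:
    """Decide what a client does after GET /objects/{object_id}.

    Returns (action, emitter_url); action is one of
    "proceed", "retry_with_visa", "give_up" or "error".
    """
    if status == 200:
        # Metadata was returned; no visa negotiation needed.
        return "proceed", None
    if status != 403 or not body or "source_of_authorization" not in body:
        # Not the proposed 403 shape: nothing to act on programmatically.
        return "error", None
    soa = body["source_of_authorization"]
    if soa.get("visa_name") in passport_visa_names:
        # The user holds the visa: obtain a token from the emitter and retry.
        return "retry_with_visa", soa.get("visa_emitter")
    # The required visa is not in the user's passport.
    return "give_up", None
```

One consequence of this design is that the 403 body leaks only the visa name and emitter, not the object metadata, which addresses the concern about unprotected metadata.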

kwrodarmer commented 3 years ago

A passport provides a weak identification as a bearer token, signed by one of potentially several brokerages. It contains zero or more authorization tokens from potentially several original authorities, granting permissions with regard to some object or computing resource.

The use of a passport is not as straightforward as it might seem. It can be used as evidence of the bearer's identity, and for some levels of assurance this can be acceptable. But ultimately the condition we are trying to enable is one of research, where the authorizations become central. We can take the case of contacting a DRS 1.1 server as an example.

We assume that the researcher has identified the objects of study and holds them in a container of DRS ids, currently being called a selection for lack of a better term (in the SRA, we caved and called it a cart). The first question is, how does the researcher know what visas (authorization tokens) are required in order to access the objects? The one entity in our current schema that is required to know the relationship between visa and object is the DRS server.

The idea is that the selection object will be involved in creating a working, scoped-down passport. An appropriate broker will use the selection to contact DRS servers to obtain permission information (which is not the same as a visa identifier). This information can be used to either select visas off the broker's shelf or contact the original source of authority with an existing visa set to request a customized, down-scoped visa for the purpose. The broker then assembles the visas needed for the selection into a passport, signs it, and returns it.

It is this passport - the one that has been scoped down - that is passed to a work stream engine. Ultimately, it (or another passport derived from it) will be sent to the DRS server again, only this time to request access to the resource.

kwrodarmer commented 3 years ago

To me the ... indicates that full DRS object metadata is being returned, even prior to the client passing the correct visa token. Is this correct?

There is nothing protected about the metadata being returned here (a requirement, actually). I'm not sure what full DRS object metadata means: we are going to avoid overloading an existing message and introduce one that specifically operates with a set of DRS ids.

kwrodarmer commented 3 years ago

If so, some DRS implementers may not want to yield object metadata without evaluating the correct token.

My impression is that this is in response to the idea that the visa information would be added to existing metadata queries. I think I addressed that with the comment above.

mbarkley commented 3 years ago

It seems to me like there are two sub-problems being discussed in this thread:

1. How do I identify which GA4GH visas are required in order to successfully be authorized for a request to a DRS access method URL?
2. Once I have a passport containing sufficient visas to authorize access to a DRS object access method, how do I exchange this passport for an access token?

I'd like to temporarily put aside problem (1) and make the case that there are already existing OAuth 2 extensions that solve problem (2), and that it would be beneficial to use them.

Problem 2: Restatement of Solution

Here I'll restate problem 2 in terms of the desired solution and try to map it to ideas in the conversation thus far.

Given:

A research user
A web application that the researcher is using for accessing/analyzing DRS object bytes
A passport broker
A DRS server (with some objects accessible via it)
An authorization server that is able to validate permissions of research users and issue bearer tokens for one or more DRS access methods (important: this could also be the DRS server, but I will argue that we shouldn't require that!)
Some solution for problem (1) so that we can assume we know which visas a researcher needs in their passport to access a desired selection of files

Assumptions:

The web application, passport broker, and DRS server may be controlled by different organizations.
The DRS server and authorization server are controlled by the same organization

Desired use-case:

1. A researcher authenticates with the passport broker to get a full passport token with all the visas they need for some set of files (assuming some solution was already used for problem (1)).
2. The web application exchanges the full passport token with the authorization server to get a down-scoped token it can use for the desired set of files at the DRS server.
3. The web application uses the down-scoped token to call the DRS access method endpoints for the relevant objects.

This description of problem (2) intentionally loses some details that are particular to SRA (ex. it does not say that the token obtained from the authorization server is a down-scoped passport token and it does not define the concept of a selection). Nonetheless, I think this restatement of the problem and solution captures all the essential details that are relevant for a standard and would still allow for the particular solution that @kwrodarmer is describing for SRA.

Relevant OAuth Extensions

There are two relevant OAuth extensions for doing token exchanges:

RFC 8693 describes a standard way to exchange one token for another at an authorization server, where the tokens are potentially crossing authorization and organization boundaries
RFC 8707 describes a standard way to extend OAuth requests so that a client can indicate which resources it wants to access

How do these OAuth extensions solve the passport down-scoping problem? By using them together you have an existing standard that lets you exchange a full passport token for an access token to a particular set of DRS objects. In particular:

RFC 8693 describes how you can pass the passport token in the body of a token exchange request along with some authentication of the client application to obtain a token for accessing DRS object access methods
RFC 8707 describes how you can enumerate the set of DRS objects (i.e. the selection or cart) that you want to access with the resulting token -- this allows the authorization server to appropriately downscope the resulting token
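As a rough illustration of how small the combined surface is, the sketch below builds the form-encoded body of an RFC 8693 token exchange request that carries the passport as subject_token and one RFC 8707 resource parameter per DRS resource. Assumptions: the passport is labelled with the generic urn:ietf:params:oauth:token-type:jwt type (a passport-specific profile would have to settle this choice), the resource URLs are placeholders, and both client authentication (for example HTTP Basic at the token endpoint) and the HTTP call itself are omitted. `build_token_exchange_body` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple
from urllib.parse import urlencode

# Grant type and token type URNs defined in RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def build_token_exchange_body(
    passport_jwt: str,
    resources: Sequence[str],
    requested_token_type: Optional[str] = None,
) -> str:
    """Build an application/x-www-form-urlencoded RFC 8693 request body.

    Each DRS resource is sent as its own `resource` parameter (RFC 8707),
    letting the authorization server down-scope the issued token.
    """
    if not resources:
        raise ValueError("at least one resource is required to down-scope")
    params: List[Tuple[str, str]] = [
        ("grant_type", TOKEN_EXCHANGE_GRANT),
        # The passport travels in the POST body, not in a header.
        ("subject_token", passport_jwt),
        # Assumption: the passport is presented as a generic JWT.
        ("subject_token_type", JWT_TOKEN_TYPE),
    ]
    params.extend(("resource", r) for r in resources)
    if requested_token_type is not None:
        params.append(("requested_token_type", requested_token_type))
    return urlencode(params)
```

The body would be POSTed with Content-Type application/x-www-form-urlencoded to the authorization server's token endpoint; `urllib.parse.parse_qs` is a convenient way to inspect it.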

Why Should We Use These Standards?

I think these standards are a good fit for our problem because:

Like passport, they are already OAuth compatible solutions
They focus on standardizing the contract between the components in different organizations; the API between the web application client and the authorization server
They allow flexibility for implementations in the right places; the token returned at the end of an exchange could be a down-scoped passport token or an opaque bearer token or some other kind of JWT -- that is an out-of-spec detail between the authorization server and the DRS server
These APIs both have small surfaces, especially when implementations already have some OAuth 2 support (token exchange is a single endpoint, and resource indicators add a single new resource parameter to that endpoint)
The token exchange RFC can handle very large passport tokens; it requires tokens being exchanged to be in the body of a POST request, rather than in a header

If we decide to leverage these standards to solve problem (2), then we can focus our remaining bandwidth on what new standards need to exist to solve problem (1).

Potential Gaps

RFC 8693 and 8707 each have ambiguities or optional parts that can lead to two implementations being standards compliant but incompatible. We would likely need a passport-related document referencing these RFCs with specific guidance for certain choices.

Some examples I can think of right away include:

For indicating resources (RFC 8707), what URLs should a client use for DRS object access methods?
For token exchange (RFC 8693), should clients be required to present an actor token as proof that they are the data clearinghouse that obtained the full passport being exchanged?
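One possible answer to the first question, sketched in Python: use the DRS 1.1 access endpoint URL itself as the resource indicator. This mapping is a hypothetical choice, not something either RFC or the DRS spec prescribes; the validity check only encodes RFC 8707's rule that a resource value must be an absolute URI without a fragment component. Both function names are hypothetical.

```python
from urllib.parse import quote, urlsplit


def drs_access_resource(base_url: str, object_id: str, access_id: str) -> str:
    """Hypothetical choice: the DRS 1.1 access endpoint URL as the resource."""
    return (
        f"{base_url.rstrip('/')}/ga4gh/drs/v1/objects/"
        f"{quote(object_id, safe='')}/access/{quote(access_id, safe='')}"
    )


def is_valid_resource_indicator(uri: str) -> bool:
    """RFC 8707: the value must be an absolute URI with no fragment."""
    # A scheme makes the URI absolute; any '#' introduces a fragment.
    return bool(urlsplit(uri).scheme) and "#" not in uri
```

A profile could just as well pick per-object or per-dataset URLs; the point is that two compliant implementations only interoperate if they agree on one convention.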

Conclusion

I would strongly encourage the folks working in this area to explore the two linked RFCs as a solution to the passport down-scoping sub-problem. There will likely still need to be some standardization of the optional parameters defined in these extensions, but this will likely be less work than creating an entirely new standard that is similarly robust.

jb-adams commented 3 years ago

Thank you, Max! This is a great, comprehensive restatement of the problem. I just have a few questions for clarification; some of these may be basic for those with more experience in Passports:

Desired use-case:

1. A researcher authenticates with the passport broker to get a full passport token with all the visas they need for some set of files (assuming some solution was already used for problem (1)).
2. The web application exchanges the full passport token with the authorization server to get a down-scoped token it can use for the desired set of files at the DRS server.
3. The web application uses the down-scoped token to call the DRS access method endpoints for the relevant objects.

In step (2), is the authorization server that accepts the full passport token and hands back the down-scoped token acting as the Passport Clearinghouse service?

Is the down-scoped token value exactly what will be passed to the DRS service as an OAuth 2.0 Bearer token?

RFC 8707 describes how you can enumerate the set of DRS objects (i.e. the selection or cart) that you want to access with the resulting token -- this allows the authorization server to appropriately downscope the resulting token

Is it correct to say that the RFC 8707-formatted payload of desired objects will ONLY be used in the request to the authorization service? It does not need to be passed to the Passport Broker service, right?

kwrodarmer commented 3 years ago

There are a lot of moving parts here. I will comment on @mbarkley's comment soon.

mbarkley commented 3 years ago

In step (2), is the authorization server that accepts the full passport token and hands back the down-scoped token acting as the Passport Clearinghouse service?

My interpretation of the spec is that the clearinghouse is the client that initiates the flow to get a passport token (i.e. the "Relying Party" in OIDC, or simply the "client" in OAuth2). In that strict sense, the answer is no, based on the scenario I described, but with a caveat. I think some people interpret a clearinghouse service as a service that inspects a passport whether or not it was involved in a passport token auth flow; in that sense the answer would be yes. I'm not sure how important this distinction is.

Is the down-scoped token value exactly what will be passed to the DRS service as an OAuth 2.0 Bearer token?

In the system I am advocating for, yes. My hypothesis is that this also aligns with the down-scoped token obtained for a "cart" or "selection" in SRA, but I need @kwrodarmer's expertise to validate that.

Is it correct to say that the RFC 8707-formatted payload of desired objects will ONLY be used in the request to the authorization service? It does not need to be passed to the Passport Broker service, right?

I'm not sure I understand the question exactly, but I think you are asking if RFC 8707 needs to be implemented by passport brokers? Perhaps for other reasons it could be useful, but in the context of the solution to problem (2) I'm describing, no. Only the authorization server accompanying a DRS server would need to implement RFC 8707.

kwrodarmer commented 3 years ago

I can't emphasize enough that there are at least 3 things that are intimately related:

selections
passports and visas
DRS servers

The importance of a selection is as a description of intentions. The OIDC model is quaint enough, and expects to ask a user to authorize scopes. In theory, scopes might be capable of describing access but it becomes very difficult in the world of big data. As an example, we would not expect a PI to be able to answer the question of which dbGaP consent groups they want to authorize for inclusion in a passport. On the other hand, they know exactly how to describe what they are researching. A selection object that captures intentions as the primary requirement is a much better way to scope the visas than an OIDC dialog. Furthermore, login time is neither powerful enough nor appropriate for resolving such questions, since login represents a point in time where the user gathers up all needed permissions; and these are not the same as the permissions to be extended to the many user-agents and user-agent-agents and user-agent-agent-agent-agents...

The DRS server is the only item within our current model that has intimate knowledge of the mapping between permission and resource. The v1.0 passport was conceived to carry permissions, and v1.1 DRS is conceived to map ids to URLs, but the AuthZ mechanism is what's being discussed.

To be continued...

kwrodarmer commented 3 years ago

The problem discussed with @mattions is that there is no clear mapping between DRS id and visa, or even the source of authority behind the visa. In today's framework, and if we assume that a DRS server works upon a passport for AuthZ (requiring it to be delivered by POST due to it having an unbounded size), then the DRS server is required to

recognize the resource designated by the DRS id
have intimate knowledge of the source of authority behind the visa
be able to recognize and validate the visa if present in the passport
be able to map between designated resource and permission within the visa, either directly or via an intimately trusted (e.g. private and internal) "clearinghouse."
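To make the four requirements concrete, here is a minimal Python sketch of the check a DRS server might run. It assumes the visa JWTs have already been signature-verified and decoded, uses the ControlledAccessGrants visa type from the GA4GH Passport specification, and models the "clearinghouse" mapping as a plain dictionary from DRS ID to dataset identifier. `visa_grants`, `authorize`, and all example values are hypothetical.

```python
import time
from typing import Any, Iterable, Mapping, Optional, Set


def visa_grants(
    visa_claims: Iterable[Mapping[str, Any]],
    dataset: str,
    trusted_sources: Set[str],
    now: Optional[float] = None,
) -> bool:
    """True if any decoded, already-verified visa grants access to `dataset`.

    No cryptographic validation happens here; that is assumed to be done.
    """
    now = time.time() if now is None else now
    for claims in visa_claims:
        visa = claims.get("ga4gh_visa_v1", {})
        if (
            visa.get("type") == "ControlledAccessGrants"
            and visa.get("value") == dataset
            # Requirement 2: only accept visas from known sources of authority.
            and visa.get("source") in trusted_sources
            and claims.get("exp", 0) > now
        ):
            return True
    return False


def authorize(
    drs_id: str,
    dataset_for_object: Mapping[str, str],
    visa_claims: Iterable[Mapping[str, Any]],
    trusted_sources: Set[str],
    now: Optional[float] = None,
) -> bool:
    # Requirements 1 and 4: recognize the resource and map it to a permission.
    dataset = dataset_for_object.get(drs_id)
    if dataset is None:
        return False  # unknown resource: deny
    # Requirement 3: find and check a matching visa.
    return visa_grants(visa_claims, dataset, trusted_sources, now)
```

Everything the server "intimately knows" lives in `dataset_for_object` and `trusted_sources`, which is exactly the knowledge a client currently cannot discover.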

When WES is executing, it will be given a passport from the user (not well-known to WES) containing the user's visas, plus some description of the work to be done and upon which objects (the selection). WES is responsible for mapping between the DRS ids and URLs, which today happens via DRS servers; the latter are identifiable through the DRS ids. At the point that WES is executing, the user is no longer involved in the flow except as an observer, and in particular is not available for injecting more authorizations into the flow. All needed authorization has to be present in the starting passport. If we consider operations that involve multiple DRS servers, this is likely to imply that the starting passport is overly permissive for any single one of them. WES (or an authorized broker) is in a position to partition a selection by DRS namespace, creating multiple selection objects. These may be combined with a starting passport in a call to a broker to obtain a new derived passport, scoped down to the intersection between the requested DRS ids and the permissions contained in the input passport.
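The partition-then-intersect step could look roughly like this in Python. It assumes hostname-based drs:// URIs (compact-identifier URIs would need a resolver first, so the sketch rejects them) and models each DRS ID's required permission and the passport's permissions as plain strings; minting the derived passport is left to the broker and is not shown. Both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set
from urllib.parse import urlsplit


def partition_by_drs_host(drs_uris: Iterable[str]) -> Dict[str, List[str]]:
    """Split a selection of hostname-based drs:// URIs by DRS server."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for uri in drs_uris:
        parts = urlsplit(uri)
        if parts.scheme != "drs" or not parts.netloc or not parts.path.strip("/"):
            raise ValueError(f"not a hostname-based DRS URI: {uri}")
        groups[parts.netloc].append(uri)
    return dict(groups)


def scope_selection(
    selection: List[str],
    required_permission: Mapping[str, str],
    passport_permissions: Set[str],
) -> List[str]:
    """Keep only DRS URIs whose required permission the passport carries."""
    return [u for u in selection if required_permission.get(u) in passport_permissions]
```

Each per-host sub-selection, together with the starting passport, would then be what the broker receives when asked for a derived passport.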

mbarkley commented 3 years ago

When WES is executing, it will be given a passport from the user (not well-known to WES) containing the user's visas, plus some description of the work to be done and upon which objects (the selection). WES is responsible for mapping between the DRS ids and URLs, which today happens via DRS servers; the latter are identifiable through the DRS ids. At the point that WES is executing, the user is no longer involved in the flow except as an observer, and in particular is not available for injecting more authorizations into the flow.

That all makes sense. I think we are still describing compatible scenarios. The issue when WES is executing is that you want to exchange the "full passport" that is overly permissive for a single DRS server and too big to fit into a header, for a smaller token that is properly scoped for a single DRS server, right?

I think this is exactly the area where the token exchange OAuth flow can help. I did not emphasise this earlier, but the token exchange OAuth flow is a non-interactive flow; it does not require user interaction, as long as a valid token for a user (in our case, a full passport token) is already present.

In the scenario you describe, the token exchange flow defines a single POST request that the WES server could make to an authorization server. That request would exchange a full passport (passed in the request body) for a token (either a down-scoped passport or an opaque bearer token) that is usable at a particular DRS server. You could do this token exchange at a different authorization server for each respective DRS server being accessed.

In order to prevent a full passport token from being replayed by one authorization server at another, the token exchange request can additionally accept an actor token (a token where the subject is the service doing the exchange, not the user).
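Extending that to the multi-server case, a WES-like client could prepare one token exchange per DRS server, each including an actor token that identifies the client itself. This is a sketch under stated assumptions: the mapping from DRS server to token endpoint is hypothetical (discovering it is part of the open problem), all tokens are typed with the generic JWT URN, and no HTTP requests are actually sent. `plan_exchanges` is a hypothetical name.

```python
from typing import List, Mapping, Tuple
from urllib.parse import urlencode

GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT = "urn:ietf:params:oauth:token-type:jwt"


def plan_exchanges(
    resources_by_server: Mapping[str, List[str]],
    token_endpoint_for: Mapping[str, str],
    passport_jwt: str,
    actor_jwt: str,
) -> List[Tuple[str, str]]:
    """Return (token_endpoint, form_body) pairs, one per DRS server."""
    plans: List[Tuple[str, str]] = []
    for server, resources in resources_by_server.items():
        params = [
            ("grant_type", GRANT),
            ("subject_token", passport_jwt),
            ("subject_token_type", JWT),
            # The actor token's subject is the exchanging service, not the
            # user; RFC 8693 requires actor_token_type alongside it.
            ("actor_token", actor_jwt),
            ("actor_token_type", JWT),
        ]
        params += [("resource", r) for r in resources]
        plans.append((token_endpoint_for[server], urlencode(params)))
    return plans
```

Because each authorization server only ever sees resources for its own DRS server, the token it returns cannot be used elsewhere, while the actor token lets it check who is doing the exchange.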

kwrodarmer commented 3 years ago

The issue when WES is executing is that you want to exchange the "full passport" that is overly permissive for a single DRS server and too big to fit into a header, for a smaller token that is properly scoped for a single DRS server, right?

There is truth in that. I just want to say that the "full passport" in this case is ideally already scoped down for the task at hand. The WES process may want to scope it down further if multiple DRS servers are involved. I do not want to imply that the WES process starts off with 100% of a user's visas.

kwrodarmer commented 3 years ago

I think this is exactly the area where the token exchange OAuth flow can help. I did not emphasise this earlier, but the token exchange OAuth flow is a non-interactive flow; it does not require user interaction, as long as a valid token for a user (in our case, a full passport token) is already present.

I have lots of concerns with the concept of original authority flow. I may be wrong - and maybe you can show the error of my thinking, but I have not found any OAuth2 flow that models original authority flow properly. The idea of downscoping passports is already within the scope of brokers, although there are cases (e.g. dbGaP visas) where no broker is in a position to downscope a dbGaP visa and would require going back to the source of authority (RAS) to obtain a rewritten visa.

kwrodarmer commented 3 years ago

In the scenario you describe, the token exchange flow defines a single POST request that the WES server could make to an authorization server. That request would exchange a full passport (passed in the request body) for a token (either a down-scoped passport or an opaque bearer token) that is usable at a particular DRS server. You could do this token exchange at a different authorization server for each respective DRS server being accessed.

What I said above about original authority is what concerns me here. I can clarify that at some point within the scope of a resource server, there will be a trusted set of cooperative services that have the authority to act in the name of the resource server. In this case, we will see any number of transformations from visa to physical access token. However, the authority model requires that such cooperative services have the legal authority (emphasis on the word legal) to operate on behalf of the RS. Put another way, they have to assume the same liability in court.

kwrodarmer commented 3 years ago

When I perform closer analysis of authority in the system, I do not find myself being happy with OAuth2. I am reminded that it was designed as a simple solution to a particular problem, but now people seem to want to sprinkle it on their cereal in the morning. I'm not sure that it always has a solution for every problem (and if it did, would there be a problem for us to solve right now?).

But anyway, there is authority to access information, and authority to access a resource. These are two different things. It would likely be a huge improvement in our system if we modeled them separately. I speak about dbGaP because I work for NIH, but I do not mean to imply that it is specific in any way. dbGaP/RAS/NIH has the authority to grant access to the data managed under dbGaP, but the files may be housed in S3 and/or GS, which are under a separate authority. What we do today is less than ideal, and involves generating a signed-URL from our DRS server that tells Google or Amazon to allow the bearer to access an NIH bucket at our cost.

If the two authorities were separate, then we would have the DRS server generate URIs to each resource server in their respective clouds, and these cloud servers would use a token generated by DRS (under RAS authority in our case) that grants access to the information, but access to the cloud systems and payment and the like becomes an issue between the cloud RS and the user.

a user logs in to an IdP to obtain a bearer token. This could be vastly improved, but we use OIDC which is so widely accepted and understood that it provides a benefit in spite of its insecurities.
we gather authorizations into passport tokens. Technically, the passport is bound to an identity, but is still a bearer token meaning that it does not provide any evidence of bearer identity. The embedded visas should be linked to the passport identity, but since the passport is not a valid identification of the bearer, the significance has to be carefully evaluated.
users, and even "clients" (which some people have identified as a sort of 'do-it-all' service for the researcher) do not necessarily do anything with tokens. Instead, they pass the tokens on to other services where they will be used. This is formally known as a power-of-attorney relationship, where the PI who authenticated at login authorizes a 3rd party to use tokens in his/her name. We have not modeled this in any way within GA4GH, and it would help out a lot, since it is the basic operating stance.

If dbGaP were to convert our DRS server into a type of AS capable of handing out POA tokens for individual resource access, it would have similarities to the OAuth2 model, although in this case we would have to insist that the authority of the derived token flows from the original authority.

ianfore commented 3 years ago

Trying to keep up and will post thoughts shortly, but there's a fundamental assumption made that I want to understand better, as I think it has some bearing on the problem a WES server has to deal with. Starting here:

When WES is executing, it will be given a passport from the user (not well-known to WES) containing the user's visas, plus some description of the work to be done and upon which objects (the selection). WES is responsible for mapping between the DRS ids and URLs, which today happens via DRS servers; the latter are identifiable through the DRS ids.

I use DRS to find out where the data is and, on that basis, decide where I want to get the compute done on any particular file. Typically, rather than individual files, that would mean a number of sublists of ids, each destined for compute at a different location. FASPScript9 illustrates this. It constructs a single list of drs ids from a federated query across two sources of clinical data. As it processes that assembled cohort, it passes each id off to a different workflow server depending on the location of the data. The work requested of any workflow server is on data that is located 'close' to it, close meaning any location that avoids lengthy download or egress charges.

In the examples I've seen, it would mean that I don't have to give every WES server all my visas, just those for the local systems they are being asked to compute on.

kwrodarmer commented 3 years ago

I use DRS to find out where the data is and, on that basis, decide where I want to get the compute done on any particular file. Typically, rather than individual files, that would mean a number of sublists of ids, each destined for compute at a different location.

Yes, this is sensible. I don't know enough about WES to know if it contemplates such a facility.

It constructs a single list of drs ids and passes each id off to a different workflow server depending on the location of the data. The work requested of any workflow server is on data that is located 'close' to it, close meaning any location that avoids lengthy download or egress charges.

I envision being able to split a selection object on DRS, but your point is more important. However, it also involves user preferences.

In the examples I've seen, it would mean that I don't have to give every WES server all my visas, just those for the local systems they are being asked to compute on.

Yes, this is the goal. Reduce the authority carried by a working passport to its minimum at any particular stage. When a user starts your script, there may be a passport provided on entry that is used only by your script to create subsets of the permissions it carries, embodied as new passports. These would be used in further communications.

ianfore commented 3 years ago

Thanks Kurt

I don't know enough about WES to know if it contemplates such a facility.

Can someone tell us the answer to that? The way I've been using WES I pick (in my script/notebook) the WES server I want to use based on

where the files are that I want to compute on
where I have procured privileges to run a compute and store results
other considerations, like the platform the compute would be on

It seems to me the last two would fit under what you referred to as 'user preferences'.

Yes, this is the goal. Reduce the authority carried by a working passport to its minimum at any particular stage.

I like this goal. In fact, in my working examples I reduce the authority so far that I don't even hand my token (which is pretty much the visa) to the workflow service. I give it the signed URL.

I am applying pretty much the same principle here as I did with my real Passport while on a business trip. The travel agent we were working with in the country we were visiting said it would be more efficient if we gave him our passports to take back to his office so he could get some necessary paperwork done. We declined, preferring to visit his office rather than hand over the authority.

All that goes to say - I'm not sure I want WES to contemplate the facility you referred to. I have an open mind to be convinced otherwise.

ianfore commented 3 years ago

Still trying to consume all this but addressing some issues raised as I go. @jb-adams wrote

To me the ... indicates that full DRS object metadata is being returned, even prior to the client passing the correct visa token. Is this correct? If so, some DRS implementers may not want to yield object metadata without evaluating the correct token.

One thing to distinguish here is what the spec says and what implementations are doing. The latter shouldn't be taken as definitive though it's usually worth finding out why they did what they did.

In one case an implementation was handing out a cloud bucket location (a URI not a URL) in response to GET /objects/{object_id}. This was erroneous and was fixed. No one could have accessed the object via the URI provided in any case. The object and bucket were protected by the authorization mechanisms of the cloud platform. Even so, reducing risk by eliminating a hole in the Swiss cheese is generally a good thing.

Other than that, yes in most live implementations object metadata is being passed in response to /objects/{object_id} without authorization. In the following I will refer to /objects/{object_id} more concisely as GetObject, as does the spec https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.1.0/docs/#_getobject.

One implementation (SevenBridges) does require authentication for GetObject. See SBDRSClient, where it was necessary to override the base class GetObject method to pass the bearer token. I would suggest we open a separate issue to settle what the expected practice is.

Other than that, it seems reasonable that the kind of object metadata returned by GetObject can be provided without authorization. It would be good to know that a proper risk analysis was done on that. In any case, it might be overridden by whatever risk analysis an implementer, or the deployer of an implementation, determines is necessary.

At the same time, we might want to review a closely related issue. For 'public' data the DRS spec suggests that the access URL is provided in response to GetObject without authorization. At first glance that seems intuitively attractive: you can get all you need in one hit on the API. However, it complicates the logic that users of the API have to code. Would it not be better to encapsulate this logic so that servers deal with it via 5.2, "Get a URL for fetching bytes"? The function seems well named for the purpose. If the data is public, the server's task in implementing GetAccessURL is simple.
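To make the client-side cost concrete, here is a minimal sketch of the branching a client needs under DRS 1.1, where each entry in `access_methods` carries either an inline `access_url` or an `access_id` that must be exchanged via GetAccessURL (`GET /objects/{object_id}/access/{access_id}`). The `get_access_url` callable is an assumption standing in for that HTTP call, and a real client would also forward any `headers` returned alongside the URL.

```python
from typing import Any, Callable, Dict, Optional


def fetch_url_for(
    drs_object: Dict[str, Any],
    get_access_url: Callable[[str, str], Dict[str, Any]],
    preferred_type: Optional[str] = None,
) -> str:
    """Return a URL for the bytes of a DRS object (GetObject response dict)."""
    object_id = drs_object["id"]
    for method in drs_object.get("access_methods", []):
        if preferred_type is not None and method.get("type") != preferred_type:
            continue
        # Branch 1: the URL came back inline with GetObject (e.g. public data).
        if "access_url" in method:
            return method["access_url"]["url"]
        # Branch 2: a second call, GetAccessURL, is needed to obtain the URL.
        if "access_id" in method:
            return get_access_url(object_id, method["access_id"])["url"]
    raise LookupError(f"no usable access method for {object_id}")
```

If servers always answered through GetAccessURL, the first branch would disappear and every client would follow a single path, with the public-data case handled entirely on the server.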

kwrodarmer commented 3 years ago

the DRS spec suggests that the access URL is provided in response to GetObject without authorization. At first glance that seems intuitively attractive

As a heads-up, so to speak, I think we should look farther into the future and anticipate that "authorization" really isn't confined to content access control (confidentiality), but can also be authorization for use of resources. Even 'public' data may require authorization in the future (and I suggest that it become the norm).

ianfore commented 3 years ago

we should look farther into the future and anticipate .... authorization for use of resources.

That future is already here. I haven't got to it yet but I fully expect the SevenBridges WES implementation will require me to authenticate before I can run a workflow. That is certainly already the case for DNAStack WES. See https://github.com/ga4gh/fasp-scripts/issues/6 . Passport gets a wink.

Edited to confirm that I used the same token for access to both the SB WES and DRS services. See their excellent documentation.

ianfore commented 3 years ago

Finally got to trying out the WES implementation on the Seven Bridges CGC. See details at https://github.com/ga4gh/fasp-scripts/issues/9 . It's relevant to the discussion here because the references to the data objects used in the workflow are DRS URIs. It's an executable example where we can explore how a workflow gets access to an object.

mattions commented 3 years ago

We built a mockup of a possible solution in this document: https://docs.google.com/document/d/1lSRIJRFSIB8EMww_yOY6hWkT6O7jDABdmxegsmFBD24/edit#heading=h.k08y8z36zxac

briandoconnor commented 3 years ago

Applying the "due: feb" label since we want to be able to report back on the status during the 2021 GA4GH Connect meeting

briandoconnor commented 3 years ago

Can we get a champion for this ticket? Someone that can continue to move it forward by pinging folks on this thread and implementers?

briandoconnor commented 3 years ago

See the outcome of the FASP hackathon:

jb-adams commented 3 years ago

@brainstorm @victorskl here is our PoC design doc where we propose a mechanism for obtaining a down-scoped passport for a set of requested DRS IDs. Currently, we propose:

an endpoint on the DRS server that accepts a list of DRS IDs and returns one or more broker endpoints that the client must contact;
the client then contacts each broker service with its root passport (assuming the client is already authenticated) and the DRS IDs that broker is responsible for, and receives a downscoped passport containing only the necessary visas;
the downscoped passport is used as the authorization token in the request to the DRS server for access to the bytes.
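The three steps above can be sketched from the client side as follows. This is a minimal sketch only: `find_brokers` stands in for the proposed DRS endpoint and `downscope` for the broker call, and both names and signatures are hypothetical, since the actual payloads are defined in the design doc, not here.

```python
from typing import Callable, Dict, List


def get_scoped_tokens(
    drs_ids: List[str],
    root_passport: str,
    find_brokers: Callable[[List[str]], Dict[str, List[str]]],
    downscope: Callable[[str, str, List[str]], str],
) -> Dict[str, str]:
    """Return a mapping of DRS id -> downscoped passport (hypothetical flow)."""
    # Step 1: ask the DRS server which broker is responsible for which ids.
    brokers = find_brokers(drs_ids)
    tokens: Dict[str, str] = {}
    for broker_url, ids in brokers.items():
        # Step 2: exchange the root passport for one carrying only the needed visas.
        scoped = downscope(broker_url, root_passport, ids)
        for drs_id in ids:
            tokens[drs_id] = scoped
    # Every requested id must be covered before any bytes are requested.
    missing = set(drs_ids) - set(tokens)
    if missing:
        raise LookupError(f"no broker returned for: {sorted(missing)}")
    # Step 3 (not shown): send tokens[drs_id] as the bearer token to the DRS server.
    return tokens
```

One consequence of this shape is that the root passport only ever travels to brokers; the DRS server sees just the downscoped token for the ids it serves.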

As htsget is essentially a more file format-specific data retrieval API compared to DRS, it makes sense to harmonize approaches to token flow where possible, ie. similar endpoints and payloads for Passports + DRS as Passports + htsget. We'd appreciate any feedback you have on this from your hackathon work.

victorskl commented 3 years ago

As htsget is essentially a more file format-specific data retrieval API compared to DRS, it makes sense to harmonize approaches to token flow where possible, ie. similar endpoints and payloads for Passports + DRS as Passports + htsget. We'd appreciate any feedback you have on this from your hackathon work.

Thanks @jb-adams. Yep, I've been following this thread for a while, and I understand the "Mock up flow" in the proposed PoC design doc for obtaining a downscoped Passport Visa token in the Passport + DRS authz case. Yes, we can experiment with the proposed flow on our Passport + htsget PoC setup and give feedback, if any.

jb-adams commented 3 years ago

@uniqueg here's our open issue for DRS + Passports as well as our design doc.

ianfore commented 3 years ago

Added a note to the design doc, but posting details and links to examples here rather than bloating the design doc.

Forked Max's gist here with examples using valid DRS ids and likely scenarios.

Explanation of examples

1. metaresolver_post.json and metaresolver_response.json: these are closest to the need we're trying to fulfill, i.e. to answer where to go and what kind of auth you need for a given id.

In my examples, the drs_ids are valid drs_ids with a namespace prefix (see 2 below).

Three driver projects and three DRS servers are represented.

For each of these three, there are definitely files that fall under different authorization groups. In theory they might require different authorization servers. In practice we don't yet need anything that complex: a single server can supply the authorization for all of the ids, at least for a given DRS server.

2. metaresolver__host_uri_post.json: in 1. I used the CURIE form of DRS ids. In metaresolver__host_uri_post.json I added examples of the hostname-based form of DRS ids. Though they use a host name, these are still not URLs at which a service can be called; the DRS spec explains how to get from one to the other.
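For the hostname-based form, the DRS 1.1 rule is mechanical: `drs://<hostname>/<id>` becomes `https://<hostname>/ga4gh/drs/v1/objects/<id>`. A minimal sketch follows. It deliberately refuses the CURIE (compact identifier) form, which needs a resolver such as identifiers.org or n2t.net, and it assumes, as a heuristic rather than a spec rule, that any `:` in the URI signals that form (so hostnames with ports are not handled).

```python
def drs_uri_to_url(drs_uri: str) -> str:
    """Convert a hostname-based DRS URI to the HTTPS URL for GetObject."""
    prefix = "drs://"
    if not drs_uri.startswith(prefix):
        raise ValueError(f"not a DRS URI: {drs_uri}")
    rest = drs_uri[len(prefix):]
    host, sep, object_id = rest.partition("/")
    # Heuristic (assumption): a ':' means namespace:accession (CURIE form),
    # which must go through a compact-identifier resolver instead.
    if ":" in rest or not sep or not object_id:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri}")
    return f"https://{host}/ga4gh/drs/v1/objects/{object_id}"


url = drs_uri_to_url("drs://drs.example.org/314159")
# url == "https://drs.example.org/ga4gh/drs/v1/objects/314159"
```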
3. Metaresolvers, and finding out the auth needed
a) The DRSMetaresolver class in fasp-scripts is an implementation of how one gets from either kind of DRS id to a URL at which a DRS service may be called.
b) That seems to me tightly bound to the question of where, and with what kind of auth, that server must be called.

To date, DRSMetaresolver handles 3 b) locally: it instantiates different clients, each dealing with a server-specific auth method.

Based on the proposed changes in Max's examples, DRSMetaresolver should be able to give a generic DRS client all it needs to authenticate against any DRS server. Benefit: no more server-specific clients.

Walking through whether DRSMetaresolver could do that, based on the draft spec, seems a useful exercise.
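A minimal sketch of that idea, assuming the metaresolver can return both a base URL and an auth scheme for a namespace prefix. `SimpleMetaresolver`, `ServerInfo`, the `"none"`/`"bearer"` vocabulary and the `get_token` callable are all hypothetical illustrations, not the fasp-scripts DRSMetaresolver API and not anything the draft spec defines. What it shows is a single generic code path replacing per-server client subclasses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ServerInfo:
    base_url: str     # DRS base URL, e.g. https://host/ga4gh/drs/v1
    auth_scheme: str  # hypothetical vocabulary: "none" or "bearer"


class SimpleMetaresolver:
    """Hypothetical resolver; NOT the fasp-scripts DRSMetaresolver API."""

    def __init__(self, registry: Dict[str, ServerInfo]) -> None:
        self.registry = registry

    def resolve(self, drs_id: str) -> Tuple[ServerInfo, str]:
        # Split "prefix:local_id"; the DRS server is only ever sent local_id.
        prefix, sep, local_id = drs_id.partition(":")
        if not sep or prefix not in self.registry:
            raise KeyError(f"no server registered for {drs_id!r}")
        return self.registry[prefix], local_id


def prepare_get_object(
    resolver: SimpleMetaresolver,
    drs_id: str,
    get_token: Callable[[str], str],
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for GetObject via one generic code path."""
    info, local_id = resolver.resolve(drs_id)
    headers: Dict[str, str] = {}
    if info.auth_scheme == "bearer":
        headers["Authorization"] = f"Bearer {get_token(info.base_url)}"
    elif info.auth_scheme != "none":
        raise ValueError(f"unsupported auth scheme {info.auth_scheme!r}")
    return f"{info.base_url}/objects/{local_id}", headers
```

Adding a new server then means adding a registry entry rather than writing a new client class.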

4. Local vs global drs_ids: important to understand, but peripheral to the point. anvil_post.json, crdc_post.json and sra_post.json show what a single DRS server can deal with.

The point of the examples above is to show that when calling a DRS server, the drs_ids are not namespace-qualified. Some of the discussion has been loose about that, so I want to make sure it is illustrated and understood.
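A minimal sketch of that distinction: globally unique, namespace-qualified ids are split into one batch of local ids per DRS server, which is roughly the shape of anvil_post.json, crdc_post.json and sra_post.json. The prefixes `ns1`/`ns2`, the server URLs and the `group_for_servers` function are made-up placeholders, not the real AnVIL, CRDC or SRA namespaces.

```python
from collections import defaultdict
from typing import Dict, List


def group_for_servers(
    qualified_ids: List[str], prefix_to_server: Dict[str, str]
) -> Dict[str, List[str]]:
    """Group 'prefix:local_id' ids into per-server lists of local ids."""
    batches: Dict[str, List[str]] = defaultdict(list)
    for qid in qualified_ids:
        prefix, sep, local_id = qid.partition(":")
        if not sep or prefix not in prefix_to_server:
            raise KeyError(f"unknown namespace in {qid!r}")
        # Only the local id is sent to the server; the prefix picks the server.
        batches[prefix_to_server[prefix]].append(local_id)
    return dict(batches)


batches = group_for_servers(
    ["ns1:aaa", "ns2:bbb", "ns1:ccc"],
    {"ns1": "https://drs-1.example.org", "ns2": "https://drs-2.example.org"},
)
# batches == {"https://drs-1.example.org": ["aaa", "ccc"],
#             "https://drs-2.example.org": ["bbb"]}
```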

ga4gh / data-repository-service-schemas