CLARIAH / clariah-plus

This is the project planning repository for the CLARIAH-PLUS project. It groups all technical documents and discussions pertaining to CLARIAH-PLUS in a central place and should facilitate findability, transparency and project planning, for the project as a whole.

Request for Comment: Proposal for forwarding data between RESTful webservices in a heterogenous authentication environment #22

Closed proycon closed 2 years ago

proycon commented 3 years ago

As one of the ideas behind the CLARIAH Interest Groups is to come up with best practices and proposals, I would like to kick this off for our workflow group with the following proposal.

You can view it in either markdown or PDF:

I'd be glad to hear any feedback and opinions on this.

jblom commented 3 years ago

@proycon looks interesting. Since I am so brainwashed with how authentication works within the CLARIAH media suite (which also calls APIs with authentication tokens based on federated login stuff with SATOSA) I would love to see an example of how you use this for 2 actual services in e.g. switchboard or CLAM, which I am not familiar with.

Anyway, I'm interested to figure out how this can be used for the media suite to make it more integrated with CLARIAH as a whole.

proycon commented 3 years ago

@jblom Thanks, good point, I could have included a more specific example. Consider for example the following workflow:

  1. A user uploads a plain text document to the CLARIN switchboard.
  2. The switchboard suggests to the user a set of webservices that are suitable for processing this file.
  3. The user selects a webservice (say the ucto tokeniser, a CLAM-based webservice hosted at Radboud University, which does not implement federated authentication).
  4. The switchboard makes the uploaded user data available through a one-time download link with a randomised component and redirects the user to the CLAM-based webservice, with the download link as an argument.
  5. The user is redirected to the CLAM-based webservice.
  6. The user logs into the CLAM-based webservice.
  7. The CLAM-based webservice obtains the input text from the switchboard (using the one-time link that was passed).
  8. The webservice does its thing (tokenisation in this case), and returns the output to the user.
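Steps 4 and 7 can be sketched roughly as follows. This is only a minimal in-memory illustration, not how the switchboard or CLAM actually implement it: the class `OneTimeLinkStore`, the function `redirect_target`, the query parameter name `input_url` and the example hostnames are all hypothetical.

```python
import secrets
from typing import Dict, Optional
from urllib.parse import urlencode


class OneTimeLinkStore:
    """In-memory store of one-time download links (illustration only)."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        self._pending: Dict[str, bytes] = {}

    def publish(self, data: bytes) -> str:
        """Store the data under an unguessable token and return its link."""
        token = secrets.token_urlsafe(32)  # the randomised component
        self._pending[token] = data
        return f"{self.base_url}/{token}"

    def redeem(self, token: str) -> Optional[bytes]:
        """Return the data once; any later request for the token gets None."""
        return self._pending.pop(token, None)


def redirect_target(service_url: str, download_link: str) -> str:
    """Build the URL the user is redirected to (step 4).

    The parameter name 'input_url' is an assumption for this sketch.
    """
    return f"{service_url}?{urlencode({'input_url': download_link})}"
```

The link itself is the only credential here, so the token has to be long and random, and a real deployment would also want the link to expire after some time.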

I don't know much about the media suite, but to be able to receive files from webservices adhering to this principle, it would only need an endpoint that takes a download link as an argument and, when invoked, downloads the resource from that link. To be able to delegate its own output to other webservices in this scheme, it needs a mechanism to temporarily offer unauthenticated endpoints (with some random component) that give access to user data.
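The receiving side could look something like the sketch below. It assumes the download link arrives in a query parameter called `input_url` (a made-up name) and uses only the Python standard library; the function names and the size limit are likewise assumptions, not part of the media suite or CLAM.

```python
from urllib.parse import parse_qs, urlsplit
from urllib.request import urlopen


def extract_download_link(request_url: str, param: str = "input_url") -> str:
    """Pull the download link out of the incoming request URL and sanity-check it."""
    values = parse_qs(urlsplit(request_url).query).get(param)
    if not values:
        raise ValueError(f"missing query parameter: {param}")
    link = values[0]
    parts = urlsplit(link)
    # Only accept absolute http(s) links; anything else is rejected.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("download link must be an absolute http(s) URL")
    return link


def fetch_input(link: str, opener=urlopen, max_bytes: int = 10_000_000) -> bytes:
    """Download the resource behind the link, refusing oversized payloads.

    'opener' defaults to urllib's urlopen and can be swapped out for testing.
    """
    with opener(link) as response:
        data = response.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError("input exceeds the size limit")
    return data
```

Since the endpoint fetches whatever URL it is handed, a real service would probably also restrict which hosts it is willing to download from.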

4tikhonov commented 3 years ago

Hi @proycon, it looks interesting. By any chance, do you know there is ongoing work in SSHOC WP3 related to the integration of CLARIN Switchboard and Dataverse data repository?

The prototype is available online; users should be able to authenticate, upload a file and invoke different CLARIN services. Please take a look here and click on the "Process with Language Resource Switchboard" button. If the user shares a Dataverse access token with the service, it should be possible to deposit the output of the selected CLARIN service back in Dataverse.

Regarding the workflow, I believe you need a somewhat more sophisticated solution if you intend to create a reliable pipeline for acyclic processes. Currently the industry standard for this is Apache Airflow, which DANS is going to investigate in a few projects. However, Kafka could also be considered, especially in light of its adoption at Twitter to retry failed actions when users request access to tweets via the API.

proycon commented 3 years ago

> By any chance, do you know there is ongoing work in SSHOC WP3 related to the integration of CLARIN Switchboard and Dataverse data repository?

No, I don't really know anything about that, you'd have to ask them.

> The prototype is available online, user should be able to authenticate and upload file and invoke different CLARIN services, please take a look here and click on "Process with Language Resource Switchboard" button. If user will exchange Dataverse Access Token with service, it should be possible to deposit back in Dataverse the output of the selected CLARIN service.

As far as I know, the switchboard doesn't really capture the output of the services; it merely delegates the user to a service, forwarding the data submitted to the switchboard. Of course the services in turn could deposit data back in the Dataverse, if it supports a mechanism like the one proposed in this issue. This could be an interesting option that I had not considered yet. It might be worth considering in the scope of the WP3 VRE plan if there is interest.

> Regarding the workflow, I believe you need a bit more sophisticated solution if you're intended to create a reliable pipeline for acyclic processes. Currently the industrial standard for this is Apache Airflow, that's something DANS is going to investigate in a few projects. However Kafka could be also considered, especially in the light of Twitter adoption in order to repeat some failed actions when users requesting access to tweets via API.

This is indeed a fairly simple solution which won't serve all cases. More sophisticated solutions are probably called for in other contexts, but I'd rather start simple and move to more complex solutions only when necessary.

dgbroeder commented 3 years ago

Hi, some comments, also in response to Slava regarding the activities in the SSHOC WP3 Switchboard use case.

Regarding @4tikhonov's earlier remark about this potentially being a solution for one of the Dataverse / Switchboard integrations we discussed, I can only remember this one: the processing result of a service invoked from Dataverse via the Switchboard needs to be deposited back into the Dataverse repository.

To realise this, the adaptations to the Switchboard seem minimal; the work goes into Dataverse and the services to be invoked. The Switchboard only needs to pass on the callback information to the invoked service.

Note that the Switchboard does not require any authentication, it just passes on information (maybe too simple:).
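The callback pass-through described above could be sketched like this. Everything named here is hypothetical: the parameter names `input_url` and `callback_url`, the two helper functions and the example hostnames. It is a generic HTTP callback, not the actual Switchboard or Dataverse API; a real deposit into Dataverse would have to follow its own API and authorisation scheme.

```python
from urllib.parse import urlencode
from urllib.request import Request


def invocation_url(service_url: str, input_url: str, callback_url: str) -> str:
    """Switchboard side: hand both links on to the invoked service unchanged."""
    query = urlencode({"input_url": input_url, "callback_url": callback_url})
    return f"{service_url}?{query}"


def build_deposit_request(callback_url: str, result: bytes,
                          content_type: str = "text/plain; charset=utf-8") -> Request:
    """Service side: prepare (but do not send) a POST of the result to the callback."""
    return Request(
        callback_url,
        data=result,
        method="POST",
        headers={"Content-Type": content_type},
    )
```

The invoked service would send the prepared request with `urllib.request.urlopen` once processing has finished; the Switchboard itself never sees the result.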

proycon commented 3 years ago

> this delegation workflow requires the webservices (A but also B) to have a UI (or some logic to communicate with the user client) as do the CLAM services.

Yes, indeed, it requires an interface between the user and each service. This can be a web API that is accessible for automated clients, and for the end-user it is accessible through a user interface (which was the primary focus for this use case).

> It also seems to require that the user client remains available to effectuate web service B start processing, and the user to reauthenticate, or is the flow asynchronous and will the user already be able to call webservice B if A is still processing? If it's asynchronous, I see other problems e.g. how to deal with errors if the processing by A does finish and B remains pulling the data?

No, this is not asynchronous; it is purely sequential. The user has to wait for the output from A to be available before they can continue on to webservice B.

> adaptations of all the involved services needed, but that may be acceptable within a single project as CLARIAH.

Yes, at least for those services that would want to connect with another in such a way.

> and scaling up to N webservices, this workflow would require the user-client to remain active

Yes (though in the case of CLAM webservices that doesn't mean that the user has to keep his session active, the session will remain running in his absence)

> I wonder if we can get others outside CLARIAH NL to buy-in to this solution, they may want to wait or work towards accepted standard based solutions.

My aim here was to keep things as simple as possible (I don't claim that this is a particularly novel solution either, it has probably been implemented by many in some form or another). Most other solutions will probably solve more problems but at the cost of requiring more common infrastructure.

> on the positive side :) it does seem to solve delegation for specific use-cases of which we are in need since OAUTH2 based solution for the domain do not seem to be forthcoming.

Indeed, this bypasses the problem of the authentication environment, since it doesn't presuppose any common authentication solution in the first place.

proycon commented 2 years ago

I'm closing this issue as I'm not expecting much more feedback on this now. The work done in it may serve as a reference for forwarding data between RESTful webservices in a heterogenous authentication environment, without any explicit approval status.