SwissDataScienceCenter / renku

Renku provides a platform and tools for reproducible and collaborative data analysis.
https://renkulab.io
Apache License 2.0

Securing KG API #672

Open jachro opened 4 years ago

jachro commented 4 years ago

As a Renku owner, I'd like the knowledge-graph service API to be secured :)

This epic is a starting point for a conversation about how to protect the KG API.

Once all the following issues are done:

SwissDataScienceCenter/renku-gateway/issues/168, SwissDataScienceCenter/renku-gateway/issues/169, SwissDataScienceCenter/renku-gateway/issues/170

Renku will start serving the KG API to the world. All the endpoints will be secured by the Gateway, which verifies the access token and either returns 401 Unauthorized or forwards the request to the knowledge-graph service. However, securing the knowledge-graph API is a more complicated topic, as protecting the data involves more than controlling access to the endpoints.
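A minimal sketch of the Gateway check described above, assuming a bearer token in the `Authorization` header. The names `gateway_handle`, `verify_token` and `forward` are hypothetical placeholders, not the renku-gateway code; token verification and the call to the knowledge-graph service are passed in as functions.

```python
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (HTTP status code, body text).
Response = Tuple[int, str]


def gateway_handle(
    headers: Mapping[str, str],
    verify_token: Callable[[str], bool],
    forward: Callable[[str], Response],
) -> Response:
    """Check the bearer token, then forward to the KG service or reject.

    All names here are hypothetical; this is not the renku-gateway code.
    """
    # Expect a header of the form "Authorization: Bearer <token>".
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        # No usable credentials: 401 Unauthorized (not 403 Forbidden).
        return 401, "Unauthorized"
    if not verify_token(token):
        # Credentials were supplied but rejected (expired, revoked, forged, ...).
        return 401, "Unauthorized"
    # Valid token: pass the request on to the knowledge-graph service.
    return forward(token)
```

For example, with `valid = {"abc123"}` and `kg = lambda token: (200, "KG response")`, calling `gateway_handle({"Authorization": "Bearer abc123"}, valid.__contains__, kg)` returns the KG response, while a request without the header gets a 401. In HTTP, 401 means the credentials are missing or invalid, whereas 403 Forbidden is for an authenticated caller who lacks permission; a real gateway would also send a `WWW-Authenticate` header with each 401.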

ableuler commented 4 years ago

Question: do we want to make the access token mandatory for all the data, or only for data related to private/internal projects?

I would prefer to not make it mandatory. This will allow us to offer some functionality to the logged-out user too.

Should we use the same model as GitLab and simply list all the projects except those we have no access to?
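A minimal sketch of the GitLab-like model with an optional token, assuming the KG knows each project's visibility level and member list. The `Project` type, `can_see` and `list_projects` are hypothetical; a logged-out caller (`user=None`) still gets the public projects, and GitLab's "external user" rule for internal projects is deliberately left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Project:
    """Hypothetical, simplified view of a project as the KG might see it."""
    path: str
    visibility: str  # "public", "internal" or "private", mirroring GitLab levels
    members: FrozenSet[str] = frozenset()


def can_see(project: Project, user: Optional[str]) -> bool:
    """GitLab-like rule; user=None models a caller with no access token."""
    if project.visibility == "public":
        return True  # visible to everyone, including logged-out callers
    if user is None:
        return False  # internal and private projects require a token
    if project.visibility == "internal":
        return True  # any logged-in user (external users ignored here)
    return user in project.members  # private: members only


def list_projects(projects: List[Project], user: Optional[str]) -> List[Project]:
    """List all projects except those the caller has no access to."""
    return [p for p in projects if can_see(p, user)]
```

With one public, one internal and one private project (member `alice`), an anonymous caller sees only the public project, `bob` also sees the internal one, and `alice` sees all three.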

I haven't thought this through, but intuitively I would like to treat inter-project links as public information. For example, it could be OK to show that a public project X has used a dataset from a private project Y, which has also been used in another private project Z. At the same time, I would like to keep the metadata of the private projects involved (name, author, etc.) protected.
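One way to read this proposal, sketched under assumptions: every link is returned, but projects the user cannot access lose their metadata and appear under opaque aliases. The function `redact_links`, the edge representation and the `private-N` alias format are all hypothetical. Because the aliases are stable within one response, the user learns that Y and Z are different projects (as in the example above) without learning their names or authors; whether that distinction should be revealed at all is a separate choice.

```python
from typing import Dict, List, Set, Tuple

# (source, target): the target project uses data from the source project.
Edge = Tuple[str, str]


def redact_links(
    edges: List[Edge],
    metadata: Dict[str, Dict[str, str]],
    accessible: Set[str],
) -> List[Tuple[Dict[str, str], Dict[str, str]]]:
    """Keep every link, but strip metadata from projects the user cannot access.

    Inaccessible projects get opaque aliases ("private-1", "private-2", ...)
    that are stable within one response.
    """
    aliases: Dict[str, str] = {}

    def node(project: str) -> Dict[str, str]:
        if project in accessible:
            # Accessible project: expose its id and full metadata.
            return {"id": project, **metadata.get(project, {})}
        if project not in aliases:
            aliases[project] = f"private-{len(aliases) + 1}"
        return {"id": aliases[project]}  # no name, author, etc.

    return [(node(source), node(target)) for source, target in edges]
```

For the example above (edges Y -> X and Y -> Z, only X accessible), X is returned with its name and author, Y becomes `private-1` in both links, and Z becomes `private-2`.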

ciyer commented 4 years ago

Can we work through a few example scenarios here? I think we just need to distinguish between projects that are visible and invisible to the user: a project is visible if the user has access to it and invisible otherwise. Let A -> B mean that B uses data from A, and let Visible[i] and Invisible[i] denote the ith visible and the ith invisible project, respectively.

I think the scenarios we need to handle are:

  1. Invisible0 -> Visible0
  2. Visible0 -> Invisible0
  3. Visible0 -> Invisible0 -> Visible1
  4. Visible0 -> Invisible0 -> Invisible1
  5. Invisible0 -> Invisible1 -> Visible0
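The five scenarios can be encoded directly as lineage paths, which gives concrete fixtures for whichever display policy is chosen. This is a minimal sketch; `SCENARIOS` and `invisible_runs` are hypothetical names. The helper finds the maximal runs of consecutive invisible projects, since those runs are exactly what the questions below are about.

```python
from typing import List, Set

# The scenarios above; path[k] -> path[k + 1] means path[k + 1] uses data from path[k].
SCENARIOS: List[List[str]] = [
    ["Invisible0", "Visible0"],
    ["Visible0", "Invisible0"],
    ["Visible0", "Invisible0", "Visible1"],
    ["Visible0", "Invisible0", "Invisible1"],
    ["Invisible0", "Invisible1", "Visible0"],
]


def invisible_runs(path: List[str], visible: Set[str]) -> List[List[str]]:
    """Return the maximal runs of consecutive projects the user cannot see."""
    runs: List[List[str]] = []
    current: List[str] = []
    for project in path:
        if project in visible:
            # A visible project closes any open run of invisible ones.
            if current:
                runs.append(current)
                current = []
        else:
            current.append(project)
    if current:
        runs.append(current)
    return runs
```

With `visible = {"Visible0", "Visible1"}`, scenarios 1 to 3 each contain a single invisible project, while scenarios 4 and 5 contain a run of two. Those two are the cases where it matters whether the user can tell that Invisible0 != Invisible1.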

What should the user see in the KG in these situations?

Does the user see that there is an invisible project? And if there is more than one invisible project in the path, does the user learn that they are different projects (that Invisible0 != Invisible1)?


ableuler commented 4 years ago

I can imagine circumstances where each of the possibilities makes sense:

  • Invisible projects are shown, but without names
  • Invisible projects are shown, but not distinguished
  • Invisible projects are not shown
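The three options can be compared side by side with a small rendering function. This is a sketch under assumptions: `render_path` and the policy names `"unnamed"`, `"indistinguishable"` and `"omitted"` are hypothetical, and "not distinguished" is read here as collapsing each run of consecutive invisible projects into a single `hidden` node, so the user cannot even count them.

```python
from typing import Dict, List, Set

POLICIES = ("unnamed", "indistinguishable", "omitted")


def render_path(path: List[str], visible: Set[str], policy: str) -> List[str]:
    """Render a lineage path for one user under a hypothetical display policy.

    "unnamed":           each invisible project gets an opaque label (hidden-1, ...)
    "indistinguishable": each run of invisible projects becomes one "hidden" node
    "omitted":           invisible projects are dropped entirely
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    rendered: List[str] = []
    aliases: Dict[str, str] = {}
    for project in path:
        if project in visible:
            rendered.append(project)
        elif policy == "unnamed":
            if project not in aliases:
                aliases[project] = f"hidden-{len(aliases) + 1}"
            rendered.append(aliases[project])
        elif policy == "indistinguishable":
            # Only add a placeholder if the previous node is not already one.
            if not rendered or rendered[-1] != "hidden":
                rendered.append("hidden")
        # policy == "omitted": skip the invisible project
    return rendered
```

For scenario 5 (Invisible0 -> Invisible1 -> Visible0) the three policies give `hidden-1, hidden-2, Visible0`, then `hidden, Visible0`, then just `Visible0`. Note that dropping invisible projects in scenario 3 yields `Visible0, Visible1`, which can suggest a direct link that does not exist.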

I'd tend to show them in an indistinguishable manner by default. What would be scenarios for you where