CLARIAH / grlc

grlc builds Web APIs using shared SPARQL queries
http://grlc.io
MIT License

Allow fixing specUrl via environment variable #351

Open lagivan opened 3 years ago

lagivan commented 3 years ago

It would be a good improvement to allow binding a grlc instance to one specification file and preventing the use of any other specification file.

Currently the specification file is set with the specUrl argument, so anyone can create their own specification file and set of queries and run them against the pre-configured SPARQL endpoint. This is a security risk if the intention is to let users run only the pre-configured SPARQL queries.
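A minimal sketch of the requested behaviour, assuming a hypothetical environment variable `GRLC_FIXED_SPEC_URL` and a hypothetical helper `resolve_spec_url` (neither is part of grlc today). When the variable is set, the instance always loads that spec and refuses any different `specUrl` a client sends. When it is unset, the current behaviour is kept.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name, not an existing grlc setting.
FIXED_SPEC_ENV = "GRLC_FIXED_SPEC_URL"


class SpecUrlRejected(Exception):
    """Raised when a client asks for a spec other than the pinned one."""


def resolve_spec_url(requested: Optional[str],
                     env: Mapping[str, str] = os.environ) -> str:
    """Return the spec URL this instance should load (hypothetical helper).

    If the operator pinned a spec via FIXED_SPEC_ENV, that value wins and any
    different client-supplied specUrl is rejected. Otherwise fall back to the
    current behaviour: use whatever specUrl the client sent.
    """
    pinned = env.get(FIXED_SPEC_ENV)
    if pinned:
        # Pinned mode: the client may omit specUrl or repeat the pinned value.
        if requested and requested != pinned:
            raise SpecUrlRejected(f"specUrl is fixed to {pinned}")
        return pinned
    # Unpinned mode: same as today, the client chooses the spec.
    if not requested:
        raise ValueError("no specUrl supplied and none pinned")
    return requested
```

Rejecting a mismatched `specUrl` instead of silently ignoring it is a design choice in this sketch: it makes attempted misuse visible to the client, though an operator might prefer to ignore the parameter altogether.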

My use case: we have an RDF store in a private network, and we want to build a public application that interacts with it through the grlc API. We want to expose only the pre-configured set of SPARQL queries the application needs. However, we don't want anyone to abuse the RDF store by supplying their own specUrl.

The ideal solution would be the following:

c-martinez commented 3 years ago

Hi @lagivan, very interesting use case. It is a bit different from how we have designed and used grlc so far. Usually we aim to share public queries (e.g. on GitHub) for accessing public RDF stores (e.g. DBpedia). Hiding an RDF store behind a grlc API is an unexpected way of using grlc, so let's see what the best way to make your use case work would be.

One mitigating factor: an external person would only be able to reach your RDF store if they know its address in your internal network.

One question: will you build your grlc API from a spec file, from queries on GitHub, or from queries in local storage? (https://github.com/CLARIAH/grlc#query-location)

@albertmeronyo, what do you think? Would this functionality be generally useful?

lagivan commented 3 years ago

@c-martinez we currently use a spec file, and we have to pass it via the specUrl argument, so the spec location is visible from the client side.
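To illustrate why the spec location is visible client-side, here is a small sketch that builds a request URL for a spec-file-based API. It assumes grlc's spec-file route is `/api-url` with the spec location in the `specUrl` query parameter, as we understand the query-location section of the grlc README; the host `grlc.example.org` and the helper `api_url` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical public host of the grlc instance.
GRLC_BASE = "https://grlc.example.org"


def api_url(spec_url: str) -> str:
    """Build a request URL for a spec-file API (hypothetical helper).

    The spec location travels in the specUrl query parameter, so it is part
    of every URL the client sends.
    """
    return f"{GRLC_BASE}/api-url?{urlencode({'specUrl': spec_url})}"


url = api_url("https://example.org/our-spec.yaml")
# Anyone who sees this URL can read the spec location back out of it...
leaked = parse_qs(urlparse(url).query)["specUrl"][0]
print(leaked)
# ...and can point the same instance at a spec of their own.
print(api_url("https://attacker.example/their-spec.yaml"))
```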

You said an external person could reach the RDF store if they knew its address; in our case that is not possible, because the RDF store is in a private network. We run our own grlc instance that has access to the RDF store, and users can only access the grlc instance. This setup is quite safe, except for the loophole that users can currently supply their own spec file via the specUrl argument. With this feature implemented, it would be possible to isolate the RDF store completely and expose only the pre-configured API, which in my opinion is a great improvement.