datopian / data-api

Next generation Data API for data management systems including CKAN.
https://tech.datopian.com/data-api/
MIT License
9 stars 3 forks source link

[epic] Read Service with CKAN v2 compatibility #1

Open anuveyatsu opened 3 years ago

anuveyatsu commented 3 years ago

We are using Hasura as the base wrapper around Postgres.

We need a backwards compatible API

# dapi-endpoint includes version e.g. data-api.ckan.com/v1/ ...
{dapi-endpoint}/{dapi-id}/graphql  # raw Hasura
{dapi-endpoint}/{dapi-id}/datastore_search  # CKAN DataStore /api/3/datastore_search

# let's see about this one (if hard we may skip for now)
{dapi-endpoint}/{dapi-id}/datastore_search_sql/  # CKAN DS SQL /api/3/datastore_search_sql

Terminology

Acceptance

BONUS

Tasks

Analysis

graph TD

subgraph Read API service
  hasura[Hasura]
  apiwrapper[NodeJS wrapper app]
end

postgres[Postgres]

apiwrapper --> hasura
hasura --> postgres
rufuspollock commented 3 years ago

@EvgeniiaVak @leomrocha can you update on where we are with this?

leomrocha commented 3 years ago

@rufuspollock At the moment we have a basic API (as explained in the README file) that is not backwards compatible. It can take graphql queries and queries with other parametrizations.

We could make a few changes to have backwards compatibility adding endpoints that behind the scenes convert the calls to JSON, or doing a conversion in the same endpoints but I would suggest we keep the new API and deprecate the CKAN one adding specific compatibility endpoints giving some deadline (a few years?) to move to the new ones.

The datastore_search_sql endpoint should be deprecated a directly exposing the DB has many problems for portability and maintenance.

Not all the endpoints make sense with the new syntax as some can be merged.

In the case of making a GraphQL query, the server redirects to graphql

Downloads for CSV and XLSX should be added but I wouldn't do that as a graphql response (endpoint), instead what I would do is to maintain the graphql query and as response do a file stream directly, or saving to a file server with some file retention policy and returning the file url as a redirect and download.

leomrocha commented 3 years ago

@rufuspollock @anuveyatsu @EvgeniiaVak I would like your thoughts here.

This issue might be already outdated and some discussion is needed on the CKAN backwards compatibility.

I would like to avoid exposing the datastore_search_sql endpoint as it not only makes another security risk headache but I think we should deprecate it. Maybe on the services that are still using this endpoint keep using CKAN while the systems are migrated to the new GraphQL and datastore_search endpoints.

For the complete backwards compatibility of datastore_search with CKAN there are still some things to implement that with the changes make no real sense or are not easily feasible. If is needed for some current migration we can implement the needed features, but I would make it as per need basis only and not as a general requirement. The same if a migration can be done with the datastore_search_sql endpoint, I would not include it in the data-api distribution but only do it in the cases that need it for migrating from CKAN and with the view of taking it out in the future.

I think one of the most important features to do right now is to implement access rights and security.

rufuspollock commented 3 years ago

I would like to avoid exposing the datastore_search_sql endpoint as it not only makes another security risk headache but I think we should deprecate it. Maybe on the services that are still using this endpoint keep using CKAN while the systems are migrated to the new GraphQL and datastore_search endpoints.

I hear you .... and some people do use this so we will need it. I think for this we could implement a function purely in nodejs wrapper (outside Hasura). It's a relatively straightforward to do i think.

For the complete backwards compatibility of datastore_search with CKAN there are still some things to implement that with the changes make no real sense or are not easily feasible. If is needed for some current migration we can implement the needed features, but I would make it as per need basis only and not as a general requirement. The same if a migration can be done with the datastore_search_sql endpoint, I would not include it in the data-api distribution but only do it in the cases that need it for migrating from CKAN and with the view of taking it out in the future.

Can we make sure what is and is not implemented is very clearly documented in the README - is that something we could do right now.

I think one of the most important features to do right now is to implement access rights and security.

Can you open a separate issue about this if we don't have it already. For this i'm anticipating that we move to the JWT approach using ckanext-authz i.e. ckan using ckanext-authz gives out tokens that this data api service can then use to authorize requests.