datopian / data-api

Next generation Data API for data management systems including CKAN.
https://tech.datopian.com/data-api/
MIT License

[epic] Support for large queries from data API #19

Open rufuspollock opened 3 years ago

rufuspollock commented 3 years ago

When querying the data API, I want to be able to run queries that return 100k or 1m+ results and download them, so that I can extract the data I want even when the result set is large.

Acceptance

Analysis

There are 2 approaches:

  1. Stream the whole result
  2. Extract the query results to storage and give storage url to the user
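
As a rough illustration of option 1, here is a minimal Python sketch that turns an iterator of rows into CSV chunks, so the full result never has to sit in memory. The function name `stream_csv`, the `batch_size` parameter, and the choice of CSV are assumptions made for this example, not part of the data API itself.

```python
import csv
import io
from typing import Iterable, Iterator, Sequence


def stream_csv(
    header: Sequence[str],
    rows: Iterable[Sequence],
    batch_size: int = 1000,  # hypothetical chunk size, tune per deployment
) -> Iterator[str]:
    """Yield CSV text in chunks of at most `batch_size` rows."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    pending = 0
    for row in rows:
        writer.writerow(row)
        pending += 1
        if pending >= batch_size:
            yield buf.getvalue()
            # Reset the buffer so memory use stays bounded by one chunk.
            buf.seek(0)
            buf.truncate(0)
            pending = 0
    # Flush whatever is left: trailing rows, or just the header for an empty result.
    if buf.tell():
        yield buf.getvalue()
```

A web framework's streaming response could consume this generator directly; `rows` would typically be a server-side database cursor rather than a list.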

A hybrid is also possible: e.g. stream the result (option 1) up to some number of results and then switch to option 2.
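
The hybrid idea could be sketched in Python roughly as follows. Everything named here is hypothetical: `deliver`, the `STREAM_LIMIT` threshold, and the local directory standing in for real storage (a production version would presumably upload to blob storage and return a signed URL). Note that this sketch buffers up to `stream_limit` rows in memory in order to decide which path to take.

```python
import csv
import itertools
import tempfile
from pathlib import Path

STREAM_LIMIT = 100_000  # hypothetical threshold for switching to option 2


def deliver(header, rows, stream_limit=STREAM_LIMIT, export_dir=None):
    """Return ("stream", iterator) for small results, ("export", url) for large ones."""
    rows = iter(rows)
    # Peek at one row more than the limit to learn whether the result is "large".
    head = list(itertools.islice(rows, stream_limit + 1))
    if len(head) <= stream_limit:
        # Small result: hand the rows back for direct streaming (option 1).
        return "stream", iter([header, *head])
    # Large result: write everything to storage and return its location (option 2).
    # A local directory stands in for real storage in this sketch.
    target_dir = Path(export_dir) if export_dir else Path(tempfile.mkdtemp())
    path = target_dir / "query-result.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(itertools.chain(head, rows))
    return "export", path.resolve().as_uri()
```

The caller branches on the first element of the returned tuple: stream the iterator in the response, or reply with the URL so the user can download the file separately.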

There are several advantages of option 2:

The disadvantages (at least for small data sizes):

leomrocha commented 3 years ago

@rufuspollock The current implementation streams the results; we are testing its performance in some test (but distributed) environments.

The tests went well, and it now seems that any further optimization needs to happen in the Hasura and Postgres queries and views.

For further improvements, this issue is the next step, but due to time constraints we won't start on it for the moment.