atviriduomenys / spinta

Spinta is a framework to describe, extract and publish data (a DEP Framework).
MIT License

Add support for distinct() #579

Closed sirex closed 3 months ago

sirex commented 3 months ago

Add distinct() user function for external backends (SQL, XML, JSON, CSV).

This should mostly be used when transforming denormalized data into normalized form. For example:

d | r | b | m | property | type   | ref     | source           | prepare    | level | access
datasets/example         |        |         |                  |            |       |
  | resource1            | sql    |         | sql://example/db |            |       |
  |   |   | Country      |        | name@en | CITIES           | distinct() | 4     |
  |   |   |   | name@en  | string |         | COUNTRY          |            | 4     | open
  |   |   | City         |        | name@en | CITIES           |            | 4     |
  |   |   |   | name@en  | string |         | CITY             |            | 4     | open
  |   |   |   | country  | ref    | Country | COUNTRY          |            | 3     | open

With this manifest table, if we have the following data:

CITY COUNTRY
Vilnius Lithuania
Kaunas Lithuania

Then for Country we need to explicitly declare that we expect duplicates for model.ref, which in this case is the country name, and that we want to take only distinct values.

For this to work, we need to sort the table by model.ref before applying the distinct function; at least, this is the case for SQL.
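For non-SQL backends (CSV, JSON, XML), the sort-then-deduplicate idea could look roughly like the sketch below. This is only an illustration, not Spinta's implementation: the `distinct_by` helper and the in-memory `rows` list are hypothetical, and the sample rows are taken from the CITIES data above.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows as they might be read from the CITIES source (hypothetical in-memory form).
rows = [
    {"CITY": "Vilnius", "COUNTRY": "Lithuania"},
    {"CITY": "Kaunas", "COUNTRY": "Lithuania"},
]


def distinct_by(rows, key):
    """Yield the first row for each distinct value of `key`.

    Rows are sorted by `key` first, because groupby only merges
    adjacent rows with equal keys.
    """
    get_key = itemgetter(key)
    for _, group in groupby(sorted(rows, key=get_key), key=get_key):
        yield next(group)


# Map the source column COUNTRY to the Country model's name@en property.
countries = [{"name@en": row["COUNTRY"]} for row in distinct_by(rows, "COUNTRY")]
print(countries)
```

Here the key plays the role of model.ref (name@en, read from the COUNTRY column), so the two city rows collapse into a single Country row.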

The end result for Country should be:

_id name@en
8e995369-6553-443b-b134-3162bc41d4f2 Lithuania
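
For the SQL case, the distinct country names can be checked with a small standalone script. This is a sketch using Python's built-in sqlite3 module and an in-memory table, which are assumptions made for illustration; the query Spinta would actually generate may differ, and the _id values are assigned by Spinta, so they are not reproduced here.

```python
import sqlite3

# In-memory database standing in for sql://example/db (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CITIES (CITY TEXT, COUNTRY TEXT)")
conn.executemany(
    "INSERT INTO CITIES (CITY, COUNTRY) VALUES (?, ?)",
    [("Vilnius", "Lithuania"), ("Kaunas", "Lithuania")],
)

# Sort by the model.ref column and keep only distinct values.
query = "SELECT DISTINCT COUNTRY FROM CITIES ORDER BY COUNTRY"
country_names = [row[0] for row in conn.execute(query)]
conn.close()

print(country_names)
```

With the two sample rows, the query returns a single value, Lithuania, matching the one Country row shown above.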
