atviriduomenys / spinta

Spinta is a framework to describe, extract and publish data (a DEP Framework).
MIT License
10 stars 4 forks source link

GROUP BY error on spinta push #535

Closed sirex closed 7 months ago

sirex commented 8 months ago
d | r | b | m | property | type    | ref | source           | level | access
$DATASET                 |         |     |                  |       |
  | db                   | sql     |     | postgresql:///db |       |
  |   |   | City         |         | id  | cities           | 4     |
  |   |   |   | id       | integer |     | id               | 4     | open
  |   |   |   | name     | string  |     | name             | 4     | open

When running spinta push using following manifest, I get this error:

Count rows:   0%|     | 0/1 [00:00<?, ?it/s]
ERROR: Error on _get_row_count({model.name}).
Traceback (most recent call last):
  File "spinta/cli/helpers/data.py", line 55, in count_rows
    count = _get_row_count(context, model)
  File "spinta/cli/helpers/data.py", line 40, in _get_row_count
    for data in stream:
  File "spinta/datasets/backends/sql/commands/read.py", line 97, in getall
    for row in conn.execute(qry):
  File "sqlalchemy/engine/base.py", line 1385, in execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError)
  column "cities.id" must appear in the GROUP BY clause or be used in an
  aggregate function
LINE 1: SELECT count(*) AS count_1, cities.id 
                                    ^

This error appears, only if source database is a non-sqlite database, because sqlite is not strict with GROUP BY. But PostgreSQL is.

It looks, that for some reason, GROUP BY statement is generated for every primary key. This should not be done.

We need to add at least one test, for push, where source is PostgreSQL, to reproduce this error, and avoid such errors in future.

Related