atviriduomenys / spinta

Spinta is a framework to describe, extract and publish data (a DEP Framework).
MIT License

Pagination infinite loop error #542

Closed sirex closed 5 months ago

sirex commented 7 months ago

I have the following data:

CREATE TABLE cities (
    id integer primary key,
    name text
);
INSERT INTO cities (id, name) VALUES (0, 'Vilnius');

And a manifest:

d | r | b | m | property | type    | ref      | source     | level | access
example                  |         |          |            |       |
  | db                   | sql     | external |            |       |
  |   |   | City         |         | id       | cities     | 4     |
  |   |   |   | id       | integer |          | id         | 4     | open
  |   |   |   | name     | string  |          | name       | 4     | open

When I run Spinta in external mode and try to read City data:

spinta run --mode=external sdsa.txt
http get :8000/example/City

I get the following error:

Traceback (most recent call last):
  File "spinta/utils/response.py", line 246, in aiter
    for data in stream:
  File "spinta/utils/response.py", line 239, in _iter
    for data in itertools.chain(peek, stream):
  File "spinta/formats/json/components.py", line 19, in __call__
    for i, row in enumerate(data):
  File "spinta/accesslog/__init__.py", line 162, in log_response
    for row in rows:
  File "spinta/commands/read.py", line 129, in <genexpr>
    rows = (
  File "spinta/commands/read.py", line 238, in get_page
    raise InfiniteLoopWithPagination()
spinta.exceptions.InfiniteLoopWithPagination: Pagination values has cause infinite loop while fetching data.

As I understand it, is this related to cities.id being set to 0?


sirex commented 7 months ago

It looks like this error reappears in another place:

Traceback (most recent call last):
  File "spinta/cli/push.py", line 460, in _push_rows
    next(rows)
  File "spinta/cli/push.py", line 1353, in _save_push_state
    for row in rows:
  File "spinta/cli/push.py", line 919, in _push_to_remote_spinta
    for row in rows:
  File "spinta/cli/push.py", line 881, in _prepare_rows_for_push
    for row in rows:
  File "spinta/cli/push.py", line 1306, in _check_push_state
    for model_type, group in itertools.groupby(rows, key=_get_model_type):
  File "spinta/cli/push.py", line 408, in _read_rows
    yield from _get_model_rows(
  File "spinta/cli/push.py", line 526, in _get_model_rows
    for row in rows:
  File "spinta/cli/push.py", line 564, in _iter_model_rows
    for row in rows:
  File "spinta/cli/push.py", line 727, in _read_rows_by_pages
    state_row = next(state_rows, None)
  File "spinta/cli/push.py", line 868, in _get_state_rows
    raise InfiniteLoopWithPagination()
spinta.exceptions.InfiniteLoopWithPagination: Pagination values has cause infinite loop while fetching data.

I think this might be related to push state. If a previous push state exists, this error appears after an upgrade. This is the second time I have received this report. After clearing the push state, the error disappears.

The manifest that triggered this error:

https://github.com/atviriduomenys/manifest/blob/master/datasets/gov/vlk/paslaugos.csv

sirex commented 5 months ago

We need better error messages related to pagination.

All pagination errors should be returned in the standard error message format:

{
    "errors": [
        {
            "type": "page",
            "code": "...PageKey",
            "context": {
                "schema": "4",
                "dataset": "datasets/gov/example",
                "model": "datasets/gov/example/City",
                "property": "_page",
                "id": "uuid...",
                "page": {
                    "key": "...",
                    "size": "..."
                }
            },
            "message": "...",
            "template": "..."
        }
    ]
}
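As a rough illustration of the format above, the following Python sketch builds such a payload. The helper name `page_error` and its parameters are hypothetical and are not part of Spinta's API; the `schema`, `id` and `template` fields are left out for brevity.

```python
import json


def page_error(code, message, *, dataset, model, page_key, page_size):
    """Build an error payload shaped like the proposal above.

    Hypothetical helper for illustration only, not Spinta's real code.
    """
    return {
        "errors": [
            {
                "type": "page",
                "code": code,
                "context": {
                    "dataset": dataset,
                    "model": model,
                    "property": "_page",
                    "page": {"key": page_key, "size": page_size},
                },
                "message": message,
            }
        ]
    }


payload = page_error(
    "InvalidPageKey",
    "Page key contains NULL.",
    dataset="datasets/gov/example",
    model="datasets/gov/example/City",
    page_key=[None],
    page_size=2,
)
print(json.dumps(payload, indent=4))
```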

InvalidPageKey

Raised when the page key contains NULL and NULL handling is not supported in queries.
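A minimal sketch of such a check, assuming the page key arrives as a tuple of column values. The exception class here is a local stand-in and `check_page_key` is a hypothetical name, neither is taken from Spinta.

```python
class InvalidPageKey(Exception):
    """Local stand-in for the proposed error, not Spinta's actual class."""


def check_page_key(key_values, *, nulls_supported=False):
    """Reject page keys containing NULL unless NULLs are supported.

    The test is `is None`, not truthiness, so 0 and "" stay valid keys.
    """
    if not nulls_supported and any(value is None for value in key_values):
        raise InvalidPageKey(f"Page key {key_values!r} contains NULL.")
    return key_values


check_page_key((0,))  # passes: 0 is a regular value, not NULL
```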

UnsortedPageKey

Raised if page keys are not returned in the specified order. It is OK if we encounter the same page key value several times.

For example, when we have a query that returns the following page.key values:

1, 5, 3, 6

When we reach 3, an UnsortedPageKey error should be raised, because page key 3 is lower than the preceding 5.

Keep in mind that the page key sort direction can be specified, so depending on the direction, we check for either ascending or descending order.

The page key can contain duplicate values. For example, this should not raise an error:

1, 1, 1, 2, 3
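The ordering rule can be sketched as a generator that passes keys through and raises on the first one that breaks the order. The exception class is a local stand-in, and the `check_sorted` name and `descending` flag are assumptions made for this example.

```python
class UnsortedPageKey(Exception):
    """Local stand-in for the proposed error, not Spinta's actual class."""


_NO_PREVIOUS = object()  # sentinel, so that None or 0 are not mistaken for "no key yet"


def check_sorted(keys, *, descending=False):
    """Yield keys unchanged; raise if one breaks the sort order.

    Equal neighbours are allowed, only a strict step backwards fails.
    """
    previous = _NO_PREVIOUS
    for key in keys:
        if previous is not _NO_PREVIOUS:
            out_of_order = key > previous if descending else key < previous
            if out_of_order:
                raise UnsortedPageKey(
                    f"Page key {key!r} is out of order after {previous!r}."
                )
        previous = key
        yield key


print(list(check_sorted([1, 1, 1, 2, 3])))  # duplicates are accepted
```

Iterating over `check_sorted([1, 5, 3, 6])` yields 1 and 5 and then raises when it reaches 3.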

TooShortPageSize

Raised when the page keys of the first and last rows are the same.

Example 1

For example, if we have the following sequence of page keys:

1, 2, 2, 3, 3, 4

And the page size is set to 2, then we should return the following pages:

1
2, 2
3, 3
4

If duplicate page keys appear at the end of a page, we have to move all objects with that last duplicate page key to the next page.

This could be implemented by selecting page.size + 1 rows. If the last two page keys are the same, we move all objects with that duplicate page key from the end of the result set to the next page.
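The trimming step described above might look like the sketch below. It assumes `rows` holds at most `page_size + 1` rows already sorted by page key; the `split_page` name and the `key` parameter are hypothetical.

```python
def split_page(rows, page_size, key=lambda row: row):
    """Return the rows that belong to the current page.

    `rows` is the result of fetching page_size + 1 rows, sorted by page key.
    """
    if len(rows) <= page_size:
        # No look-ahead row came back, so this is the final page.
        return rows
    if key(rows[-1]) != key(rows[-2]):
        # The look-ahead row starts a new key: drop only that row.
        return rows[:-1]
    # The last two keys match: move every row with that key to the next page.
    last = key(rows[-1])
    end = len(rows)
    while end > 0 and key(rows[end - 1]) == last:
        end -= 1
    return rows[:end]


print(split_page([1, 2, 2], page_size=2))  # [1]
print(split_page([2, 2, 3], page_size=2))  # [2, 2]
```

An empty result means every fetched row shares one page key, which is the TooShortPageSize situation covered in Example 2.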

Example 2

If we have the following page key sequence:

1, 2, 2, 2, 3, 4

Then we should raise a TooShortPageSize error on the second page, because for the second page we will run:

WHERE key > last_page_key
LIMIT page.size + 1

Where last_page_key is 1 and page.size is 2. This query will return:

2, 2, 2

Here we see that the first and last page keys are equal, which means the rows with this page key do not fit into one page. We raise TooShortPageSize, with information on how to increase the page size.
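Both examples can be reproduced with a small in-memory simulation. This is only a sketch: `fetch` stands in for the `WHERE key > last_page_key LIMIT page.size + 1` query over an already sorted list, and the exception class and function names are made up for illustration.

```python
class TooShortPageSize(Exception):
    """Local stand-in for the proposed error, not Spinta's actual class."""


def fetch(keys, last_key, limit):
    """Simulate WHERE key > last_key LIMIT limit over a sorted list."""
    # `last_key is None` marks the first page, so a key of 0 works normally.
    rows = [k for k in keys if last_key is None or k > last_key]
    return rows[:limit]


def paginate(keys, page_size):
    """Yield pages, keeping rows with equal page keys on the same page."""
    last_key = None
    while True:
        rows = fetch(keys, last_key, page_size + 1)
        if not rows:
            return
        if len(rows) > page_size:
            if rows[0] == rows[-1]:
                raise TooShortPageSize(
                    f"More than {page_size} rows share page key {rows[0]!r}; "
                    "increase the page size."
                )
            # Drop the look-ahead row and any rows sharing its key.
            rows = [k for k in rows if k != rows[-1]]
        yield rows
        last_key = rows[-1]


print(list(paginate([1, 2, 2, 3, 3, 4], page_size=2)))
# [[1], [2, 2], [3, 3], [4]]
```

With `[1, 2, 2, 2, 3, 4]` and a page size of 2, the first page `[1]` is yielded and the second fetch returns `2, 2, 2`, so TooShortPageSize is raised.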