erezsh / sqeleton

A python library for efficiently interacting and querying SQL databases
MIT License
20 stars 4 forks source link

If one table is empty, it gives up an error #16

Closed Komalis closed 4 months ago

Komalis commented 4 months ago

Describe the bug A clear and concise description of what the bug is.

The big is the following :

When one of the table to compare is empty or it gives an empty result using where="xxx"

ValueError: Table ('schema', 'table') is empty.

Is this normal behavior ?

Describe the environment

Describe which OS you're using, which sqeleton version, and any other information that might be relevant to this bug.

erezsh commented 4 months ago

What code are you running?

Komalis commented 4 months ago

Hello, sorry for the no explanation issue that I gave, here is a quick example to understand what I meant

CREATE TABLE public.toast_one (
    id INTEGER,
    name TEXT
);

CREATE TABLE public.toast_two (
    id INTEGER,
    name TEXT
);

INSERT INTO public.toast_one (
    id,
    name
)
VALUES
(1, 'Hello');

Then

reladiff -k id -s -j4 postgresql://xxxx:xxxxx@xxxx:xxxx/xxxx public.toast_one postgresql://xxxx:xxxxx@xxxx:xxxx/xxxx public.toast_two --verbose
[10:43:40] INFO - [PostgreSQL] Starting a threadpool, size=4.
[10:43:41] INFO - Diffing schemas...
[10:43:41] INFO - Diffing using columns: key=('id',) update=None extra=().
[10:43:41] INFO - Using algorithm 'joindiff'.
[10:43:41] INFO - Validating that the are no duplicate keys in columns: ['id']
[10:43:41] INFO - Diffing segments at key-range: (1)..(2). size: table1 <= 1, table2 <= 1
[10:43:41] INFO - . Diffing segment 1/1, key-range: (1)..(2), size <= None
[10:43:42] INFO - Validating that the are no duplicate keys in columns: ['id']
[10:43:42] ERROR - Table appears to be empty

or with a python script

from reladiff import Algorithm, connect_to_table, diff_tables

table1 = connect_to_table(
    "xxxx",
    "public.toast_one",
    ("id",),
    thread_count=8,
)
table2 = connect_to_table(
    "xxxxx",
    "public.toast_two",
    ("id",),
    thread_count=8,
)

missing_count = 0
new_count = 0

for sign, row in diff_tables(
    table1,
    table2,
    algorithm=Algorithm.JOINDIFF,
    threaded=True,
    max_threadpool_size=8,
    validate_unique_key=False,
):
    print(f"{sign} {row}")
    if sign == "-":
        missing_count += 1
    elif sign == "+":
        new_count += 1

Traceback (most recent call last):
  File "/home/xxxxxxxx/reladiff/main.py", line 19, in <module>
    for sign, row in diff_tables(
  File "/home/xxxxxxxx/.local/share/mise/installs/python/3.11.9/lib/python3.11/site-packages/reladiff/diff_tables.py", line 98, in __iter__
    for i in self.diff:
  File "/home/xxxxxxxx/.local/share/mise/installs/python/3.11.9/lib/python3.11/site-packages/reladiff/diff_tables.py", line 183, in _diff_tables_wrapper
    yield from self._diff_tables_root(table1, table2, info_tree)
  File "/home/xxxxxxxx/.local/share/mise/installs/python/3.11.9/lib/python3.11/site-packages/reladiff/joindiff_tables.py", line 161, in _diff_tables_root
    yield from self._bisect_and_diff_tables(table1, table2, info_tree)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxxxxxx/.local/share/mise/installs/python/3.11.9/lib/python3.11/site-packages/reladiff/diff_tables.py", line 253, in _bisect_and_diff_tables
    min_key2, max_key2 = self._parse_key_range_result(key_types1, next(key_ranges))
                                                                  ^^^^^^^^^^^^^^^^
  File "/home/xxxxxxxx/.local/share/mise/installs/python/3.11.9/lib/python3.11/site-packages/reladiff/diff_tables.py", line 65, in _thread_as_completed
    yield future.result()
          ^^^^^^^^^^^^^^^
  File "/home/xxxxxxxx/.local/share/mise/installs/python/3.11.9/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/xxxxxxxx/.local/share/mise/installs/python/3.11.9/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/xxxxxxxx/.local/share/mise/installs/python/3.11.9/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxxxxxx/.local/share/mise/installs/python/3.11.9/lib/python3.11/site-packages/reladiff/table_segment.py", line 279, in query_key_range
    raise ValueError("Table appears to be empty")
ValueError: Table appears to be empty

The behaviour that I was expecting is that reladiff tells me that the diff is

- (1, "Hello")

or

+ (1, "Hello")

Depending on which table we are doing the compare.

Is it something that was thought to be done like that ?

Komalis commented 4 months ago

This was meant to be an issue of reladiff, I am very sorry :/

erezsh commented 4 months ago

I understand. I'll close it here, and you can re-open it in reladiff, where we can continue the discussion.

I do think this is the intended behavior of reladiff, because empty tables usually mean the query was incorrect, and we don't want to "fail silently".

But I also understand sometimes it's valid to diff against an empty table, so perhaps I should change reladiff to allow it, and only issue a warning instead of an error. Or perhaps I can add a switch like --allow-empty.

Do you need to diff against empty tables in a real use-case, or is it just for tests?

Komalis commented 4 months ago
Do you need to diff against empty tables in a real use-case, or is it just for tests?

Yes i need to diff against empty tables in a real use case, it's even more useful because sometimes I use the "where" argument, which gives make an empty result, and it is expected.