erezsh / reladiff

High-performance diffing of large datasets across databases
https://reladiff.readthedocs.io/en/latest/index.html#
Other
366 stars, 9 forks

Allow diffing on empty tables #22

Closed: erezsh closed this 3 months ago

erezsh commented 4 months ago

Fixes issue #21

Use --allow-empty-tables on the command line, or allow_empty_tables=True in the Python API.
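As a rough mental model of what the flag changes (not reladiff's actual implementation): when one side is empty, every row of the other side is reported as a difference instead of the diff being rejected. The sketch below assumes keyed rows held in memory and the usual "-"/"+" sign convention ("-" for rows from the first table, "+" for rows from the second); EmptyTable and diff_keyed_rows are hypothetical names used only for illustration.

```python
from typing import Dict, List, Tuple


class EmptyTable(Exception):
    """Hypothetical stand-in for the error raised when a table has no rows."""


Row = Tuple[object, ...]


def diff_keyed_rows(
    rows1: Dict[int, Row],
    rows2: Dict[int, Row],
    allow_empty_tables: bool = False,
) -> List[Tuple[str, Row]]:
    """Compare two key -> row mappings and return signed differences.

    ("-", row) marks a row from table 1 that is missing from, or differs in, table 2;
    ("+", row) marks the corresponding row from table 2.
    """
    # Without the flag, an empty side is treated as an error (hypothetical default).
    if not allow_empty_tables and (not rows1 or not rows2):
        raise EmptyTable("refusing to diff: one of the tables is empty")
    diff: List[Tuple[str, Row]] = []
    for key in sorted(rows1.keys() | rows2.keys()):
        r1, r2 = rows1.get(key), rows2.get(key)
        if r1 == r2:
            continue  # identical rows are not reported
        if r1 is not None:
            diff.append(("-", r1))
        if r2 is not None:
            diff.append(("+", r2))
    return diff


# With the flag, diffing against an empty table reports every row of the other side.
print(diff_keyed_rows({}, {1: (1, "a"), 2: (2, "b")}, allow_empty_tables=True))
# [('+', (1, 'a')), ('+', (2, 'b'))]
```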

Komalis commented 4 months ago

Hey, I tried the PR and noticed a behavior that can get a little messy.

I'm using JOINDIFF and materializing to a table, like this:

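```python
from reladiff import Algorithm, diff_tables

# table1 and table2 are assumed to be table segments created earlier,
# e.g. with reladiff.connect_to_table(...); connection details are omitted here.
missing_count = 0  # rows yielded with "-" (from table1)
new_count = 0      # rows yielded with "+" (from table2)

for sign, row in diff_tables(
    table1,
    table2,
    algorithm=Algorithm.JOINDIFF,
    materialize_to_table="qa.test",
    materialize_all_rows=True,
    threaded=True,
    max_threadpool_size=8,
    validate_unique_key=False,
    allow_empty_tables=True,
):
    print(f"{sign} {row}")
    if sign == "-":
        missing_count += 1
    elif sign == "+":
        new_count += 1
```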

It turns out that the call to query_key_range in the _bisect_and_diff_tables method fails seemingly at random. Following that path, I found that query_key_range raises an EmptyTable exception when the table is empty, and no further results are produced.
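The "random" failure is what you would expect if both key-range queries run concurrently and are consumed in completion order, as the "Start with the first completed value" comment in the code quoted here suggests. The sketch below imitates that pattern with concurrent.futures.as_completed; query_key_range, key_ranges_as_completed and first_range are hypothetical stand-ins (each fake query sleeps for a chosen delay), so whether the first next() raises depends only on which query finishes first.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Iterator, List, Tuple


class EmptyTable(Exception):
    """Hypothetical stand-in for the exception raised on an empty table."""


def query_key_range(keys: List[int], delay: float) -> Tuple[int, int]:
    """Hypothetical stand-in: return (min_key, max_key) after `delay` seconds."""
    time.sleep(delay)
    if not keys:
        raise EmptyTable("table is empty")
    return min(keys), max(keys)


def key_ranges_as_completed(jobs) -> Iterator[Tuple[int, int]]:
    """Yield results in completion order; .result() re-raises a worker's exception."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(query_key_range, keys, delay) for keys, delay in jobs]
        for fut in as_completed(futures):
            yield fut.result()


def first_range(jobs) -> str:
    """Mimic `next(key_ranges)`: report whether the first completed query failed."""
    key_ranges = key_ranges_as_completed(jobs)
    try:
        return f"first range: {next(key_ranges)}"
    except EmptyTable:
        return "EmptyTable raised on first next()"
    finally:
        key_ranges.close()  # stop the generator and let the pool shut down


non_empty, empty = [1, 2, 3], []
print(first_range([(non_empty, 0.0), (empty, 0.3)]))  # non-empty table finishes first
print(first_range([(non_empty, 0.3), (empty, 0.0)]))  # empty table finishes first
```

Same inputs, different timing: the first call gets a key range, while the second hits the exception immediately, which is why the failure looks random under threaded=True.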

So, in short, when

```python
# Start with the first completed value, so we don't waste time waiting
min_key1, max_key1 = self._parse_key_range_result(key_types1, next(key_ranges))
```

is run, the outcome depends on which table's query_key_range completes first (the empty table or the non-empty one). It will either:

Is there something we can do about that?
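One possible defensive pattern, sketched here only for illustration (the later reply says the merged change instead reworked the feature to go through the normal diff code), is to wait for both key-range queries and turn an empty table into "no range" inside each worker, so the result no longer depends on completion order. It assumes the diff spans the union of both tables' key ranges; EmptyTable, query_key_range, safe_key_range and combined_key_range are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple


class EmptyTable(Exception):
    """Hypothetical stand-in for the exception raised on an empty table."""


KeyRange = Optional[Tuple[int, int]]


def query_key_range(keys: List[int]) -> Tuple[int, int]:
    """Hypothetical stand-in: (min_key, max_key), or EmptyTable if there are no keys."""
    if not keys:
        raise EmptyTable("table is empty")
    return min(keys), max(keys)


def safe_key_range(keys: List[int]) -> KeyRange:
    """Convert the empty-table exception into None so it cannot abort the caller."""
    try:
        return query_key_range(keys)
    except EmptyTable:
        return None


def combined_key_range(keys1: List[int], keys2: List[int], allow_empty_tables: bool) -> KeyRange:
    """Run both queries concurrently, wait for both, then merge their ranges."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        r1, r2 = pool.map(safe_key_range, [keys1, keys2])  # results keep input order
    if (r1 is None or r2 is None) and not allow_empty_tables:
        raise EmptyTable("one of the tables is empty")
    ranges = [r for r in (r1, r2) if r is not None]
    if not ranges:
        return None  # both tables empty: nothing to diff
    return min(lo for lo, _ in ranges), max(hi for _, hi in ranges)


print(combined_key_range([5, 9], [], allow_empty_tables=True))
print(combined_key_range([5, 9], [1, 7], allow_empty_tables=False))
```

Because Executor.map returns results in submission order and each worker swallows its own EmptyTable, the empty table is handled the same way regardless of which query finishes first.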

erezsh commented 4 months ago

@Komalis Thanks for testing it!

I realize my implementation was too naive (in more than one way), and I'm now rewriting it to use the normal diff code, so features such as materialize_to_table should work as before.

It should be ready soon, I'll let you know.

erezsh commented 4 months ago

@Komalis Can you try with this new version?

Komalis commented 4 months ago

Will do!

erezsh commented 3 months ago

@Komalis Even though it's already merged - if you find the time to test it, let me know how it went!