Closed erezsh closed 3 months ago
Hey, so I tried the PR, and I'm seeing some behavior that can get a little messy.
I'm using JOINDIFF and materializing to a table, like this:
for sign, row in diff_tables(
    table1,
    table2,
    algorithm=Algorithm.JOINDIFF,
    materialize_to_table="qa.test",
    materialize_all_rows=True,
    threaded=True,
    max_threadpool_size=8,
    validate_unique_key=False,
    allow_empty_tables=True,
):
    print(f"{sign} {row}")
    if sign == "-":
        missing_count += 1
    elif sign == "+":
        new_count += 1
It turns out that the call to query_key_range in the _bisect_and_diff_tables method is "randomly" failing.
Following this path, I found that query_key_range raises an EmptyTable exception when the table is empty, and does not return any results.
So pretty much when
# Start with the first completed value, so we don't waste time waiting
min_key1, max_key1 = self._parse_key_range_result(key_types1, next(key_ranges))
is run, the result depends on which table's query_key_range call completed first (the empty table or the non-empty one).
Is there something we can do about that?
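To illustrate the ordering problem described above, here is a minimal, self-contained sketch. It is not data-diff's actual code: EmptyTable and query_key_range below are simplified stand-ins, and first_completed_range and the delay argument are hypothetical helpers added only to make the completion order controllable. It assumes the key ranges are consumed in completion order, as the comment in the quoted line suggests.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


class EmptyTable(Exception):
    """Stand-in for the exception raised when a table has no rows."""


def query_key_range(rows, delay):
    # Hypothetical stand-in: `rows` is a list of keys, and `delay`
    # simulates how long the database takes to answer.
    time.sleep(delay)
    if not rows:
        raise EmptyTable("table is empty")
    return min(rows), max(rows)


def first_completed_range(tables):
    """Return the key range of whichever query finishes first.

    `tables` is a list of (rows, delay) pairs. If the first query to
    finish belongs to an empty table, EmptyTable propagates from next().
    """
    with ThreadPoolExecutor(max_workers=len(tables)) as pool:
        futures = [pool.submit(query_key_range, rows, delay)
                   for rows, delay in tables]
        # Yield results in completion order, not submission order.
        key_ranges = (f.result() for f in as_completed(futures))
        return next(key_ranges)


if __name__ == "__main__":
    # Non-empty table answers first: a key range comes back.
    print(first_completed_range([([1, 5, 9], 0.0), ([], 0.3)]))

    # Empty table answers first: the same call raises EmptyTable.
    try:
        first_completed_range([([1, 5, 9], 0.3), ([], 0.0)])
    except EmptyTable as exc:
        print("failed:", exc)
```

With real queries the delays are not under our control, so the same inputs can succeed on one run and fail on the next, which would match the "random" failures reported here.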
@Komalis Thanks for testing it!
I realize my implementation was too naive (in more than one way), and I'm now in the process of rewriting it to use the normal diff code, so things like materialize table etc. should work as before.
It should be ready soon, I'll let you know.
@Komalis Can you try with this new version?
Will do!
@Komalis Even though it's already merged - if you find the time to test it, let me know how it went!
Fixes issue #21
Use --allow-empty-tables or allow_empty_tables=True