bobbai00 opened 6 months ago
I have two Python UDFs. Their columns are:
1st one: [screenshot]
2nd one: [screenshot]
Before I connected them to the HashJoin, their outputs were: [screenshot]
After I connected them directly to the HashJoin (inner join on state_num, which is of integer type), the tuple mapping is 0->0 for the HashJoin operator, and the tuple mapping of one of the Python UDFs becomes 50->25 instead of 50->50.
And if I introduce two type-cast operators that convert the state_num column from integer to string, the HashJoin has the pair 50->0.
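For reference, here is a minimal sketch of inner hash-join semantics in plain Python. It is not Texera's HashJoin implementation: it builds a hash table on one input and probes it with the other. The inputs are hypothetical, assuming each UDF emits 50 tuples with state_num values 1 to 50 plus one extra column (the names a and b are invented). Under that assumption an inner join on state_num should emit 50 tuples. The sketch also shows a general property of hash lookups: keys of different types never match, since 5 and "5" are unequal in Python.

```python
from collections import defaultdict


def hash_join_inner(left, right, key):
    """Inner hash join: build a table on `right`, then probe it with `left`."""
    table = defaultdict(list)  # key value -> right tuples with that key
    for r in right:
        table[r[key]].append(r)
    out = []
    for l in left:
        # .get() avoids inserting empty lists for keys with no partner
        for r in table.get(l[key], []):
            out.append({**r, **l})  # merged tuple with columns from both sides
    return out


# Hypothetical inputs: each UDF emits 50 tuples, one per state_num 1..50.
udf1 = [{"state_num": i, "a": f"a{i}"} for i in range(1, 51)]
udf2 = [{"state_num": i, "b": f"b{i}"} for i in range(1, 51)]

joined = hash_join_inner(udf1, udf2, "state_num")
print(len(joined))  # 50

# If only one side's key were a string, nothing would match: 5 != "5".
udf2_str = [{**t, "state_num": str(t["state_num"])} for t in udf2]
print(len(hash_join_inner(udf1, udf2_str, "state_num")))  # 0
```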
When I change the join type from inner join to full outer join, the pair becomes 50->50, but the result is not correct: [screenshot]
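As a point of comparison, the following is a minimal sketch of full outer hash-join semantics in plain Python. It does not reproduce Texera's operator. Matched tuples are merged, and a tuple without a partner is still emitted with the other side's columns padded with None. The inputs are hypothetical, assuming 50 tuples per UDF keyed by state_num 1 to 50 with invented columns a and b. With every key matching, the output should be 50 fully populated tuples. Dropping state 50 from the right input (also hypothetical) still yields 50 rows, one of which has b set to None.

```python
def full_outer_hash_join(left, right, key, left_cols, right_cols):
    """Full outer hash join on `key`; unmatched tuples are padded with None."""
    table = {}  # key value -> indices of right tuples with that key
    for i, r in enumerate(right):
        table.setdefault(r[key], []).append(i)
    matched = set()  # indices of right tuples that found a partner
    out = []
    for l in left:
        hits = table.get(l[key], [])
        for i in hits:
            matched.add(i)
            out.append({**right[i], **l})
        if not hits:
            # Left tuple with no partner: fill right-side columns with None.
            out.append({**{c: None for c in right_cols}, **l})
    for i, r in enumerate(right):
        if i not in matched:
            # Right tuple with no partner: fill left-side columns with None.
            out.append({**{c: None for c in left_cols}, **r})
    return out


# Hypothetical inputs: each UDF emits 50 tuples, one per state_num 1..50.
udf1 = [{"state_num": i, "a": f"a{i}"} for i in range(1, 51)]
udf2 = [{"state_num": i, "b": f"b{i}"} for i in range(1, 51)]

rows = full_outer_hash_join(udf1, udf2, "state_num", ["a"], ["b"])
print(len(rows))  # 50
print(all(r["a"] is not None and r["b"] is not None for r in rows))  # True

# Drop state 50 from the right input: the left tuple for state 50 has no
# partner, so it is still emitted, with the right-side column b set to None.
partial_rows = full_outer_hash_join(udf1, udf2[:49], "state_num", ["a"], ["b"])
print(len(partial_rows))  # 50
print([r for r in partial_rows if r["b"] is None])
# [{'b': None, 'state_num': 50, 'a': 'a50'}]
```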
The workflow link is: https://texera.ics.uci.edu/workflow/1642
@bobbai00 can you test it again on the current master to see whether we still have this issue?