Open gertwieland opened 4 weeks ago
I could reproduce this case after several runs. While trying to catch the duplicate record, I found the option "Compare using stored row values". Enabling it seems to solve this case.
I kept digging for a reproduction path, and here it is:
.take-issue
Apache Hop version?
2.8
Java version?
openjdk version "11.0.21" 2023-10-17
Operating system
Windows
What happened?
"Unique rows (HashSet)" seems to drop records even if they appear only once. Steps to reproduce the error:
Generate 60k records, then add a sequence and one column with random fake data.
Then calculate a SHA256 checksum over it. Since the input includes the sequence number from 1 to 60k, those checksums must all be unique.
Still, "Unique rows (HashSet)" seems to consider one row a duplicate and returns only 59,999 records.
Test pipeline attached: Unique_Hash_Faulty.zip
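For anyone who wants to sanity-check the expectation outside of Hop, the steps above can be approximated in plain Java. This is only a sketch and not the transform's actual code: the class `UniqueRowsSketch`, its method names, the seed, and the row layout (`sequence;random hex`) are made up for illustration. It builds 60k SHA-256 checksums and counts distinct entries twice, once by comparing the full values and once by comparing only a 32-bit `hashCode()`. The second count models an assumption (not confirmed in this thread) that rows may be treated as duplicates when only their hashes are compared, which would fit the observation that "Compare using stored row values" makes the problem go away.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper, NOT part of Apache Hop.
class UniqueRowsSketch {

    // Builds n hex-encoded SHA-256 checksums over "sequence;random data".
    static List<String> checksums(int n, long seed) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            Random random = new Random(seed);
            List<String> result = new ArrayList<>(n);
            for (int seq = 1; seq <= n; seq++) {
                // The sequence number makes every input row distinct.
                String row = seq + ";" + Long.toHexString(random.nextLong());
                byte[] digest = sha256.digest(row.getBytes(StandardCharsets.UTF_8));
                StringBuilder hex = new StringBuilder(digest.length * 2);
                for (byte b : digest) {
                    hex.append(String.format("%02x", b));
                }
                result.add(hex.toString());
            }
            return result;
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Distinct count when full values are compared.
    static int distinctByValue(List<String> values) {
        return new HashSet<>(values).size();
    }

    // Distinct count when only the 32-bit hash code is compared (assumed failure mode).
    static int distinctByHashCode(List<String> values) {
        Set<Integer> hashes = new HashSet<>();
        for (String value : values) {
            hashes.add(value.hashCode());
        }
        return hashes.size();
    }

    public static void main(String[] args) {
        List<String> sums = checksums(60_000, 42L);
        System.out.println("rows:                  " + sums.size());
        System.out.println("distinct by value:     " + distinctByValue(sums));
        System.out.println("distinct by hash only: " + distinctByHashCode(sums));
    }
}
```

Comparing full values should always report 60,000 distinct rows. The hash-only count can come out lower for some seeds, because two different checksums may share the same 32-bit hash code; with 60k rows that is not a rare event, which would also be consistent with the problem only showing up after several runs.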
Issue Priority
Priority: 3
Issue Component
Component: Hop Gui