CarnationWang23 / hyracks

Automatically exported from code.google.com/p/hyracks
Apache License 2.0

Tree bulk load needs to check duplicate keys #103

GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Currently, the tree bulk load operator does not check for duplicate keys.
A BTree can be loaded even if there are duplicate keys in the input.

Original issue reported on code.google.com by buyingyi@gmail.com on 16 Apr 2013 at 6:54

GoogleCodeExporter commented 8 years ago
This was found through a use case in Genomix. Should be an easy fix (detecting
duplicates after sorting the records in bulk load)?

Original comment by che...@gmail.com on 16 Apr 2013 at 7:10

GoogleCodeExporter commented 8 years ago
We have this already. The bulk load operator has a parameter called
"verifyInput", which checks that tuple i+1 is strictly greater than tuple i.
This means that any duplicate will cause an exception to be thrown.
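
To make the check concrete, here is a minimal sketch in Java of what such a verification amounts to. It is not the Hyracks code: the class and method names are hypothetical, and plain Comparable values stand in for tuples.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is NOT the Hyracks bulk load code.
final class VerifyInputSketch {

    // Throws if any key is not strictly greater than the key before it.
    // An equal neighbour is a duplicate; a smaller one means unsorted input.
    static <T extends Comparable<? super T>> void verifyStrictlyIncreasing(List<T> keys) {
        for (int i = 1; i < keys.size(); i++) {
            int c = keys.get(i - 1).compareTo(keys.get(i));
            if (c == 0) {
                throw new IllegalStateException(
                        "Duplicate key at position " + i + ": " + keys.get(i));
            }
            if (c > 0) {
                throw new IllegalStateException(
                        "Input not sorted at position " + i + ": " + keys.get(i));
            }
        }
    }

    public static void main(String[] args) {
        // Strictly increasing input passes silently.
        verifyStrictlyIncreasing(Arrays.asList(1, 2, 3));

        // A repeated key is rejected.
        try {
            verifyStrictlyIncreasing(Arrays.asList(1, 2, 2));
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Because only neighbouring keys are compared, the check costs one comparison per tuple, but it only catches every duplicate if the input is already sorted.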

Original comment by zheilb...@gmail.com on 16 Apr 2013 at 7:23

GoogleCodeExporter commented 8 years ago
If your input is sorted (and it should be), then as Zack said, passing true
for verifyInput makes the bulk load throw an exception when it detects a
duplicate. Can you double check that:
1) the input stream is sorted.
2) the verifyInput parameter passed to the bulk load operator is set to true.

Original comment by salsuba...@gmail.com on 16 Apr 2013 at 7:50

GoogleCodeExporter commented 8 years ago
Ok, sounds good. Let me close this issue.

Original comment by buyingyi@gmail.com on 16 Apr 2013 at 7:52