Closed nhanitvn closed 8 years ago
I don't really understand the fix, why this involves a sub query? Does it affect the performance of sample? Can we just disallow negative seed or deprecate it and workaround by shifting / getting abs value instead and create ticket for BA to migrate it?
On another thought, what does this seed
do? Does it guarantee that we will get the same result if we use the same seed? If not can we just get its abs value instead?
You cannot run this in SparkSQL "select * from table order by rand(-123)" while these sql work
We're fixing that by introducing a subquery, it will affect performance right? Also, my opinion is negative seed doesn't make sense, so we can just:
If negative seed really makes sense, then I think do an if check and only on sub query when seed is negative is better
I did a quick benchmark and the subquery does increase the running time (add 1/2x to x where x is the running time of the sql using no subquery). So I refactor to handle the negative seed case separately. Performing an abs() is not a good one, I think, though it is not that critical.
Description and related tickets, documents
due to the fact that SparkSQL does not allow rand() to accept a negative seed value in order by clause
Reviewers: @hai-adatao
Breaking changes & backward compatible issues
How to test
Describe how this PR is tested. In case manual testing is required, describe how to do so.
PR Progress
Make sure all checkboxes below are checked before merged
prefix/description
(see this)Change-Id: I34ebed02c4ff615aae99a30cd68c3e1104ee3dc0