facebookincubator / velox

A C++ vectorized database acceleration library aimed to optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0
3.2k stars 1.06k forks source link

Add support for VARBINARY input to approx_distinct Presto aggregate function #9834

Closed zhli1142015 closed 2 days ago

netlify[bot] commented 2 weeks ago

Deploy Preview for meta-velox canceled.

Name Link
Latest commit ab3d32258b583be125699cf43bcaf518e3a8b7b9
Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6657d216c8dd8600078a7821
zhli1142015 commented 2 weeks ago

@mbasmanova and @Yuhta , could you help to take a look this change? Thanks. Here is the background for this PR: We map Spark agg function HyperLogLogPlusPlus to this func in Gluten. By Spark UT: https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala#L224C19-L224C29, we found this function is not registered for varbinary type.

mbasmanova commented 2 weeks ago

@zhli1142015 CI is red. Please, take a look.

zhli1142015 commented 2 weeks ago

I think both failures are not related to this PR. Issues created: https://github.com/facebookincubator/velox/issues/9839 and https://github.com/facebookincubator/velox/issues/9840 .

E20240516 02:21:43.796859  1860 Exceptions.h:69] Line: /__w/velox/velox/velox/velox/exec/DistinctAggregations.cpp:262, Function:create, Expression:  Unexpected type HUGEINT, Source: RUNTIME, ErrorCode: UNREACHABLE_CODE
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
  what():  Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: UNREACHABLE_CODE
Reason: Unexpected type HUGEINT
Retriable: False
Function: create
File: /__w/velox/velox/velox/velox/exec/DistinctAggregations.cpp
Line: 262
[ RUN      ] PeriodicStatsReporterTest.basic
/__w/velox/velox/velox/common/base/tests/StatsReporterTest.cpp:417: Failure
Expected equality of these values:
  counterMap.count(kMetricMemoryCacheNumEmptyEntries.str())
    Which is: 0
  1
E20240516 09:24:41.018837  9072 ThreadedRepeatingFunctionRunner.cpp:31] ThreadedRepeatingFunctionRunner::stop() should already have been called, since we are now in the Runner's destructor. This is because it means that its threads may be accessing object state that was already destroyed -- e.g. members that were declared after the ThreadedRepeatingFunctionRunner.
*** Aborted at 1715851481 (Unix time, try 'date -d @1715851481') ***
*** Signal 11 (SIGSEGV) (0x7f772b0d0014) received by PID 9072 (pthread TID 0x7f7164d7e340) (linux TID 9072) (code: address not mapped to object), stack trace: ***
(error retrieving stack trace)
jinchengchenghh commented 2 weeks ago

Can you add a test?

zhli1142015 commented 1 week ago

Can you add a test?

Added UT, thanks.

zhli1142015 commented 1 week ago

@zhli1142015 CI is red. Please, take a look.

CI passed, @mbasmanova , could you help review? Thanks.

zhli1142015 commented 4 days ago

Kindly ping, @mbasmanova could you help to review? Thanks.

facebook-github-bot commented 2 days ago

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 2 days ago

@Yuhta merged this pull request in facebookincubator/velox@b00c16ad9e2faf57f6ab6ffec0cf4278646594d8.

conbench-facebook[bot] commented 2 days ago

Conbench analyzed the 1 benchmark run on commit b00c16ad.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.