NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

[FEA] Support function sha1 #7035

Open viadea opened 1 year ago

viadea commented 1 year ago

I wish we could support the sha1 function.

For example:

spark-sql> select sha1(c_customer_id) from customer limit 10;

      ! <Sha1> sha1(cast(c_customer_id#25 as binary)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.Sha1

sameerz commented 1 year ago

Related cudf issue https://github.com/rapidsai/cudf/issues/8641

revans2 commented 1 year ago

We need to be a bit careful here to make sure that our requirements match those of pandas/Python. In Spark, the output of every SHA hash function is a string holding the lowercase hex encoding of the binary hash result. This corresponds to Python's hexdigest() method. Also, https://github.com/rapidsai/cudf/issues/8641 calls out sha256 and sha512, not sha1. Spark supports sha1 and sha2, and sha2 supports bit lengths of 224, 256, 384, and 512. Only the 256 and 512 bit lengths of sha2 correspond to the sha256 and sha512 requested in that issue.
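
To make the expected output format concrete, here is a minimal reference sketch using Python's standard hashlib. It is not the plugin's or cudf's implementation. The helper names sha1_hex and sha2_hex are hypothetical, and the string is encoded as UTF-8 bytes before hashing on the assumption that this mirrors the cast to binary shown in the error message above. Raising ValueError for other bit lengths is a choice made for this sketch only.

```python
import hashlib

# Bit lengths listed in this thread for Spark's sha2, mapped to hashlib constructors.
_SHA2_BY_BITS = {
    224: hashlib.sha224,
    256: hashlib.sha256,
    384: hashlib.sha384,
    512: hashlib.sha512,
}


def sha1_hex(value: str) -> str:
    """Return the SHA-1 of a string as a lowercase hex string (hexdigest)."""
    # Assumption: casting a string to binary yields its UTF-8 bytes.
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def sha2_hex(value: str, bit_length: int) -> str:
    """Return the SHA-2 digest of the given bit length as a lowercase hex string."""
    hash_fn = _SHA2_BY_BITS.get(bit_length)
    if hash_fn is None:
        # Sketch-only behavior; this is not a statement about what Spark does.
        raise ValueError(f"unsupported SHA-2 bit length: {bit_length}")
    return hash_fn(value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(sha1_hex("abc"))
    print(sha2_hex("abc", 256))
```

A GPU implementation could be checked against a reference like this one row at a time, comparing the returned strings exactly, including case.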

sameerz commented 1 year ago

The PR https://github.com/rapidsai/cudf/pull/9215 attached to https://github.com/rapidsai/cudf/issues/8641 appears to be adding support for SHA-1.

sameerz commented 7 months ago

Depends on https://github.com/rapidsai/cudf/pull/14391