Closed Jackhjf closed 1 year ago
Hello @Jackhjf, Thanks for finding the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi.
Duplicated known issue as in https://github.com/apache/kyuubi/issues/4202 when applying column masking on DataSourceV2Relation. Suggested ut and fix in https://github.com/apache/kyuubi/pull/4304, but it's in pending status. More work is required to do.
Fixed in https://github.com/apache/kyuubi/pull/4358. Please have a check. @Jackhjf
Fixed in #4358. Please have a check. @Jackhjf
Thanks for your work, I can get the correct result with the following use case; @bowenliang123 @yaooqinn
create table yhbi.test_copy(id string,name string) using iceberg ; create table yhbi.test_copy1(id1 string,name1 string) using iceberg ; create table yhbi.test1(id string,name string) using iceberg(name is the data masking field; create table iceberg_spark.test_copy(id string,name string) using iceberg (name is the data masking field);
001
insert into yhbi.test_copy select *from iceberg_spark.test_copy
002
insert into yhbi.test_copy1 select *from iceberg_spark.test_copy
003
insert overwrite yhbi.test_copy1 select id,name as name1 from iceberg_spark.test_copy
004
insert overwrite yhbi.test_copy1 select id ,name as name1 from iceberg_spark.test_copy a where a.id in ("111111","11111");
005
insert overwrite yhbi.test_copy1 select id ,name as name1 from iceberg_spark.test_copy order by id, name;
006
select max(name) from iceberg_spark.test_copy; select min(name) from iceberg_spark.test_copy;
007
insert overwrite yhbi.test_copy1 select id ,count(name) from iceberg_spark.test_copy group by id; insert overwrite yhbi.test_copy1 select count(id) ,name from iceberg_spark.test_copy group by name;
008
insert overwrite yhbi.test_copy1 select b.id,b.name from iceberg_spark.test_copy a join iceberg_spark.test1 b on a.id=b.id;
009 insert overwrite yhbi.test_copy1 select id,name as name1 from iceberg_spark.test_copy union select *from (select a.id ,a.name from iceberg_spark.test1 a join iceberg_spark.test_copy b on a.id=b.id) c;
thanks @Jackhjf for the validation
thanks @Jackhjf for the validation now i found a problem,if I excute this sql: select id ,name from iceberg_spark.test_copy a where a.name like "中%"; I will get an empty result
@yaooqinn
thanks @Jackhjf for the validation now i found a problem,if I excute this sql: select id ,name from iceberg_spark.test_copy a where a.name like "中%"; I will get an empty result
@yaooqinn
I think this is the exact result that you would expect with the column masking. The masking is properly pushed down to leaf node of scanning the source. And in this case, it prevents reading raw data without masking rules.
You have a mask rule on the column - name -, which will be masked during scan operation, so the operations that follow will only get the masked data.
I know it's a bit counterintuitive. But it is correct. Supposing we don't mask ahead and change your test case to select id ,name from iceberg_spark.test_copy a where a.name ='Jackhjf'
. Users get the result md5('Jackhjf')
, but everyone knows the datasource contains records for Jackhjf
. In such a case, the data is not secured, right?
You have a mask rule on the column - name -, which will be masked during scan operation, so the operations that follow will only get the masked data.
I know it's a bit counterintuitive. But it is correct. Supposing we don't mask ahead and change your test case to
select id ,name from iceberg_spark.test_copy a where a.name ='Jackhjf'
. Users get the resultmd5('Jackhjf')
, but everyone knows the datasource contains records forJackhjf
. In such a case, the data is not secured, right?
From this point of view, the data is really not safe.thanks
Code of Conduct
Search before asking
Describe the bug
Configure the mask policy on the ranger, and when executing sql through Spark, an error is reported
Affects Version(s)
master
Kyuubi Server Log Output
Additional context
No response
Are you willing to submit PR?