Aircloak / aircloak

This repository contains the Aircloak Air frontend as well as the code for our Cloak query and anonymization platform
2 stars 0 forks source link

Soften restrictions for implicit range functions #2220

Open sebastian opened 6 years ago

sebastian commented 6 years ago

This is just a note to remember to consider || too when looking through what can be softened and what not.

For example || inside extract_words.

See: https://github.com/Aircloak/aircloak/pull/2216#issuecomment-351699111

The original query we gave them to use was:

SELECT
  year(buchungsDatum) as year,
  month(buchungsDatum) as month,
 EXTRACT_WORDS(
    UPPER(buchungstext || verwendungszweck || name)
  ) as card,
  count(*),
  count_noise(*)
FROM umsatz
WHERE EXTRACT_WORDS(
  UPPER(buchungstext || verwendungszweck || name)
) IN ('EC', 'Visa', 'Master', 'Maestro', 'American')
GROUP BY card, year, month
ORDER BY year, month, count(*) DESC
sebastian commented 6 years ago

@yoid2000, just wanted to put this on your radar too.

yoid2000 commented 6 years ago

That is concatenate, right? Currently we have a rule that concat is treated same as positive equality.

But I guess that doesn't make sense in this case, because in any event extract_words negates any positive equality condition effect. By the way, why don't you need to insert spaces between the concatenated words? Wouldn't this query merge them together into composite words? In which case, now I think of it, the positive AND'd equality effect of the concatenations would after all exist.

obrok commented 6 years ago

Currently we have a rule that concat is treated same as positive equality.

What do you mean by this?

yoid2000 commented 6 years ago

I mean that we add the same noise layers for the following two queries:

SELECT concat(cast(cli_district_id, text), card_type), count(*), count_noise(*)
FROM cards
GROUP BY 1
SELECT cli_district_id, card_type, count(*), count_noise(*)
FROM cards
GROUP BY 1,2
obrok commented 6 years ago

Hm, yeah, I guess... I'd say we treat concat just like any other function, not like an equality. Thought for a minute you're thinking of something we're totally not doing.

yoid2000 commented 6 years ago

I ran the above queries and the results, for the 5-6 rows I checked, were identical. So I supposed that are treating concat like AND'd positive equalities...

obrok commented 6 years ago

So I supposed that are treating concat like AND'd positive equalities...

In a sense... However we also treat + like a positive AND'd equality. And you can do lower(a || b) , which you cannot do with an equality. So in that sense I don't think we're treating it as an AND'd equality.

I don't think it really matters, just wanted to make sure you're not under the impression that we have some mechanism in place while we don't.

obrok commented 6 years ago

@sebastian I'm not quite sure what should be done for this issue

sebastian commented 6 years ago

This requires @yoid2000's input too. Basically someone needs to sit down and think through which functions can safely be used in combination with implicit ranges.