brave / brave-browser

Brave browser for Android, iOS, Linux, macOS, Windows.
https://brave.com
Mozilla Public License 2.0
17.53k stars 2.27k forks source link

Remove PII and sensitive data from omnibar input to default search provider when search suggestions are enable #40414

Closed simonhong closed 1 week ago

simonhong commented 1 month ago

Description

We should not pass sensitive data to fetch search suggestions This is prerequisite before enabling search suggestions on by default.

Steps to reproduce

  1. Enable search suggestions in brave://settings/search
  2. Check search suggestions is not started for queries that have suspicious/unsafe url
  3. Type 12345678 in omnibox and check no suggestions in omnibox popup (we prevent suggestions for digits longer than 7)
  4. Type 1234567 in omnibox and check some suggestions added in omnibox popup

Actual result

n/a

Expected result

n/a

Reproduces how often

Easily reproduced

Brave version (brave://version info)

All versions

Channel information

Reproducibility

Miscellaneous information

cc @rebron

simonhong commented 1 month ago

Below code is upstream's query checker before sending query to suggest url. If default provider url is used, stoped sending if IsQueryPotentiallyPrivate() is true. If keyword provider url is used, don't care the result of IsQueryPotentiallyPrivate().

Default provider is used when user types inputs at omnibox. Keyword provider is used when user types search keyword(ex, :g or :br) and then type inputs. ex) :br below

Screenshot 2024-08-13 at 3 23 45 PM

@brave/sec-team Would you check below two Qs?

bool SearchProvider::IsQuerySuitableForSuggest(bool* query_is_private) const {
  *query_is_private = IsQueryPotentiallyPrivate();

  // Don't run Suggest in incognito mode, if the engine doesn't support it, or
  // if the user has disabled it.  Also don't send potentially private data
  // to the default search provider.  (It's always okay to send explicit
  // keyword input to a keyword suggest server, if any.)
  const TemplateURL* default_url = providers_.GetDefaultProviderURL();
  const TemplateURL* keyword_url = providers_.GetKeywordProviderURL();
  return !client()->IsOffTheRecord() && client()->SearchSuggestEnabled() &&
         ((default_url && !default_url->suggestions_url().empty() &&
           !*query_is_private) ||
          (keyword_url && !keyword_url->suggestions_url().empty()));
}

bool SearchProvider::IsQueryPotentiallyPrivate() const {
  if (input_.text().empty())
    return false;

  // Check the scheme.  If this is UNKNOWN/URL with a scheme that isn't
  // http/https/ftp, we shouldn't send it.  Sending things like file: and data:
  // is both a waste of time and a disclosure of potentially private, local
  // data.  Other "schemes" may actually be usernames, and we don't want to send
  // passwords.  If the scheme is OK, we still need to check other cases below.
  // If this is QUERY, then the presence of these schemes means the user
  // explicitly typed one, and thus this is probably a URL that's being entered
  // and happens to currently be invalid -- in which case we again want to run
  // our checks below.  Other QUERY cases are less likely to be URLs and thus we
  // assume we're OK.
  if (!base::EqualsCaseInsensitiveASCII(input_.scheme(), url::kHttpScheme) &&
      !base::EqualsCaseInsensitiveASCII(input_.scheme(), url::kHttpsScheme) &&
      !base::EqualsCaseInsensitiveASCII(input_.scheme(), url::kFtpScheme))
    return (input_.type() != metrics::OmniboxInputType::QUERY);

  // Don't send URLs with usernames, queries or refs.  Some of these are
  // private, and the Suggest server is unlikely to have any useful results
  // for any of them.  Also don't send URLs with ports, as we may initially
  // think that a username + password is a host + port (and we don't want to
  // send usernames/passwords), and even if the port really is a port, the
  // server is once again unlikely to have and useful results.
  // Note that we only block based on refs if the input is URL-typed, as search
  // queries can legitimately have #s in them which the URL parser
  // overaggressively categorizes as a url with a ref.
  const url::Parsed& parts = input_.parts();
  if (parts.username.is_nonempty() || parts.port.is_nonempty() ||
      parts.query.is_nonempty() ||
      (parts.ref.is_nonempty() &&
       (input_.type() == metrics::OmniboxInputType::URL)))
    return true;

  // Don't send anything for https except the hostname.  Hostnames are OK
  // because they are visible when the TCP connection is established, but the
  // specific path may reveal private information.
  if (base::EqualsCaseInsensitiveASCII(input_.scheme(), url::kHttpsScheme) &&
      parts.path.is_nonempty())
    return true;

  return false;
}
fmarier commented 1 month ago

I think it's fine to skip the check when users explicitly state that they want to send the input to a search engine (by using :g or :br for example).

simonhong commented 1 month ago

I think it's fine to skip the check when users explicitly state that they want to send the input to a search engine (by using :g or :br for example).

Thanks for checking! PTAL second question when available 👍🏼

fmarier commented 3 weeks ago

Is IsQueryPotentiallyPrivate() sufficient for our criteria for making search suggestion default on?

I reviewed the set of security requirements that were agreed to and it looks like IsQueryPotentiallyPrivate() is missing the following heuristics from Search/WDP:

@diracdeltas Do you think these are blockers for https://github.com/brave/brave-core/pull/25161?

fmarier commented 3 weeks ago

I have talked to Yan and she agreed that we should have parity with Search in terms of excluding private queries before rolling this out.

simonhong commented 3 weeks ago

I have talked to Yan and she agreed that we should have parity with Search in terms of excluding private queries before rolling this out.

@fmarier really thanks for checking!

kjozwiak commented 1 week ago

The above requires 1.70.103 or higher for 1.70.x verification 👍

GeetaSarvadnya commented 1 day ago

Verification PASSED on

Brave | 1.70.107 Chromium: 128.0.6613.120 (Official Build) beta (64-bit)
-- | --
Revision | ab3f504ca4a15c330f60a93d5e3773d780498980
OS | Windows 10 Version 22H2 (Build 19045.4780)

Suspicious websites

Search suggestion enabled Example Example Example Example Example Example
image image image image image image image

Secure/genuine websites:

Search suggestion enabled Example Example Example Example Example Example
image image image image image image image