elastic / connectors

Source code for all Elastic connectors, developed by the Search team at Elastic, and home of our Python connector development framework
https://www.elastic.co/guide/en/enterprise-search/master/index.html

lowercase ACL record IDs #2775

Open seanstory opened 1 month ago

seanstory commented 1 month ago

Problem Description

For Document Level Security, connectors produce ACL documents with _id values that should make semantic sense. For instance, in the SharePoint Online connector, an email address may be used. However, the casing used by the third-party source and by the authenticating user may not match exactly. USER@acme.com, User@acme.com, and user@acme.com are all the "same" email address, but as _id values in Elasticsearch, all three are considered distinct.

To make things easier, connectors should produce lowercase ACL IDs, so the operator always knows which casing to use when looking these records up.

Proposed Solution

Connectors should lowercase the _id fields for ACL documents.
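A minimal sketch of what this could look like, assuming ACL documents are plain dicts with an `_id` key. The helper name `with_lowercased_id` is hypothetical and is not part of the connector framework:

```python
def with_lowercased_id(acl_doc: dict) -> dict:
    """Return a copy of an ACL document whose `_id` is lowercased.

    Hypothetical helper for illustration; the real framework may
    represent ACL documents differently.
    """
    doc = dict(acl_doc)  # shallow copy so the caller's dict is untouched
    doc["_id"] = str(doc["_id"]).lower()
    return doc


# The three casings from the problem description collapse to one ID.
variants = ["USER@acme.com", "User@acme.com", "user@acme.com"]
ids = {with_lowercased_id({"_id": v})["_id"] for v in variants}
```

Here `ids` ends up containing a single value, so an operator can always look the record up by its lowercase form.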

Alternatives

A customer might think that a Pipeline could be used to do this lowercasing. However, this will interfere with the connector logic that attempts to delete stale ACL records. An illustrative scenario:

  1. We run a sync. It fetches 3 docs, with IDs: [FOO, BAR, BAZ]
  2. They have a pipeline that lowercases those. So Elasticsearch stores them as [foo, bar, baz]
  3. We run another sync. It fetches 3 docs, with IDs: [FOO, BAZ, QUX]. It also looks to see what IDs are already in Elasticsearch, and finds [foo, bar, baz].
  4. It indexes the 3 docs it found. The pipeline applies again, and stores them as [foo, baz, qux]
  5. Now it deletes the IDs that it had found in ES that it didn't find during the sync. Because the comparison is case-sensitive, none of them match the fetched IDs [FOO, BAZ, QUX], so that's all 3 IDs it found from ES: [foo, bar, baz]
  6. All that's left in the index is qux
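The scenario above can be reproduced with a small simulation. This is a simplified model in which the index is just a set of `_id` strings; `run_sync` and its parameters are hypothetical names, not the connector framework's actual sync logic:

```python
from typing import Callable, Iterable, Optional, Set


def run_sync(
    index: Set[str],
    fetched: Iterable[str],
    pipeline: Optional[Callable[[str], str]] = None,
) -> Set[str]:
    """Simulate one sync and return the resulting set of stored IDs."""
    fetched = list(fetched)
    existing = set(index)  # IDs already in Elasticsearch before this sync
    # Index the fetched docs; an ingest pipeline may rewrite the stored ID.
    stored = existing | {pipeline(i) if pipeline else i for i in fetched}
    # Stale = IDs seen in ES that were not fetched (case-sensitive compare).
    stale = existing - set(fetched)
    return stored - stale


# Lowercasing in a pipeline: the second sync wipes out almost everything.
index = run_sync(set(), ["FOO", "BAR", "BAZ"], pipeline=str.lower)
with_pipeline = run_sync(index, ["FOO", "BAZ", "QUX"], pipeline=str.lower)

# Lowercasing in the connector before indexing: only the stale ID is removed.
index = run_sync(set(), [i.lower() for i in ["FOO", "BAR", "BAZ"]])
in_connector = run_sync(index, [i.lower() for i in ["FOO", "BAZ", "QUX"]])
```

In this model `with_pipeline` is left with only `qux`, while `in_connector` keeps `foo`, `baz`, and `qux` and drops just `bar`, which is the behavior the proposed solution is after.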