elastic / connectors

Official Elastic connectors for third-party data sources
https://www.elastic.co/guide/en/elasticsearch/reference/master/es-connectors.html
Other
18 stars 136 forks source link

Unnecessary performance penalty when checking list item unique permissions #2407

Open anghelnicolae opened 7 months ago

anghelnicolae commented 7 months ago

The property "HasUniqueRoleAssignments" can be added to the "$select" parameter when getting list items in bulk, instead of doing a request for each item.

https://github.com/elastic/connectors/blob/0d35433d418427a8ee4d991bbc36d5ec3e07665d/connectors/sources/sharepoint_online.py#L1978

seanstory commented 7 months ago

Hi @anghelnicolae - thanks for filing.

In psuedo-code, what it does today is:

for item in site_list_items():
  role_assignments = get_role_assignments(item)

and I think you're suggesting that it could/should be:

for item in site_list_items(include_role_assignments=true)

The problem with that is that fetching site list items is done via the Microsoft Graph API, where fetching role assignments is done via the Sharepoint Rest API. And AFAIK, the Microsoft Graph API does not have the ability to fetch role assignments. You can see the list of available relationships and properties for List Items in the Graph API here: https://learn.microsoft.com/en-us/graph/api/resources/listitem?view=graph-rest-1.0

We use the Graph API as much as possible, because its rate limit quotas are more forgiving, and it's the Microsoft-recommended approach for interacting with Sharepoint where possible.

We are unlikely to use the Sharepoint Rest API to retrieve list items, even if it would have this performance benefit here, because it would decrease performance for all users - even those not using Document Level Security.

If you want to avoid this performance hit, you can choose not to fetch role assignments for the list items with the connector configuration of Fetch unique list item permissions?.

anghelnicolae commented 7 months ago

Hi @seanstory , No, the role assignment would still be retrieved individually for each list item that has "HasUniqueRoleAssignments" true, but the property "HasUniqueRoleAssignments" can be retrieved in bulk. Now, the property is retrieved for each list item:

has_unique_role_assignments = ( await self.client.site_list_item_has_unique_role_assignments( site_web_url, site_list_name, list_item_natural_id ) )

seanstory commented 7 months ago

Perhaps I'm misunderstanding your suggestion still. Can you share a sample curl for what you're suggesting, to fetch hasUniqueRoleAssignments?

anghelnicolae commented 7 months ago

Perhaps I'm misunderstanding your suggestion still. Can you share a sample curl for what you're suggesting, to fetch hasUniqueRoleAssignments?

On line 894 it could be something like this:

select = "createdDateTime,id,lastModifiedDateTime,weburl,createdBy,lastModifiedBy,contentType,HasUniqueRoleAssignments"

seanstory commented 7 months ago

@anghelnicolae no, I don't believe that will work.

Like I said above, the Graph API doesn't have a property or relationship for HasUniqueRoleAssignments. When I try to test your idea in the Graph Explorer, I get a response like:

{
    "error": {
        "code": "BadRequest",
        "message": "Parsing OData Select and Expand failed: Could not find a property named 'HasUniqueRoleAssignments' on type 'microsoft.graph.listItem'.",
        "innerError": {
            "date": "2024-04-18T20:26:17",
            "request-id": "187ddcce-d34f-4d75-8945-5acc1f95807a",
            "client-request-id": "582e2e6d-c48e-87d0-c397-0dc0d6a7b753"
        }
    }
}

as soon as I add HasUniqueRoleAssignments to the $select clause.

anghelnicolae commented 7 months ago

@anghelnicolae no, I don't believe that will work.

Like I said above, the Graph API doesn't have a property or relationship for HasUniqueRoleAssignments. When I try to test your idea in the Graph Explorer, I get a response like:

{
    "error": {
        "code": "BadRequest",
        "message": "Parsing OData Select and Expand failed: Could not find a property named 'HasUniqueRoleAssignments' on type 'microsoft.graph.listItem'.",
        "innerError": {
            "date": "2024-04-18T20:26:17",
            "request-id": "187ddcce-d34f-4d75-8945-5acc1f95807a",
            "client-request-id": "582e2e6d-c48e-87d0-c397-0dc0d6a7b753"
        }
    }
}

as soon as I add HasUniqueRoleAssignments to the $select clause.

If Graph doesn't offer this property it is still better to use the REST API to get all the list items with unique permissions in one request than to test each item individually.

seanstory commented 6 months ago

Yeah, probably. But that's only the case if DLS is enabled.

Perhaps what we could do is change the logic from (psudocode):

for list_item in client.list_items():
  if dls_enabled:
    if client.has_unique_role_assignments(list_item)
      list_item = decorate_with_access_control(list_item, client.get_role_assignments())
  yield list_item

to

if dls_enabled:
  for list_item_with_role_assignments in rest_client.list_items_with_role_assignments():
    yield list_item_with_role_assignments
else:
  for list_item in client.list_items():
    yield list_item

So effectively taking a totally different strategy for fetching list items, depending on if DLS is enabled or not 🤔

I think it's unlikely that we'll prioritize this immediately, but we can leave this open. If you have a support contract with Elastic, you can ask your account rep to file an Enhancement Request to help us understand any urgency on this, and prioritize accordingly. You're also welcome to take a stab at making a PR yourself, if you're not interested in waiting for us to get around to this.