Closed by seanstory 1 month ago
Oh wow, amazing job on triaging, Sean!
@moxarth-elastic It would be good to spawn off a sub-ticket that focuses only on Server when you'll pick this up. Thanks!
Hi Team,
The PR adding support for host-named site collections in the SharePoint Server connector is merged.
For SharePoint Online, we researched how to create host-named site collections. Per the documentation, SharePoint in Microsoft 365 does not support host-named site collections, and there seems to be no concept of a host-named site collection in SPO.
Hence, closing this ticket with the fix in the SharePoint Server connector.
Describe the bug
In SharePoint (Online or Server), sites can be logically grouped into "Site Collections," each with a root site and a number of child sites. Thus far, we've only seen setups where there is a single Site Collection at <tenant-name>.sharepoint.com/sites/. This seems to have led us to assume that the tenant-name is tightly coupled to the hostname for all sites and site collections. However, from https://learn.microsoft.com/en-us/sharepoint/sites/sites-and-site-collections-overview: while it may not be recommended, it does seem valid to have non-path-based site collections, which can mean that the hostname for a given site collection is NOT prefixed with the tenant-name. That makes checks like these behave incorrectly:
https://github.com/elastic/connectors/blob/9fcdf5e308c9657e092116a6a6568002c32d6a47/connectors/sources/sharepoint_online.py#L1020-L1025
Further, it can mean that we attempt to sync (and fail to sync) some sites on site collections we didn't intend to.
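To make the failure mode concrete, here is a minimal Python sketch of a tenant check of this kind. It is not the connector's code at the linked lines: the function name hostname_matches_tenant and the exact rule (the first hostname label must equal the tenant name) are assumptions for illustration, and the hostnames are example values.

```python
from urllib.parse import urlparse


def hostname_matches_tenant(url: str, tenant_name: str) -> bool:
    """Hypothetical check: treat the first hostname label as the tenant name."""
    hostname = urlparse(url).hostname or ""
    first_label = hostname.split(".")[0]
    return first_label == tenant_name


# Path-based site collection: the hostname is prefixed with the tenant name.
print(hostname_matches_tenant("https://acmecorp.sharepoint.com/sites/foo", "acmecorp"))

# Site collection on a different hostname: the check rejects it,
# even if it belongs to the same tenant.
print(hostname_matches_tenant("https://acmecorpb.sharepoint.com/sites/foo", "acmecorp"))
```

Under this assumed rule, any site collection whose hostname is not prefixed with the tenant name looks like it belongs to a different tenant.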
For example, let's say a user has one tenant, acmecorp. They have two site collections:
acmecorp.sharepoint.com/sites/
acmecorpb.sharepoint.com/sites/
Further, both of those site collections have a site with path /foo.
In our connector, if you configured tenant_name: acmecorp, sites: foo, and enumerate_all_sites: false, we would successfully fetch acmecorp.sharepoint.com/sites/foo, but then we'd also try to fetch acmecorpb.sharepoint.com/sites/foo, decide that we were trying to fetch something from another tenant, and then fail the sync. And there would be no way for us to successfully fetch only acmecorpb.sharepoint.com/sites/foo, because its hostname will never align with its tenant name.
To Reproduce
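The scenario described above can be modeled with a short Python sketch. This is a simplified illustration, not the connector's implementation: select_sites, sync, and TenantMismatchError are hypothetical names, and the sketch assumes that configured sites are matched by their last path segment only and that the tenant check compares the first hostname label to the tenant name.

```python
from urllib.parse import urlparse


class TenantMismatchError(Exception):
    """Hypothetical error: a URL's hostname does not match the tenant name."""


def select_sites(discovered_urls, wanted_names):
    # Assumption: sites are selected by their last path segment only, so
    # same-named sites in different site collections all match.
    return [
        url
        for url in discovered_urls
        if urlparse(url).path.rstrip("/").split("/")[-1] in wanted_names
    ]


def sync(discovered_urls, tenant_name, wanted_names):
    synced = []
    for url in select_sites(discovered_urls, wanted_names):
        host = urlparse(url).hostname or ""
        # Assumed tenant check: the first hostname label must equal the tenant name.
        if host.split(".")[0] != tenant_name:
            raise TenantMismatchError(f"{url} does not match tenant {tenant_name}")
        synced.append(url)
    return synced


discovered = [
    "https://acmecorp.sharepoint.com/sites/foo",
    "https://acmecorpb.sharepoint.com/sites/foo",
]

try:
    sync(discovered, tenant_name="acmecorp", wanted_names={"foo"})
except TenantMismatchError as exc:
    print(f"sync failed: {exc}")
```

In this model the first site passes the check, the second raises, and the whole sync fails. No value of tenant_name lets the sync fetch only the acmecorpb site.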
Expected behavior
Environment
8.13.0-SNAPSHOT and before
Additional context
slack thread: https://elastic.slack.com/archives/C7LLL50CA/p1705426484605769