elastic / connectors

Official Elastic connectors for third-party data sources
https://www.elastic.co/guide/en/elasticsearch/reference/master/es-connectors.html

[SharePoint] Extend Metadata Indexing to custom one #644

Open danajuratoni opened 1 year ago

danajuratoni commented 1 year ago

Issue created in another repo, replicated here for visibility: https://github.com/elastic/connectors-ruby/issues/499

Description

In https://github.com/elastic/connectors-python/issues/1268, we removed many metadata fields from what we index for SharePoint Online. We did this by adding explicit $select clauses to our queries to the Graph API, which tell the API which fields to send in the response.

However, we anticipate that many customers will want fine-grained control over what fields they fetch and index, but will also not want to make code changes in order to fetch more/different fields.
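For readers unfamiliar with the mechanism, here is a minimal sketch of how a $select clause narrows a Graph API request. The helper name `build_select_url` and the field names are illustrative assumptions, not the connector's actual code or the field set chosen in the linked issue.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_select_url(resource_path, fields):
    """Return a Graph API URL that asks for only the given fields.

    Hypothetical helper for illustration; not part of the connector.
    """
    select = ",".join(fields)
    # Keep commas literal: $select takes a comma-separated list of property names.
    return f"{GRAPH_BASE}/{resource_path.lstrip('/')}?$select={quote(select, safe=',')}"


# Illustrative field names only.
url = build_select_url("sites/root/lists", ["id", "name", "lastModifiedDateTime"])
print(url)
```

Any field left out of the list is simply not returned, which is why fetching more or different fields currently requires a code change.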

danajuratoni commented 1 year ago

cc: @seanstory we could repurpose this document for indexing custom metadata in a configurable manner

seanstory commented 1 year ago

CC @JoseLuisGJ - we'll want some UX insight on a good way to make metadata field selection configurable.

Easiest way (I think) would be new RCFs with comma-separated field names that we allow-list (default values are the fields we chose in 8.9). But that will look gross fast - each document "type" (list, listItem, sitePage, driveItem, site, listItemAttachment, etc) has a different list of fields. So that's a lot of RCFs.

Another approach could be to just solve this with Advanced Sync Rules - letting the customer specify the exact $select clauses they want for each resource type. But that might be hard to maintain if we decide that certain fields are required, and they are not specified in such a sync rule.
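To make the maintenance concern concrete, here is a rough sketch of one way to combine user-supplied fields with fields the connector requires, so that a rule cannot accidentally drop them. Everything here is an assumption for illustration: the `REQUIRED_FIELDS` contents, the function names, and the comma-separated input format are hypothetical, and the real required fields per resource type are not settled in this thread.

```python
# Hypothetical per-type required fields; the real lists are not given in this thread.
REQUIRED_FIELDS = {
    "driveItem": ["id", "lastModifiedDateTime"],
    "listItem": ["id"],
}


def parse_field_list(raw):
    """Split a comma-separated string into trimmed, de-duplicated field names."""
    seen = []
    for name in raw.split(","):
        name = name.strip()
        if name and name not in seen:
            seen.append(name)
    return seen


def build_select(resource_type, user_fields):
    """Merge user fields with required ones, required fields first."""
    merged = list(REQUIRED_FIELDS.get(resource_type, []))
    for name in parse_field_list(user_fields):
        if name not in merged:
            merged.append(name)
    return "$select=" + ",".join(merged)


clause = build_select("driveItem", "name, webUrl, id")
print(clause)
```

With this shape, a customer-specified list only ever adds to the required set, whichever configuration surface (RCF or Advanced Sync Rules) ends up carrying it.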

danajuratoni commented 1 year ago

@daveyholler this is a highly awaited enhancement. Could we get some design input on our best options here before starting implementation?

daveyholler commented 1 year ago

@danajuratoni happy to help. @seanstory, can I get a demo of what this looks like in practice? I'm struggling to visualize this in my head.

seanstory commented 1 year ago

Sure thing. Dropped time on the cal for tomorrow.

danajuratoni commented 1 year ago

@daveyholler is there any design deliverable planned? @seanstory what are the takeaways from the meeting?

daveyholler commented 1 year ago

@danajuratoni I've got some updates/clarification questions that I'll write up tomorrow morning.

daveyholler commented 1 year ago

@danajuratoni


After chatting with Sean, I think that there are some things here we should dig into a little:

  1. How does a user know which fields are available for them to enter? — It sounds like there’s quite a bit of variation in which fields are present in each connector type. Do users have the ability to see all their field names on a given SharePoint (or other) connector? Is there a way we can sample documents to provide them with a finite list of field options? Or is getting this information more of a back and forth between individuals (roles) within the user’s organization?

  2. What kind of validation are we able to offer after a user specifies field names?

  3. If the field names are “arbitrary” (as far as we’re concerned), and a user can’t validate what they’ve entered, and/or if the process of actually providing those field names is more challenging than selecting from a dropdown list, how many users do you anticipate will use the feature in the UI?

  4. And lastly, is this something that we can/should provide via API rather than adding steps/options to the UI?

DianaJourdan commented 1 year ago

@danajuratoni as there is no design yet and discussions are still needed, this one can't make 8.10 anymore

danajuratoni commented 1 year ago

I have the feeling we're mixing Rich Configurable Fields (RCFs) with what I'll call Advanced Filtering Fields.

How does a user know which fields are available for them to enter

RCFs are fields that each connector sends to Kibana and that show up as editable in the Configuration tab. We aim to make these as "rich" and user-friendly as possible, with placeholders, validations, dropdowns, and selection options where possible. These would likely reside in the config.yaml some day. All connectors require at least one RCF to connect to the data source. Additional fields might be added for other functionality, such as extraction capabilities or reducing the ingest scope to, e.g., a certain table, project, or space. These fields for reducing the data corpus to be ingested have a certain overlap with Advanced Filtering Fields. However, Advanced Filtering Fields are fields that

Custom metadata fields in particular, I'd categorize in the same logical category as specifying which tables or table rows should be ingested. Could be an optional RCF or an Advanced Filtering Field.

-- I missed posting this comment before leaving on PTO and I'm still catching up, please share if I missed any updates in the meantime

danajuratoni commented 1 year ago

@daveyholler @seanstory Let's schedule a meeting if more discussions are needed. I'd like to get clarity on the designs for this feature asap, so that we can resume implementation. Even if this feature will be available only for connector clients until 8.11 is released, it is critical to unblock customer deals.

seanstory commented 1 year ago

After a sync with Dana and Davey, we came to the conclusion that we should allow SPO custom metadata fields to be configured via our Advanced Sync Rules. Requirements: