Open danajuratoni opened 1 year ago
cc: @seanstory we could repurpose this document for indexing custom metadata in a configurable manner
CC @JoseLuisGJ - we'll want some UX insight on a good way to make metadata field selection configurable.
Easiest way (I think) would be new RCFs with comma-separated field names that we allow-list (default values are the fields we chose in 8.9). But that will look gross fast: each document "type" (list, listItem, sitePage, driveItem, site, listItemAttachment, etc.) has a different list of fields. So that's a lot of RCFs.
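To make the allow-listed RCF idea concrete, here is a minimal sketch of how one comma-separated value could be parsed. The names `ALLOWED_FIELDS` and `parse_field_list` are hypothetical, and the allow-list contents are placeholders, since the thread does not list the actual 8.9 fields.

```python
# Hypothetical allow-list for one document type; the real 8.9 field
# lists are not given in this thread.
ALLOWED_FIELDS = {
    "driveItem": {"id", "name", "size", "lastModifiedDateTime", "createdDateTime"},
}


def parse_field_list(raw, allowed, defaults):
    """Parse a comma-separated RCF value into a validated list of field names.

    An empty value falls back to the defaults; any name outside the
    allow-list raises ValueError so the user sees the problem early.
    """
    names = [part.strip() for part in raw.split(",") if part.strip()]
    if not names:
        return list(defaults)
    rejected = [name for name in names if name not in allowed]
    if rejected:
        raise ValueError(f"Fields not in allow-list: {', '.join(rejected)}")
    # Drop duplicates while keeping the order the user typed.
    return list(dict.fromkeys(names))
```

One such value would be needed per document type, which is the "lot of RCFs" concern raised above.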
Another approach could be to just solve this with Advanced Sync Rules, letting the customer specify the exact $select clauses they want for each resource type. But that might be hard to maintain if we decide that certain fields are required, and they are not specified in such a sync rule.
@daveyholler this is a highly awaited enhancement, could we get some design input on our best options here before starting implementation?
@danajuratoni happy to help. Can I get a demo, @seanstory on what this looks like in practice? I'm struggling to visualize this in my head.
Sure thing. Dropped time on the cal for tomorrow.
@daveyholler is there any design deliverable planned? @seanstory what are the takeaways from the meeting?
@danajuratoni I've got some updates/clarification questions that I'll write up tomorrow morning.
@danajuratoni
After chatting with Sean, I think that there are some things here we should dig into a little:
How does a user know which fields are available for them to enter? — It sounds like there’s quite a bit of variation in which fields are present in each connector type. Do users have the ability to see all their field names on a given SharePoint (or other) connector? Is there a way we can sample documents to provide them with a finite list of field options? Or is getting this information more of a back and forth between individuals (roles) within the user’s organization?
What kind of validation are we able to offer after a user specifies field names?
If the field names are “arbitrary” (as far as we’re concerned), and a user can’t validate what they’ve entered, and/or if the process of actually providing those field names is more challenging than selecting from a dropdown list, how many users do you anticipate will use the feature in the UI?
And lastly, is this something that we can/should provide via API rather than adding steps/options to the UI?
@danajuratoni as there is no design yet and discussions are still needed, this one can't make 8.10 anymore
I have the feeling we're mixing Rich Configurable Fields (RCF) with what I'll call Advanced Filtering Fields.
How does a user know which fields are available for them to enter
RCFs are fields that each connector sends to Kibana and that show up as editable in the Configuration tab. We aim to make these as "rich" and user-friendly as possible, with placeholders / validations / dropdowns / selection options where possible. These would likely reside in the config.yaml some day. All connectors require at least one RCF to connect to the data source. Additional fields might be added for other functionality, such as extraction capabilities or reducing the ingest scope to e.g. a certain table / project / space. These fields for reducing the data corpus to be ingested have a certain overlap with Advanced Filtering Fields. However, Advanced Filtering Fields are fields that
skipExtractingDriveItemsOlderThan
field or contain complex queries
Custom metadata fields in particular, I'd categorize in the same logical category as specifying which tables or table rows should be ingested. Could be an optional RCF or an Advanced Filtering Field.
-- I missed posting this comment before leaving on PTO and I'm still catching up, please share if I missed any updates in the meantime
@daveyholler @seanstory Let's schedule a meeting if more discussions are needed. I'd like to get clarity on the designs for this feature asap, so that we can resume implementation. Even if this feature will be available only for connector clients until 8.11 is released, it is critical to unblock customer deals.
After a sync with Dana and Davey, we came to the conclusion that we should allow SPO custom metadata fields to be configured via our Advanced Sync Rules. Requirements:
$select clauses as granular as possible. We know that the SharePoint REST API is weird with Site Pages, for example, and that $select=foo,bar,baz might work on one Site, but not on another.
$select will be unioned with what we need for the connector to work. This means this feature will be additive only, and can't be used to trim down the number of indexed fields.
$select clause
$select clause, without their customizations
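The "unioned, additive only" requirement can be sketched as follows. This is an illustration under assumptions, not the connector's implementation: `REQUIRED_FIELDS` and `merged_select` are hypothetical names, and the required field lists are placeholders.

```python
# Hypothetical set of fields the connector needs in order to work,
# keyed by resource type. The real lists are not given in this thread.
REQUIRED_FIELDS = {
    "driveItem": ["id", "name", "lastModifiedDateTime"],
    "listItem": ["id", "lastModifiedDateTime"],
}


def merged_select(resource_type, custom_fields):
    """Union the customer's fields with the required ones.

    Required fields always come first and can never be removed, so a
    sync rule can only add to the $select clause, never trim it.
    """
    required = REQUIRED_FIELDS.get(resource_type, [])
    merged = dict.fromkeys(required)  # a dict keeps insertion order
    merged.update(dict.fromkeys(custom_fields))
    return ",".join(merged)
```

For example, `merged_select("driveItem", ["name", "webUrl"])` yields `id,name,lastModifiedDateTime,webUrl`: the duplicate `name` collapses and the required fields stay in place.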
Issue created in another repo, replicated here for visibility: https://github.com/elastic/connectors-ruby/issues/499
Description
In https://github.com/elastic/connectors-python/issues/1268, we removed many metadata fields from what we index for SharePoint Online. We did this by adding explicit $select clauses to our queries to the Graph API, which tells the API which fields to send in the response.
However, we anticipate that many customers will want fine-grained control over what fields they fetch and index, but will also not want to make code changes in order to fetch more/different fields.
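For readers unfamiliar with the mechanism, here is a rough sketch of what attaching an explicit $select clause to a Graph API request URL looks like. The helper name `build_graph_url`, the example path, and the field names are illustrative assumptions; the connector's actual query-building code is not shown in this issue.

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_graph_url(path, select_fields):
    """Return a Graph API URL that asks only for the listed fields."""
    # Keep "$" and "," unescaped so the query stays readable.
    query = urlencode({"$select": ",".join(select_fields)}, safe="$,")
    return f"{GRAPH_BASE}/{path.lstrip('/')}?{query}"


# Illustrative path and fields only.
url = build_graph_url("sites/root/drives", ["id", "name"])
```

Making the field list an input rather than a constant is what would let customers fetch more or different fields without code changes.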