MukurtuCMS / Mukurtu-CMS

Mukurtu CMS v4 Development
Search indexed fields for SAPI config #106

Open michael-wynne-wsu opened 9 months ago

michael-wynne-wsu commented 9 months ago

Digital Heritage

- Title
- Summary
- Media transcription field?
- (I) Community
- (I) Protocol
- (I) Category
- (I) Creator
- (I) Contributor
- (I - and if possible, selecting a range) Original date
- Original date description
- Cultural narrative
- Traditional knowledge
- Description
- (I) Keywords
- (I) Local Contexts Labels and Notices?
- Citing Indigenous Elders and Knowledge Keepers (all fields?)
- (I) Rights statements?
- (I) Creative Commons Licenses?
- (I) Format
- (I) Type
- Identifier
- (I) Language
- Source
- (I) Subject
- Transcription
- (I) People
- (I) Location
- Location Description
- Related content?
- (I) Collections?

Dictionary Word

- Title (v3: Term)
- (I) Community
- (I) Protocol
- (I) Language
- (I) Glossary Entry
- (I) Keywords
- Location
- Related content?
- (I) Collections?
- (I) Word lists?

Dictionary Word Entry

- Alternate spelling
- Definition
- Sample sentences
- (I) Word type
- Pronunciation
- Translation
- Source
- Word origin
- (I) Contributor

Person

- Title (v3: Name)
- (I) Keywords
- Text sections
- Representative terms (v3: Mukurtu terms)
- (I) Community
- (I) Protocol
- Related people?
- (I) Collections?
- Related content?

Word List

- Title (v3: Word list name)
- Summary
- Description
- (I) Keywords
- Source (v3: Credit)
- Words
- (I) Community
- (I) Protocol
- (I) Collections?
- Related content?

Collection

- Title (v3: Collection name)
- Summary
- Description
- (I) Keywords
- Source (v3: Credit)
- (I) Community
- (I) Protocol
- Items in collection
- Sub-collections
- Top-level collection
- Related content?

michael-wynne-wsu commented 9 months ago

@steve-taylor-wsu I have a few questions/notes, but by and large, here are the SAPI fields.

1) I added related content and collections for all relevant content types since it was inconsistent in v3.

2) I would like to include a protocol facet in default v4 search, so I added that.

3) Similarly, facets for Local Contexts Labels/Notices, Creative Commons licenses, and rights statements may be useful.

4) Added dictionary word entry fields, since those weren't specified in v3.

Am I missing anything?

kimberlychristen commented 9 months ago

For Local Contexts, you mean Labels and Notices, yes? There are no licenses yet.
--Kim

On Oct 20, 2023, at 1:35 PM, Michael Wynne wrote:

Similarly, facets for Local Contexts Labels/licenses, creative commons, and right statements may be useful.

michael-wynne-wsu commented 9 months ago

@kimberlychristen Yes, that's what I meant; I just mis-wrote.

taylor-steve commented 8 months ago

@michael-wynne-wsu Could you please note, for each of these fields, whether it should be indexed for facets or for full-text searching? We'll default to full text if you don't know. Thanks.

michael-wynne-wsu commented 8 months ago

@steve-taylor-wsu I added (I) for the fields that I would expect to be indexed. Let me know if that's not what you needed.

taylor-steve commented 8 months ago

That works, thanks!
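
For anyone turning the lists above into configuration, here is a rough Python sketch of reading the "(I)" convention. It assumes the marker precedes the field name, that "(I)" means "index for facets", and that unmarked fields fall back to full text as mentioned above. `FieldSpec` and `parse_entry` are hypothetical names for illustration, not part of Mukurtu or Search API.

```python
import re
from dataclasses import dataclass

# Matches a leading "(I)" or "(I - some note)" marker, as used in the field lists.
MARKER = re.compile(r"^\(I(?:\s*-\s*(?P<note>[^)]*))?\)\s*")


@dataclass
class FieldSpec:
    label: str               # field label as written in the issue
    mode: str                # "facet" or "fulltext" (assumed meanings)
    note: str = ""           # free-text note carried inside the marker, if any
    tentative: bool = False  # True when the entry ended with "?"


def parse_entry(entry: str) -> FieldSpec:
    """Parse one list entry such as "(I) Community" or "Related content?"."""
    text = entry.strip()
    mode = "fulltext"  # default for unmarked fields, per the thread
    note = ""
    match = MARKER.match(text)
    if match:
        mode = "facet"  # assumption: "(I)" requests facet indexing
        note = (match.group("note") or "").strip()
        text = text[match.end():]
    tentative = text.endswith("?")
    label = text.rstrip("?").strip()
    return FieldSpec(label=label, mode=mode, note=note, tentative=tentative)


if __name__ == "__main__":
    for raw in ["Title (v3: Term)", "(I) Community", "(I) Collections?"]:
        print(parse_entry(raw))
```

Entries ending in "?" are only flagged as tentative here; whether they get indexed at all is still an open question in this thread.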

taylor-steve commented 8 months ago

@nick-deer for now, try to reflect these in all the indexes, as appropriate (e.g., no need to index DH in the dictionary index). Skip the "auto-index" entirely.

If we hit the table limit on an index, we'll need to define specifically where in the software each of these searches/facets will be used, so we aren't wasting index space on unused field indexing.

taylor-steve commented 8 months ago

@michael-wynne-wsu, Nick has found one of the SAPI DB limits. We're limited to at most 63 indexed fields per search index.

Right now, we have 4 SAPI search indexes that provide for the following uses (currently, /browse and /digital-heritage share an index):

Currently /browse will exceed that 63 field limit if we try to incorporate all your listed fields. Do you want to pick the fields, do you want us to brainstorm it at the meeting, or do you want our best guess on what's important for that index? Thanks!

michael-wynne-wsu commented 8 months ago

@steve-taylor-wsu Just so I can get the math right...

On /browse, is a field like keywords that is present in multiple content types counted once, or once per content type?

taylor-steve commented 8 months ago

@michael-wynne-wsu Content types don't come into play, only entity types (e.g., nodes vs media), so in your example, keywords would only consume one field (unless you want faceting and full text search, see below).

The only caveat here is there are some places where we might verbally describe the fields as the same thing, but they have different machine names because they have incompatible configuration. For example, both Digital Heritage and Dictionary Word have a "Language" field, but they aren't the same field (field_language vs field_dictionary_word_language), so that'd be 2 fields.

Also, if you want something indexed for both faceting and full text search (e.g., Categories) that requires 2 fields, one as a string for the facet system and one as full text for the text search.
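
To make the math concrete, here is a small Python sketch of the counting rules described above: one slot per distinct machine name within an entity type, and two slots when a field is indexed for both facets and full text. It is only an illustration, not how Search API counts internally. `IndexedField`, `slots_used`, and `fits` are hypothetical names, and `field_keywords` and `field_category` are assumed machine names (only `field_language` and `field_dictionary_word_language` come from this thread).

```python
from dataclasses import dataclass

MAX_FIELDS_PER_INDEX = 63  # limit reported in this thread


@dataclass(frozen=True)
class IndexedField:
    entity_type: str     # e.g. "node" or "media"
    machine_name: str    # e.g. "field_language"
    facet: bool = False  # indexed as a string for the facet system
    fulltext: bool = True  # indexed as full text for text search


def slots_used(fields):
    """Count index slots: one per (entity type, machine name, indexing mode)."""
    slots = set()
    for f in fields:
        if f.facet:
            slots.add((f.entity_type, f.machine_name, "string"))
        if f.fulltext:
            slots.add((f.entity_type, f.machine_name, "fulltext"))
    return len(slots)


def fits(fields, limit=MAX_FIELDS_PER_INDEX):
    """Return True when the field list stays within the per-index limit."""
    return slots_used(fields) <= limit


example = [
    # Keywords appears on several content types but is one field: 1 slot.
    IndexedField("node", "field_keywords"),
    IndexedField("node", "field_keywords"),
    # Two "Language" fields with different machine names: 2 slots.
    IndexedField("node", "field_language"),
    IndexedField("node", "field_dictionary_word_language"),
    # Facet plus full text on the same field: 2 slots.
    IndexedField("node", "field_category", facet=True, fulltext=True),
]

if __name__ == "__main__":
    print(slots_used(example), fits(example))
```

Listing every wanted field for /browse this way would show how far over the limit the full list goes before deciding what to cut.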

michael-wynne-wsu commented 8 months ago

As discussed in meeting, Steve and Nick can make a best effort at selecting/removing fields.