blevesearch / bleve

A modern text/numeric/geo-spatial/vector indexing library for go
Apache License 2.0

Multi field mapping problem #1475

Open markysand opened 3 years ago

markysand commented 3 years ago

I was trying to configure multi fields (like https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html).

    documentMapping := bleve.NewDocumentMapping()

    nameStemMapping := bleve.NewTextFieldMapping()
    nameStemMapping.Analyzer = stemAnalyzer
    nameStemMapping.Name = "stem"

    nameNgramMapping := bleve.NewTextFieldMapping()
    nameNgramMapping.Analyzer = ngramAnalyzer
    nameNgramMapping.Name = "ngram"

    documentMapping.AddFieldMappingsAt("name", nameStemMapping, nameNgramMapping)

    searchTermsStemMapping := bleve.NewTextFieldMapping()
    searchTermsStemMapping.Analyzer = stemAnalyzer
    searchTermsStemMapping.Name = "stem"

    searchTermsNgramMapping := bleve.NewTextFieldMapping()
    searchTermsNgramMapping.Analyzer = ngramAnalyzer
    searchTermsNgramMapping.Name = "ngram"

    documentMapping.AddFieldMappingsAt("searchTerms", searchTermsStemMapping, searchTermsNgramMapping)

    mapping.AddDocumentMapping("ISSUE", documentMapping)

With this I expected to be able to search a field like:

    searchTermsNgramQuery := bleve.NewMatchQuery(needle)
    searchTermsNgramQuery.SetField("searchTerms.ngram")

But I keep getting no results. I have gotten Bleve to work before with single custom analyzers for different parts of a document.

mschoch commented 3 years ago

Everything I see here looks OK. Can you share more of the code around the mapping? Often a minor issue with the way types are identified means that the mapping is largely ignored.

markysand commented 3 years ago

I get it now. There are two kinds of names: dotpath names, which refer to parts of the indexed documents, and custom names set on field mappings. If you add a field mapping with an empty Name, it takes the dotpath name of the underlying document part. But if you set a Name on the field mapping, that name behaves as an absolute path. So I did:

    func addDocumentToMapping(m *mapping.IndexMappingImpl) {
        // mapping main
        documentMapping := bleve.NewDocumentMapping()

        // fields on name
        nameBase := bleve.NewTextFieldMapping()

        nameStemMapping := bleve.NewTextFieldMapping()
        nameStemMapping.Analyzer = stemAnalyzer
        nameStemMapping.Name = "nameStem"

        nameNgramMapping := bleve.NewTextFieldMapping()
        nameNgramMapping.Analyzer = ngramAnalyzer
        nameNgramMapping.Name = "nameNgram"

        documentMapping.AddFieldMappingsAt("name", nameBase, nameStemMapping, nameNgramMapping)

        // fields on searchTerms
        searchTermsBase := bleve.NewTextFieldMapping()

        searchTermsStemMapping := bleve.NewTextFieldMapping()
        searchTermsStemMapping.Analyzer = stemAnalyzer
        searchTermsStemMapping.Name = "searchTermsStem"

        searchTermsNgramMapping := bleve.NewTextFieldMapping()
        searchTermsNgramMapping.Analyzer = ngramAnalyzer
        searchTermsNgramMapping.Name = "searchTermsNgram"

        documentMapping.AddFieldMappingsAt("searchTerms", searchTermsStemMapping, searchTermsNgramMapping, searchTermsBase)

        m.AddDocumentMapping(documentType, documentMapping)
    }

markysand commented 3 years ago

I guess I was used to elastic, where a named field mapping is accessed as a part of the document (name-wise).

mschoch commented 3 years ago

Hmm, I think there is still some misunderstanding. What is the purpose of nameBase and searchTermsBase?

Regarding the name field, you are correct: if it is empty, it simply uses the name of the document it's enclosed in. This simplifies the config for sub-document mappings with one field (the most common case). However, if you specify the name, it is not defined from the root; instead, it inherits a prefix from the hierarchy. See: https://github.com/blevesearch/bleve/blob/master/mapping/field.go#L266-L272

Related to this, we actually want to introduce a new flag so that you can define a name from the root, because inheriting from the parent seems to never be what users want (deeply nested JSON is fine, but no one wants search fields named a.b.c.d).
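
To make the naming rule above concrete, here is a rough sketch in plain Go. It is a simplified model, not bleve's actual code: effectiveFieldName is a hypothetical helper, and it assumes the inherited prefix is the dotted path of the parent of the property the field mapping is attached to.

```go
package main

import (
	"fmt"
	"strings"
)

// effectiveFieldName is a hypothetical helper (not part of bleve) that models
// the rule described above, under the assumption that a named field mapping
// inherits the path of its parent document as a prefix.
//   path: dotpath elements of the property the field mapping is attached to
//   name: the Name set on the field mapping ("" if unset)
func effectiveFieldName(path []string, name string) string {
	if name == "" {
		// Empty name: the field takes the full dotpath of the property.
		return strings.Join(path, ".")
	}
	if len(path) == 0 {
		return name
	}
	// Named field: replace the last path element with the custom name.
	// Copy the parent path so the caller's slice is never modified.
	parts := append([]string{}, path[:len(path)-1]...)
	parts = append(parts, name)
	return strings.Join(parts, ".")
}

func main() {
	fmt.Println(effectiveFieldName([]string{"searchTerms"}, ""))      // searchTerms
	fmt.Println(effectiveFieldName([]string{"searchTerms"}, "ngram")) // ngram
	fmt.Println(effectiveFieldName([]string{"a", "b", "c"}, "d"))     // a.b.d
}
```

If this model holds, a mapping named "ngram" on the top-level searchTerms property would be indexed as "ngram" rather than "searchTerms.ngram", and the identically named mapping on name would land on the same field. That would be consistent with the empty results in the original report, and with unique names such as "searchTermsNgram" avoiding the collision.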