FusionAuth / fusionauth-issues

FusionAuth issue submission project
https://fusionauth.io

User searching for first and lastname not possible using ElasticSearch search engine #712

Open nikos opened 4 years ago

nikos commented 4 years ago

When using FusionAuth 1.15.5 with the Java SDK it seems that the documented fields (see https://fusionauth.io/docs/v1/tech/apis/users#search-for-users) do NOT work:

Kotlin example:

        val response = fusionAuthClient.searchUsersByQuery(
                SearchRequest(UserSearchCriteria().apply {
                    queryString = "fullName:Joe"
                    startRow = offset
                    numberOfResults = pageSize
                    sortFields = listOf(SortField("username"))
                }))


robotdan commented 4 years ago

If you're not using Elasticsearch, the ES Query String DSL does not work. Your query should be queryString = "Joe".

There is also a doc bug: those fields should be lastName, firstName and fullName. But the note in the doc is only meant to say those are the only fields that will be searched.

robotdan commented 4 years ago

@mooreds do you want to take a look at our search docs and see if we need to clarify how to use the search APIs when not using Elasticsearch?

In this case @nikos was using fullName:Joe - but we don't support this DSL without Elasticsearch. This may not be clear in our documentation. The doc is intended to indicate when you search with a string, we will compare the documented fields, but you can't query on them directly using the fullName: notation.
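A minimal sketch of the distinction described above, assuming the caller knows which search engine the server is configured with. The `UserQueryStrings` class and its `SearchEngine` enum are hypothetical helpers for illustration, not part of the FusionAuth client; the resulting string would be passed as `queryString` in the search criteria.

```java
// Hypothetical helper, not part of the FusionAuth Java client.
final class UserQueryStrings {
    // Assumption: the application knows which engine the server uses.
    enum SearchEngine { DATABASE, ELASTICSEARCH }

    // Builds a queryString value for a name search.
    static String forName(SearchEngine engine, String name) {
        String term = name.trim();
        if (engine == SearchEngine.ELASTICSEARCH) {
            // The field:value notation is only understood by Elasticsearch.
            return "fullName:" + term;
        }
        // The database engine takes the bare term and compares it
        // against the documented fields itself.
        return term;
    }

    public static void main(String[] args) {
        System.out.println(forName(SearchEngine.ELASTICSEARCH, "Joe"));
        System.out.println(forName(SearchEngine.DATABASE, "Joe"));
    }
}
```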

nikos commented 4 years ago

@robotdan Thanks for coming back so quickly to my question.

I am a bit confused, since I thought FusionAuth was currently only available with ElasticSearch as the search backend (at least the last time I tried it, which might have been back around 1.11, startup failed when ES was missing). So we are using an ElasticSearch full-text index; the mapping looks like:

{
    "fusionauth_user": {
        "mappings": {
            "_doc": {
                "_source": {
                    "enabled": false
                },
                "properties": {
                    "active": {
                        "type": "boolean"
                    },
                    "birthDate": {
                        "type": "date"
                    },
                    "data": {
                        "properties": {
                            "email": {
                                "type": "text",
                                "fields": {
                                    "keyword": {
                                        "type": "keyword",
                                        "ignore_above": 256
                                    }
                                }
                            },
                            "emobilityId": {
                                "type": "long"
                            },
                            "mainMandant": {
                                "type": "text",
                                "fields": {
                                    "keyword": {
                                        "type": "keyword",
                                        "ignore_above": 256
                                    }
                                }
                            },
                            "technicalUser": {
                                "type": "boolean"
                            }
                        }
                    },
                    "email": {
                        "type": "text",
                        "analyzer": "exact_lower",
                        "fielddata": true
                    },
                    "fullName": {
                        "type": "text",
                        "fielddata": true
                    },
                    "id": {
                        "type": "keyword"
                    },
                    "insertInstant": {
                        "type": "date"
                    },
                    "lastLoginInstant": {
                        "type": "date"
                    },
                    "login": {
                        "type": "keyword"
                    },
                    "memberships": {
                        "properties": {
                            "data": {
                                "type": "object"
                            },
                            "groupId": {
                                "type": "text",
                                "fields": {
                                    "keyword": {
                                        "type": "keyword",
                                        "ignore_above": 256
                                    }
                                }
                            },
                            "id": {
                                "type": "text",
                                "fields": {
                                    "keyword": {
                                        "type": "keyword",
                                        "ignore_above": 256
                                    }
                                }
                            },
                            "insertInstant": {
                                "type": "long"
                            },
                            "userId": {
                                "type": "text",
                                "fields": {
                                    "keyword": {
                                        "type": "keyword",
                                        "ignore_above": 256
                                    }
                                }
                            }
                        }
                    },
                    "registrations": {
                        "type": "nested",
                        "include_in_parent": true,
                        "properties": {
                            "applicationId": {
                                "type": "keyword"
                            },
                            "data": {
                                "type": "object"
                            },
                            "id": {
                                "type": "keyword"
                            },
                            "insertInstant": {
                                "type": "date"
                            },
                            "lastLoginInstant": {
                                "type": "date"
                            },
                            "preferredLanguages": {
                                "type": "text",
                                "fields": {
                                    "keyword": {
                                        "type": "keyword",
                                        "ignore_above": 256
                                    }
                                }
                            },
                            "roles": {
                                "type": "keyword"
                            },
                            "tokens": {
                                "type": "object"
                            },
                            "username": {
                                "type": "text",
                                "fields": {
                                    "keyword": {
                                        "type": "keyword",
                                        "ignore_above": 256
                                    }
                                }
                            },
                            "usernameStatus": {
                                "type": "text",
                                "fields": {
                                    "keyword": {
                                        "type": "keyword",
                                        "ignore_above": 256
                                    }
                                }
                            },
                            "verified": {
                                "type": "boolean"
                            }
                        }
                    },
                    "tenantId": {
                        "type": "keyword"
                    },
                    "timezone": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "username": {
                        "type": "text",
                        "fielddata": true
                    },
                    "verified": {
                        "type": "boolean"
                    }
                }
            }
        }
    }
}

So it seems that fullName, for example, is fine, but there are no fields for firstName and lastName: does the Search API endpoint hit the DB for those first before falling back to the ES full-text search capabilities? Another strange thing: searching for lastName works fine, but when specifying a value for username or firstName, all users are returned no matter what the search value looks like.

Sorry for bringing up many questions at once ;-)

mooreds commented 4 years ago

@nikos as of release 1.16, elasticsearch is optional. More details here: https://fusionauth.io/docs/v1/tech/release-notes#version-1-16-0-rc-1 ("Support for using the database as the user search engine.")

Here's a doc about switching between them: https://fusionauth.io/docs/v1/tech/tutorials/switch-search-engines

mooreds commented 4 years ago

Regarding your questions about the mapping, this is working as designed. I think the docs need to make it clear that the firstName and lastName specific field searches only work with the database search engine (though of course we could change that; if it is important to you, please file a feature request).

are those hitting the DB first by the Search API endpoint before falling back to the ES full-text search capabilities

There are no dependencies between the engines: if you are using elasticsearch, it is getting the whole query; the same is true with the database search engine. (The exception is when you are searching only by user id.)

but when specifying a value for username or firstName all users are returned no matter what the search value looks like?

How are you building those queries? Can you provide examples? It looks like the admin UI just uses a query_string when searching on username and doesn't specify the actual field.

Hope this helps.

nikos commented 4 years ago

Thanks @mooreds for taking action and clarifying the different aspects regarding database and fulltext/ES search capabilities (side note: I would prefer if the search client did not have to distinguish between the search engines, but could use the same field names).

Coming back to my original question: when making use of the FusionAuth Java client API (1.15.4) against a FusionAuth+ES (1.15.5) server, I was really confused that searching on lastName returns the expected users (for my Kotlin example, please see my original posting above), but searching for firstName returns all users. Note that neither field seems to be explicitly mapped into ES documents (see mapping).

What do you think could have gone wrong in my case? Is the search value expected to be put in double or single quotes? How is white space supposed to be handled: firstName:'Joe Foo' ?

mooreds commented 4 years ago

side note: I would prefer if the search client did not have to distinguish between the search engines, but could use the same field names

Hmmm. What do you mean? You can use queryString for both, it just has different limitations. I'm not sure what you mean.

Note that neither field seems to be explicitly mapped into ES documents (see mapping).

I only see fullName in the mapping. What am I missing?

We don't create a field on firstName or lastName, but you can use wildcards to search on them in the queryString:

queryString = "Joe*"

queryString = "*Smith"

I realize that doesn't quite get you what you want, though. However, we can keep this open as a feature request to index firstName and lastName.

nikos commented 4 years ago

This is working when querying users with the Java client API: queryString = "firstName:Joe" as opposed to queryString = "lastName:Smith" (of course these are just sample values; in reality they match existing users and their respective first and last names ;-)

nikos commented 4 years ago

For an application developer, it should make no difference whether the search uses database or elasticsearch capabilities: the field names should be the same, and no field should exist only when the search engine type is database.

nikos commented 4 years ago

Added some more general questions on how to use the user search in detail over at https://github.com/FusionAuth/fusionauth-site/pull/118#issuecomment-652205369

nikos commented 4 years ago

I revised my code and currently use the query string fullName:Joe* as first name equivalent and fullName:*Smith as search for the last name, until hopefully first and last name will be supported also for ElasticSearch as search engine.
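The workaround above can be sketched as a small helper that builds the two query strings. This is only an illustration: `NameWildcardQueries` is a hypothetical class, and the escaping follows the reserved-character list of the Elasticsearch query_string syntax. It assumes a single name token, since a wildcard term containing white space would not match individual tokens of an analyzed text field.

```java
// Hypothetical helper, not part of the FusionAuth Java client.
final class NameWildcardQueries {
    // Characters reserved by the Elasticsearch query_string syntax.
    // '&' and '|' are only reserved when doubled; escaping single ones is harmless.
    private static final String RESERVED = "+-=&|!(){}[]^\"~*?:\\/";

    // Escapes user input so it is treated as a literal term.
    static String escape(String term) {
        StringBuilder out = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (c == '<' || c == '>') {
                continue; // these two cannot be escaped, so they are dropped
            }
            if (RESERVED.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    // First-name equivalent: the name at the start of a fullName token.
    static String firstNamePrefix(String firstName) {
        return "fullName:" + escape(firstName.trim()) + "*";
    }

    // Last-name equivalent: the name at the end of a fullName token.
    static String lastNameSuffix(String lastName) {
        return "fullName:*" + escape(lastName.trim());
    }

    public static void main(String[] args) {
        System.out.println(firstNamePrefix("Joe"));
        System.out.println(lastNameSuffix("Smith"));
    }
}
```

Escaping the input before appending the wildcard keeps characters such as `:` or `*` typed by an end user from being interpreted as query syntax.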

robotdan commented 4 years ago

Internal: Any reason not to index each of these fields?

Currently we are indexing a single field called fullName which is built from fullName if provided, or it is built using a combination of firstName, middleName and lastName.
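A rough sketch of the rule described above, assuming the name parts are joined with single spaces and that missing or blank parts are skipped. `IndexedFullName` is a hypothetical name and this is not the actual FusionAuth indexing code.

```java
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of how the indexed fullName value could be derived.
final class IndexedFullName {
    static String of(String fullName, String firstName, String middleName, String lastName) {
        // An explicitly provided fullName wins.
        if (fullName != null && !fullName.trim().isEmpty()) {
            return fullName.trim();
        }
        // Otherwise combine the parts that are present, in order.
        return Stream.of(firstName, middleName, lastName)
                .filter(Objects::nonNull)
                .map(String::trim)
                .filter(part -> !part.isEmpty())
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(of(null, "Joe", null, "Smith"));
        System.out.println(of("J. Smith", "Joe", null, "Smith"));
    }
}
```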

mooreds commented 4 years ago

I can't think of any reason not to index these fields. Seems like a good move to me from the user perspective.

However, if I were implementing, I'd consider:

I am afraid I don't know enough about the internals to have a valid opinion on that stuff.

voidmain commented 4 years ago

I believe we wrote the original indexing to use fullName and let Elastic handle that as more of a document than a single value. Elastic should tokenize and make each piece of the name searchable. The only issue will be when different name components are the same. Like a lastName of John. Or a middle name of Smith.
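To illustrate the ambiguity mentioned above, here is a simplified stand-in for tokenized matching. It lowercases the name and splits on anything that is not a letter or digit, which only approximates what an Elasticsearch analyzer does; `NameTokens` is a hypothetical class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Simplified model of a tokenized fullName field (not the real ES analyzer).
final class NameTokens {
    static List<String> tokenize(String fullName) {
        return Arrays.stream(fullName.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // A term matches when it equals any token, regardless of its position.
    static boolean matches(String fullName, String term) {
        return tokenize(fullName).contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Both users match "john", although only one has it as a first name.
        System.out.println(matches("John Smith", "john"));
        System.out.println(matches("Adam John", "john"));
    }
}
```

Because the position of a token is not part of the match, a search for `john` cannot tell a first name from a last name, which is the case separate firstName and lastName fields would resolve.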

mooreds commented 4 years ago

@nikos I just pushed a fix for the search engine documentation which hopefully makes things more clear and addresses your other questions. https://fusionauth.io/docs/v1/tech/apis/users#search-for-users

mooreds commented 3 years ago

Removed documentation tag as this seems to be a bug about indexing first/last name now.

jobannon commented 1 year ago

I am leaving a comment here for future design discussions.

The sorting currently supported on a field like fullName is the same sort behavior that ES offers for a text field mapping.

In other words, if you have

Jim D
Jim B
Jim A
Jim C

and asked for a sort on this field, the sort behavior would be non-deterministic due to the collisions on jim, and the result would not necessarily come back as

Jim A
Jim B
Jim C
Jim D

Likewise, if you had something like this

Admin Zot
Fred Bunk
Becky Beu

It might sort to

Admin Zot
Becky Beu
Fred Bunk

Some folks might want sorting based on an ExactMatch. We would have to update our ES mappings and schema approach to have this behavior.
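The difference can be sketched with a deliberately simplified model: it assumes the sort key for a tokenized field is a single token (here the first one), which is an illustration of the collision problem rather than a description of how Elasticsearch actually picks its sort value. `NameSortSketch` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Simplified model: sorting by one token versus sorting by the whole value.
final class NameSortSketch {
    // Assumption for illustration: the sort key is the first lowercased token.
    static String firstToken(String name) {
        return name.trim().toLowerCase(Locale.ROOT).split("\\s+")[0];
    }

    // Names sharing a first token compare as equal, so their relative
    // order is whatever the input order happened to be.
    static List<String> sortByFirstToken(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparing(NameSortSketch::firstToken));
        return copy;
    }

    // Exact-match style sort: the whole lowercased value is the key.
    static List<String> sortByExactValue(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparing((String n) -> n.toLowerCase(Locale.ROOT)));
        return copy;
    }

    public static void main(String[] args) {
        List<String> jims = List.of("Jim D", "Jim B", "Jim A", "Jim C");
        System.out.println(sortByFirstToken(jims));
        System.out.println(sortByExactValue(jims));
    }
}
```

In this model the four Jim entries all collide on `jim` and keep their input order, while sorting on the exact value orders them by the full string.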

andrewpai commented 8 months ago

Note that in version 1.49.0, we will support username.exact and fullName.exact fields for more precise searching of these properties.