cloudant-labs / clouseau

Expose Lucene features as an erlang-like node
Apache License 2.0
58 stars 32 forks source link

Return None for name field analyzer #51

Closed tonysun83 closed 2 years ago

tonysun83 commented 2 years ago

When a user incorrectly creates an analyzer with an object instead of a string, we don't have a matching clause for it which causes match exception to be thrown. We return None in any case where the analyzer is not a string.

Testing:

Without this fix, Clouseau cannot start as it will always crash with the matching clause

[actor:19] ERROR overlock.threadpool.ErrorLoggedThread - Exception was thrown in thread actor:19
scala.MatchError: Some(List((name,standard))) (of class scala.Some)
    at com.cloudant.clouseau.SupportedAnalyzers$.createAnalyzerInt(SupportedAnalyzers.scala:100)

With this change, I tried creating mango text indexes with invalid search analyzers and they no longer crashed clouseau. Interestingly, we can still query that particular index, despite having an invalid analyzer.

curl -X POST http://adm:pass@localhost:5984/movies/_find -H 'Content-Type: application/json' -d '{"selector": {"generator": "ok"},"use_index": "text_object_started"}'
{"docs":[
],
"bookmark": "g2o"}

I don't think this is the right behavior unless we are defaulting to a specific analyzer. We should be crashing here:

https://github.com/cloudant/clouseau/blob/master/src/main/scala/com/cloudant/clouseau/IndexService.scala#L888-L889

given that the analyzer return is None. But clouseau prints no errors and nothing is printed from Dreyfus either. Keeping invalid indexes that return results is incorrect behavior even though clouseau does not crash. It seems we broadcast that the analyzer is invalid here:

https://github.com/cloudant-labs/clouseau/blob/master/src/main/scala/com/cloudant/clouseau/IndexManagerService.scala#L201-L205

but since it is a cast, dreyfus I'm assuming silently ignores it.

This fix is to prevent Clouseau from crashing. Returning None has been the default for invalid Analyzers since the inception of Clouseau, so I don't think it's within the scope of this PR modify behavior when the analyzer is invalid.

tonysun83 commented 2 years ago

With the new updated change, in the corner case when a user provides an object as a perfield name, we no longer crash Clouseau.

curl -X POST http://adm:pass@localhost:5984/movies/_find -H 'Content-Type: application/json' -d '{"selector": {"V11Movie_name": "ok"},"use_index": "Movie_name-bad11-text"}'
{"error":"text_search_error","reason":"<<\"no_such_analyzer\">>","ref":1967189495}

Instead we return what would be a 500 to the user.

Note also this will be a breaking change for existing users that may have had existing invalid perfield analyzers. Previously, the search would use the default analyzer (see test case change). Now they will get the error above (either via mango text indexes or normal search).