cloudant-labs / clouseau

Expose Lucene features as an erlang-like node
Apache License 2.0

Introduce types to supported analyzers take2 #84

Closed: iilyak closed this 9 months ago

iilyak commented 9 months ago

Refactor createAnalyzer to remove polymorphic arguments and use Map[String, Any]

The polymorphism makes it harder to introduce types because it requires union types, which are not natively supported by the version of Scala we use. Eliminating the polymorphism would allow us to introduce types.
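To illustrate the problem, here is a minimal sketch of what a polymorphic argument looks like without union types. This is not the actual clouseau code: the `PolymorphicArgument` object, the `describe` function and its return strings are hypothetical, and the only thing taken from this description is that the argument may arrive as a String, a Map, or a List.

```scala
object PolymorphicArgument {
  // Without union types, the only common static type for
  // "String or Map or List" is Any, so the compiler cannot reject
  // unsupported inputs; they only show up at runtime.
  def describe(options: Any): String = options match {
    case name: String   => "analyzer name: " + name
    case map: Map[_, _] => "options map with " + map.size + " entries"
    case list: List[_]  => "key/value list with " + list.size + " entries"
    case other          => "unsupported: " + other
  }
}
```

Every caller and every helper that receives such a value has to repeat this kind of match, which is what normalizing to a single `Map[String, Any]` would avoid.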

Introducing types is the vehicle for solving the type erasure problem that we would have to deal with when we upgrade Scala to the next version.
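As a generic illustration of what type erasure means on the JVM (not necessarily the specific failure the Scala upgrade would trigger), the sketch below shows that a match on `List[String]` cannot see the element type. The `ErasureDemo` object and its method names are hypothetical.

```scala
object ErasureDemo {
  // The element type of a List is erased at runtime, so this match
  // succeeds for ANY list. Without the @unchecked annotation the
  // compiler emits an "unchecked" warning for this pattern.
  def looksLikeStrings(value: Any): Boolean = value match {
    case _: List[String @unchecked] => true
    case _                          => false
  }

  // Checking each element recovers the information erasure removed.
  def strings(value: Any): List[String] = value match {
    case list: List[_] => list.collect { case s: String => s }
    case _             => Nil
  }
}
```

Here `ErasureDemo.looksLikeStrings(List(1, 2))` returns `true` even though the list holds no strings, while `ErasureDemo.strings(List("a", 1, "b"))` keeps only the elements that really are strings.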

The refactoring is done in the following steps:

  1. Introduce an AnalyzerOptions class with type-specific constructors
    • def fromMap(map: Map[_, _])
    • def fromAnalyzerName(name: String)
    • def fromKVsList(options: List[_])
  2. Make sure we correctly go from Any to the concrete type. This PR uses the .collect combinator instead of relying on ClassCastException.
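The two steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from this PR: the three constructor signatures come from the list above, while the `options` field, the private `stringKeyed` helper and the `"name"` key are hypothetical.

```scala
// Hypothetical sketch; the real AnalyzerOptions in the PR may differ.
case class AnalyzerOptions(options: Map[String, Any])

object AnalyzerOptions {
  // Keep only (String, value) pairs. Anything else is silently dropped
  // by collect instead of failing later with a ClassCastException.
  private def stringKeyed(entries: Iterable[Any]): Map[String, Any] =
    entries.collect { case (key: String, value) => (key, value) }.toMap

  def fromMap(map: Map[_, _]): AnalyzerOptions =
    AnalyzerOptions(stringKeyed(map))

  // A bare analyzer name becomes a single-entry map.
  // The "name" key is an assumption made for this sketch.
  def fromAnalyzerName(name: String): AnalyzerOptions =
    AnalyzerOptions(Map("name" -> name))

  def fromKVsList(options: List[_]): AnalyzerOptions =
    AnalyzerOptions(stringKeyed(options))
}
```

With this shape, all three input forms end up as the same `Map[String, Any]`, so code downstream of the constructors no longer needs to know which form the caller used.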

Assumptions

  1. The keys of options passed to OpenIndexMsg are strings.
  2. It is ok to just ignore all non-string keys in options.
  3. The analyzer name is either a String or a single String element wrapped in a List.
  4. The stopwords value is a list.
  5. The elements of stopwords are strings.
  6. It is ok to skip non-string elements of stopwords.
  7. The fields value is a list.
  8. The elements of the fields list are tuples: (String, String) or (String, [String]).
  9. The config is a String in the ('analyze, config, text) message in AnalyzerService.handleCall, and it should really be named ('analyze, analyzerName, text).
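For illustration, assumptions 3 to 8 could be encoded along these lines. This is only a sketch: the `OptionParsing` object and its function names are hypothetical, and filtering non-string elements inside the `[String]` part of a fields tuple is an extra assumption that the list above does not state.

```scala
object OptionParsing {
  // Assumption 3: a String, or a single String wrapped in a List.
  def analyzerName(value: Any): Option[String] = value match {
    case name: String       => Some(name)
    case List(name: String) => Some(name)
    case _                  => None
  }

  // Assumptions 4-6: a list whose non-string elements are skipped.
  def stopwords(value: Any): List[String] = value match {
    case list: List[_] => list.collect { case word: String => word }
    case _             => Nil
  }

  // Assumption 8: (String, String) or (String, [String]) tuples.
  // Dropping non-strings from the inner list is an assumption of this sketch.
  private val fieldEntry: PartialFunction[Any, (String, Any)] = {
    case (field: String, analyzer: String) => (field, analyzer)
    case (field: String, analyzers: List[_]) =>
      (field, analyzers.collect { case s: String => s })
  }

  // Assumption 7: fields is a list; elements of other shapes are skipped.
  def fields(value: Any): Map[String, Any] = value match {
    case list: List[_] => list.collect(fieldEntry).toMap
    case _             => Map.empty
  }
}
```

Each function returns an empty result (`None`, `Nil`, or an empty map) when the input does not have the assumed shape, rather than throwing.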

pgj commented 9 months ago

In the description, items 1-2 and 4-5 seem to be the same, except for the formatting.

pgj commented 9 months ago

The mango-test and elixir-search targets work with a recent CouchDB main. However, in the meantime I have fixed the previously misbehaving Mango integration test case that tried to pass the default analyzer as a list of strings.

iilyak commented 9 months ago

> In the description, items 1-2 and 4-5 seem to be the same, except for the formatting.

I removed the duplication in the description of the PR.