cloudant-labs / clouseau

Expose Lucene features as an erlang-like node
Apache License 2.0
58 stars 32 forks source link

Introduce types to supported analyzers #83

Closed iilyak closed 9 months ago

iilyak commented 9 months ago

Refactor createAnalyzer to remove polymorphic arguments and use Map[String, Any]

The polymorphism makes it harder to introduce types. Because it requires union types. Which are not natively supported by the version of Scala we use. Elimination of the polymorphism would allow us to introduce types.

The introduction of types is the vehicle to solve type erasure problem we would have to deal with when we upgrade Scala to the next version.

The refactoring is done using following steps

  1. Add type specific constructors for Analyzer
    • createAnalyzerFromString(name: String)
    • createAnalyzer(options: Map[String, Any])
    • createAnalyzerFromStringInt(name: String)
    • createAnalyzerInt(options: Map[String, Any])
  2. Add type specific constructors for OpenIndexMsg
    • fromMap(peer: Pid, path: String, map: Map[_, _])
    • fromAnalyzerName(peer: Pid, path: String, name: String)
    • fromKVsList(peer: Pid, path: String, options: List[_])
  3. Make sure we correctly go from Any to the concrete type
    • def collectKVs(list: List[_]): List[(String, Any)] =
        list.collect {
            case t @ (_: String, _: Any) => t
        }.asInstanceOf[List[(String, Any)]]
    • options.get("name").map(_.asInstanceOf[String])

Assumptions

  1. The keys of options passed to OpenIndexMsg are strings
  2. It is ok to just ignore all non-string keys in options
  3. The analyzer name is always a String
  4. The stopwords is a list
  5. The elements of a stopwords are strings
  6. It is ok to skip non-strings elements of stopwords
  7. The fields value is a list
  8. The elements of a fields list are tuples (String, String)
  9. It is ok to skip elements which do not match the (String, String) pattern
  10. The config is a String in ('analyze, config, text) message in AnalyzerService.handleCall and it should really be named ('analyze, analyzerName, text).
iilyak commented 9 months ago

I did verify my assumptions via reading code of:

As well as CouchDb and Cloudant documentation.

rnewson commented 9 months ago

The term "refactor" refers only to software edits that do not make functional changes, but just rearrange code to make a future functional change easier. The term has no value otherwise.

Please replace the word "refactor" with "change" or some synonym in the description, as clearly the PR makes many function changes in addition to the stated goal of "introducing types".

iilyak commented 9 months ago

Please replace the word "refactor" with "change" or some synonym in the description, as clearly the PR makes many function changes in addition to the stated goal of "introducing types".

The goal of this PR is to avoid public API changes. If it does change the protocol then it is a bug in the PR. I'll try harder to prevent the public API changes. Which probably would result in a very different version of this PR.

iilyak commented 9 months ago

closing in favor of https://github.com/cloudant-labs/clouseau/pull/84