cerebris / jsonapi-resources

A resource-focused Rails library for developing JSON:API compliant servers.
http://jsonapi-resources.com
MIT License

Full Text Search with ElasticSearch (or Solr) #468

Closed: akharris closed this issue 4 years ago.

akharris commented 9 years ago

Probably related: #460

Are there any plans to implement full-text search via ElasticSearch or Solr? Perhaps through some configuration settings plus overriding Resource methods. I'm currently working on a JSONAPI prototype that uses ElasticSearch for queries and am wondering if a saner path is nigh.

Also, awesome work on this gem. It's fantastic and the source code is very readable :+1:

lgebhardt commented 9 years ago

@AKHarris, glad to hear you are finding the gem useful. I haven't thought much about full-text search, so this might be bad advice. There is an elasticsearch-rails gem (https://github.com/elastic/elasticsearch-rails) that adds Elasticsearch support to ActiveRecord::Base, so you might be able to use it by simply overriding the Resource.apply_filter method to call the search method provided by that gem (as you say, related to #460).

Any gem that is usable through ActiveRecord is likely also usable with jsonapi-resources (JR), since JR uses ActiveRecord at its core for searching and filtering.

akharris commented 9 years ago

@lgebhardt thanks for the response. I'll keep exploring and take a look at elasticsearch-rails. I can try to wrap up what I discover for the docs if you're interested.

lgebhardt commented 9 years ago

@AKHarris I'd definitely like to have this written up in the Wiki if it works. Thanks!

jagthedrummer commented 8 years ago

I've been trying to get elasticsearch-rails to play nicely with JR using the method recommended for applying filters in the README. I have a search filter that looks like this:

  filter :search, apply: ->(records, value, _options) {
    # Call `records` on the elasticsearch-rails return value to turn
    # it into an array of ActiveRecord objects
    records.search(value).records
  }

This kinda works, but not really. Calling records on the object that elasticsearch-rails returns will allow JR to deal with the result, but you always get the first 10 records, and can't page through results. Since no paging info is passed to the search filter there's no way to handle the paging in elasticsearch before passing through to JR. There's a similar problem with sorting.
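Elasticsearch returns 10 hits unless the request sets `size`, which matches the "always the first 10 records" symptom. If the filter could see the paginator, the page could be translated into Elasticsearch's `from`/`size` request parameters. A minimal plain-Ruby sketch: `PagedPage` and `OffsetPage` are hypothetical stand-ins for the number/size and offset/limit values that JR's paged and offset paginators carry, and `es_pagination` is an illustrative helper name, not a JR or elasticsearch-rails API.

```ruby
# Hypothetical stand-ins for the values carried by JR's paginators
PagedPage  = Struct.new(:number, :size)
OffsetPage = Struct.new(:offset, :limit)

# Translate a page object into Elasticsearch "from"/"size" request params.
# With no paginator, return {} so Elasticsearch uses its default (10 hits).
def es_pagination(page)
  case page
  when PagedPage
    { from: (page.number - 1) * page.size, size: page.size }
  when OffsetPage
    { from: page.offset, size: page.limit }
  else
    {}
  end
end

es_pagination(PagedPage.new(3, 25))   # from: 50, size: 25
es_pagination(OffsetPage.new(40, 20)) # from: 40, size: 20
```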

I've also tried overriding apply_filter, as recommended earlier in this thread (even though the README says that this method is deprecated).

  def self.apply_filter(records, filter, value, options)
    case filter
    when :search
      records.search(value).records
    else
      super(records, filter, value, options)
    end
  end

When I do this, I see that apply_filter is called twice for the search filter. On the first call, the options parameter is populated with sort_criteria and paginator data that would let me handle sorting and paging inside Elasticsearch. On the second call, however, those two items are missing, and it seems that the return value from the second call is what JR actually uses. So I'm stuck with unsorted results and can only view page 1.
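On the call that does carry `sort_criteria`, the criteria could be translated into an Elasticsearch `sort` clause. A minimal sketch, assuming each criterion is a hash with `:field` and `:direction` keys (the shape JR 0.x uses); `es_sort` is a hypothetical helper name, and field names are passed through unchanged, although a real app may need to map resource attribute names to index field names.

```ruby
# Translate JR-style sort criteria into an Elasticsearch "sort" clause.
# Assumes each criterion looks like { field: "title", direction: :asc }.
def es_sort(sort_criteria)
  return {} if sort_criteria.nil? || sort_criteria.empty?

  clauses = sort_criteria.map do |criterion|
    { criterion[:field].to_s => { order: criterion[:direction].to_s } }
  end
  { sort: clauses }
end

# Sorts by title ascending, then created_at descending
es_sort([{ field: "title", direction: :asc }, { field: "created_at", direction: :desc }])
```

Returning `{}` when there are no criteria lets the result be splatted into a search body unconditionally, leaving Elasticsearch's default relevance ordering in place.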

Any thoughts on how best to proceed? Do I just need to bypass JR completely and handle serialization myself in the controller action?

NuckChorris commented 8 years ago

I actually wrote an initial (untested and not ported to latest JR yet) integration of the Chewy gem with JR for @hummingbird-me which you might find useful. I'm hoping to upgrade and Gemify it in the next couple months when I find time away from my current client.

See the lib/jsonapi folder, resource mixin, and controller mixin. It exposes the following DSL within resources, which can be accessed via filter[] params in the request:

class AnimeResource < BaseResource
  include SearchableResource

  index MediaIndex::Anime
  query :season, valid: -> (value, _ctx) { Anime::SEASONS.include?(value) }
  query :age_rating, valid: -> (value, _ctx) { Anime::AGE_RATINGS.include?(value) }
  query :text, mode: :query, apply: -> (values, _ctx) {
    { multi_match: {
      fields: %w[titles.* abbreviated_titles synopsis actors characters],
      query: values.join(',')
    } }
  }
end

It's not ideal (it's severely lacking in abstraction or separation of concerns) but it gets the job done for now. Going forward, I'd definitely like to reduce the complexity of hooking into JR and hopefully build some abstractions on top of Chewy so that the mapping of name to query can be handled in the Index class.

I'd love for JR to provide a strategy hookup (or middleware stack, which might be useful to solve ordering problems that occur in non-AR situations) for third-party libraries to handle the application of filters, so that I could just register a strategy and avoid all this nonsense with creating my own SearchOperation and SearchableController.

For example, perhaps something like this:

class MediaResource < BaseResource
  filter :test, strategy: ChewyFilterStrategy, apply: ...
end
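JR has no `strategy:` filter option; the following is a toy, plain-Ruby model of the proposal, not JR code. `MiniResource`, `SubstringStrategy`, and `TitleResource` are hypothetical names, and an in-memory array stands in for a search backend. The design point it illustrates is that a strategy object receives the full options (here `from`/`size`), so it can push paging into the backend rather than having it applied afterwards.

```ruby
# Toy model of a per-filter `strategy:` option. JR has no such option;
# every class here is hypothetical and works on plain arrays.
class MiniResource
  def self.filters
    @filters ||= {}
  end

  # Declare a filter; a strategy object, if given, takes over applying it.
  def self.filter(name, strategy: nil, apply: nil)
    filters[name] = { strategy: strategy, apply: apply }
  end

  # Apply each requested filter in turn, passing the full options through.
  def self.apply_filters(records, params, options = {})
    params.reduce(records) do |recs, (name, value)|
      config = filters.fetch(name)
      if config[:strategy]
        config[:strategy].apply(recs, value, options)
      else
        config[:apply].call(recs, value, options)
      end
    end
  end
end

# Stand-in for a search-backed strategy: it sees the paging options,
# so it can page "inside the backend" instead of after the fact.
class SubstringStrategy
  def apply(records, value, options)
    hits = records.select { |record| record.include?(value) }
    hits.drop(options.fetch(:from, 0)).first(options.fetch(:size, hits.size))
  end
end

class TitleResource < MiniResource
  filter :search, strategy: SubstringStrategy.new
  filter :prefix, apply: ->(records, value, _options) { records.select { |r| r.start_with?(value) } }
end

titles = ["anime one", "anime two", "manga one", "anime three"]
TitleResource.apply_filters(titles, { search: "anime" }, from: 1, size: 1) # ["anime two"]
```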

shamil614 commented 7 years ago

So I was able to tackle this problem with a custom solution. We just launched our first search feature, so the implementation I came up with is rough. If there's interest, I want to refactor and polish it and put it in a gem. Actually, even if there isn't interest, I'll probably put it in a gem anyway, since we have another Ruby app that could use the code.

In short, my approach was to use the official Elasticsearch Ruby gem (https://github.com/elastic/elasticsearch-ruby) and build on top of it with as little abstraction as possible. I'm not sure how much I can publish here, since it's still coupled to the application and I haven't received approval to open-source it.

I had two main challenges with the first version: relationships, and the integration point with JSONAPI Resources.

Relationships are hard because Elasticsearch prefers either denormalization or what it calls nested documents, and nesting directly conflicts with the JSON:API spec. For nesting to work, the included property would need to look like this:

{
  data: {
    id: 1,
    type: "jobs",
    attributes: { ... }
  },
  included: {
    job_assignments: [
      // a job has many assignments
      // list of assignments
    ]
  }
}
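For comparison, the JSON:API spec requires `included` to be a flat array of resource objects, linked from the primary resource through `relationships`. A minimal sketch of turning a nested search hit into that shape; the `job_assignments` key and type name and the helper name `to_jsonapi` are illustrative assumptions, not part of the code described in this thread.

```ruby
# Convert a nested search hit (a job with embedded assignments) into a
# JSON:API-style document: related records move into a flat `included`
# array and are referenced from `relationships` by type and id.
def to_jsonapi(hit)
  assignments = hit.fetch(:job_assignments, [])
  identifiers = assignments.map { |a| { type: "job_assignments", id: a[:id].to_s } }

  {
    data: {
      id: hit[:id].to_s,
      type: "jobs",
      attributes: hit.reject { |key, _| %i[id job_assignments].include?(key) },
      relationships: { job_assignments: { data: identifiers } }
    },
    included: assignments.map { |a|
      { type: "job_assignments", id: a[:id].to_s, attributes: a.reject { |key, _| key == :id } }
    }
  }
end
```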

For the first version I went with denormalization, flattening some related data into a resource that is used only for search.

The other challenge, how to hook into JSONAPI Resources, ended up as a hybrid of Railsy goodness and JSONAPI Resources goodness. The Railsy part comes from wanting to make sure the ES document is indexed only after the database transaction commits.

Below is some redacted lib code. It gives you an idea of the approach I took.

class Job < ActiveRecord::Base
  after_commit :update_search

  def update_search
    DocumentIndexWorker.perform_async(self.class.to_s, id)
  end
end

class DocumentIndexWorker
  include Sidekiq::Worker
  sidekiq_options queue: "priority"

  def perform(record_klass, record_id)
    record = record_klass.constantize.find record_id
    Documents::JobDocument.new(record: record).index
  end
end

class Documents::JobDocument < Search::Document
  # various methods you can override
  def self.jsonapi_resource_klass
    JobResource
  end

  def after_build_document(resource, document)
    # modify the document to include flattened relationship data
    document[:users] = resource._model.users.map(&:full_name)
    document
  end
end

class Search::Document
  def initialize(record:, context: {})
    @context = context
    @record = record
  end

  # Send the document to be indexed.
  # @return [void]
  def index
    client = Elasticsearch::Client.new(url: ENV["ELASTICSEARCH_URL"] || "http://localhost:9200")
    # build_payload is part of the redacted code (not shown);
    # client.index expects a hash of index/id/body arguments
    client.index(build_payload)
  end

  # Default hook: subclasses override this to add flattened data.
  def after_build_document(_resource, document)
    document
  end

  private

  def build_document(resource)
    document = build_serializer.serialize_to_hash(resource).with_indifferent_access
    after_build_document(resource, document)
  end

  def build_resource
    # jsonapi_resource_context is part of the redacted code (not shown)
    opts = @context.any? ? @context : jsonapi_resource_context
    self.class.jsonapi_resource_klass.new(@record, opts)
  end

  def build_serializer
    JSONAPI::ResourceSerializer.new(self.class.jsonapi_resource_klass, @jsonapi_serializer_options || {})
  end
end

ianks commented 6 years ago

For anyone that stumbles into this again, I found a solution.

This happens because the filter gets called twice: once for a count and once for a find. The problem is that the count call does not include the paginator, since it wants a total count without regard to pagination. My solution works like this:

  filter :q, apply: ->(records, values, options) {
    # app-specific helpers (not shown) that turn JR's sort_criteria and
    # paginator into Elasticsearch sort and from/size params
    params = apply_elasticsearch_sort(options)
    pagination = apply_elasticsearch_pagination(options)

    result = records.search(
      query: {
        query_string: {
          query: values.first
        }
      },
      **params,
      **pagination
    )
    # keep track of the total hits from Elasticsearch and memoize this as count
    options[:context][:count] = result.response.dig('hits', 'total')
    result.records
  }

  # override find_count (a class method in JR) to use the memoized count instead
  def self.find_count(filters, options = {})
    options[:context][:count] || super
  end
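The ordering this relies on (find runs first and stores the total, then the count reads it) can be checked with a plain-Ruby simulation; no JR or Elasticsearch is involved. `FakeSearchBackend` and `SearchableThing` are hypothetical stand-ins. Storing the count in the context is only safe if the context is per request and holds a single search. Also note that from Elasticsearch 7 onward `hits.total` is an object with a `value` key rather than a plain integer, so the `dig` above may need `'hits', 'total', 'value'` on newer clusters.

```ruby
# Toy backend: records how many queries were issued.
class FakeSearchBackend
  attr_reader :queries

  def initialize(docs)
    @docs = docs
    @queries = 0
  end

  # Returns the total hit count plus one page of hits.
  def search(term, from:, size:)
    @queries += 1
    hits = @docs.select { |doc| doc.include?(term) }
    { total: hits.size, hits: hits.drop(from).first(size) }
  end
end

# Hypothetical resource-like class mirroring the find-then-count pattern.
class SearchableThing
  class << self
    attr_accessor :backend

    def find(term, context:, from: 0, size: 10)
      result = backend.search(term, from: from, size: size)
      context[:count] = result[:total] # memoize the total for find_count
      result[:hits]
    end

    def find_count(term, context:)
      # Reuse the memoized total; only query again if find never ran.
      context[:count] || backend.search(term, from: 0, size: 0)[:total]
    end
  end
end

SearchableThing.backend = FakeSearchBackend.new(["red fox", "red hen", "blue jay", "red ant"])
context = {}
SearchableThing.find("red", context: context, from: 0, size: 2) # first 2 of 3 matches
SearchableThing.find_count("red", context: context)            # 3, without a second query
```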