Closed akharris closed 4 years ago
@AKHarris, glad to hear you are finding the gem useful. I haven't thought much about full text search, so this might be bad advice. It looks like there is an [elasticsearch-rails](https://github.com/elastic/elasticsearch-rails) gem that adds Elasticsearch support to ActiveRecord::Base. So you might be able to use it by simply overriding the Resource.apply_filter method to use the search method provided by that gem (as you say, related to #460).
It's likely that any gem that is usable through ActiveRecord will be usable in conjunction with jsonapi-resources (JR), since JR uses ActiveRecord at its core for searching/filtering.
@lgebhardt thanks for the response. I'll keep exploring and take a look at elasticsearch-rails. I can try to wrap up what I discover for the docs if you're interested.
@AKHarris I'd definitely like to have this written up in the Wiki if it works. Thanks!
I've been trying to get elasticsearch-rails to play nicely with JR using the method recommended for applying filters in the README. I have a search filter that looks like this:
filter :search, apply: ->(records, value, _options) {
  # Call `records` on the elasticsearch-rails return value to turn
  # it into an array of ActiveRecord objects
  records.search(value).records
}
This kinda works, but not really. Calling records on the object that elasticsearch-rails returns allows JR to deal with the result, but you always get the first 10 records and can't page through results. Since no paging info is passed to the search filter, there's no way to handle the paging in Elasticsearch before passing through to JR. There's a similar problem with sorting.
I've also tried overriding apply_filter as recommended in this thread (even though the README says that this method is deprecated).
def self.apply_filter(records, filter, value, options)
  case filter
  when :search
    return records.search(value).records
  else
    return super(records, filter, value, options)
  end
end
When I do this, I see that apply_filter is called twice for the search filter. On the first pass the options param is populated with sort_criteria and paginator data that would allow me to handle sorting and paging inside Elasticsearch. However, on the second call those two items are missing, and it seems that the return value from the second call to apply_filter is what JR uses. So I'm stuck with things being unsorted and only able to view page 1.
Any thoughts on how best to proceed? Do I just need to bypass JR completely and handle serialization myself in the controller action?
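The two calls described above can be told apart by inspecting the options hash. Here is a minimal sketch, assuming only what was observed in this comment (one call carries :paginator and :sort_criteria, the other does not); it has not been checked against JR's source, and search_pass is a hypothetical helper name.

```ruby
# Hypothetical helper: classify which of the two filter invocations we are in,
# based only on the keys observed in `options` in the comment above.
def search_pass(options)
  if options[:paginator] || options[:sort_criteria]
    :find   # paging/sorting data present, so results can be paged in the search backend
  else
    :count  # no paging data; presumably a query that ignores pagination
  end
end

puts search_pass(paginator: Object.new, sort_criteria: [])
puts search_pass(context: {})
```

Branching like this would at least make it possible to send paging and sorting to Elasticsearch only on the call that has them.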
I actually wrote an initial (untested and not ported to latest JR yet) integration of the Chewy gem with JR for @hummingbird-me which you might find useful. I'm hoping to upgrade and Gemify it in the next couple months when I find time away from my current client.
See the lib/jsonapi folder, resource mixin, and controller mixin. It exposes the following DSL within resources, which can be accessed via filter[] params in the request:
class AnimeResource < BaseResource
  include SearchableResource

  index MediaIndex::Anime

  query :season, valid: -> (value, _ctx) { Anime::SEASONS.include?(value) }
  query :age_rating, valid: -> (value, _ctx) { Anime::AGE_RATINGS.include?(value) }
  query :text, mode: :query, apply: -> (values, _ctx) {
    { multi_match: {
      fields: %w[titles.* abbreviated_titles synopsis actors characters],
      query: values.join(',')
    } }
  }
end
It's not ideal (it's severely lacking in abstraction or separation of concerns) but it gets the job done for now. Going forward, I'd definitely like to reduce the complexity of hooking into JR and hopefully build some abstractions on top of Chewy so that the mapping of name to query can be handled in the Index class.
I'd love for JR to provide a strategy hookup (or middleware stack, which might be useful to solve ordering problems that occur in non-AR situations) for third-party libraries to handle the application of filters, so that I could just register a strategy and avoid all this nonsense with creating my own SearchOperation and SearchableController.
For example, perhaps something like this:
class MediaResource < BaseResource
  filter :test, strategy: ChewyFilterStrategy, apply: ...
end
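To make the proposed strategy hookup more concrete, here is a plain-Ruby sketch of the idea. Nothing here is JR API: FilterRegistry, InMemoryFilterStrategy, and the apply(records, value, options) interface are all hypothetical names, and the "records" are plain hashes standing in for whatever a real strategy (Chewy, Elasticsearch, ActiveRecord) would operate on.

```ruby
# Hypothetical strategy interface: each strategy responds to
# `apply(records, value, options)` and returns the filtered collection.
class InMemoryFilterStrategy
  def initialize(field)
    @field = field
  end

  def apply(records, value, _options)
    records.select { |record| record[@field] == value }
  end
end

# Hypothetical registry standing in for the resource-level `filter` DSL.
class FilterRegistry
  def initialize
    @strategies = {}
  end

  def register(name, strategy)
    @strategies[name] = strategy
  end

  # Run every requested filter through its registered strategy, in order.
  def apply_all(records, filters, options = {})
    filters.reduce(records) do |scope, (name, value)|
      strategy = @strategies.fetch(name) { raise ArgumentError, "unknown filter: #{name}" }
      strategy.apply(scope, value, options)
    end
  end
end

registry = FilterRegistry.new
registry.register(:season, InMemoryFilterStrategy.new(:season))

anime = [{ title: "A", season: "winter" }, { title: "B", season: "summer" }]
p registry.apply_all(anime, { season: "winter" })
```

With a hook like this, a third-party library would only need to supply an object with an apply method, instead of replacing the operation and controller classes.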
So I was able to tackle this problem via a custom solution. We just launched the first search feature so the implementation I came up with is rough. But if there's interest, I want to refactor, polish and slap it in a gem. Well actually even if there's not interest, I probably will put it in a gem as we have another ruby app that could use the code.
In short, my approach was to use the official Elasticsearch Ruby gem, https://github.com/elastic/elasticsearch-ruby, and then build on top of it with as little abstraction as possible. I'm not sure what I can publish here, as it's still coupled to the application and I haven't received approval to open source it.
I had two main challenges with the first version: relationships, and the integration point with JSONAPI Resources.
Relationships are hard because Elastic prefers either denormalization or what it calls nested, and nested directly conflicts with the JSONAPI spec. In order for nesting to work, the included property would need to look like this:
{
  data: {
    id: 1,
    type: "jobs",
    attributes: { ... }
  },
  included: {
    job_assignments: [
      // job has many assignments
      // list of assignments
    ]
  }
}
For the first version I went with denormalizing where I flattened some data into a resource that is only used for search.
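As an illustration of that kind of denormalization, here is a small sketch using plain hashes. The record shape, the field names, and flatten_for_search are all made up for the example; the real application data is not shown in this thread.

```ruby
# Hypothetical job record with nested assignments, as plain hashes.
job = {
  id: 1,
  title: "Roof repair",
  job_assignments: [
    { user: { full_name: "Ada Lovelace" } },
    { user: { full_name: "Grace Hopper" } }
  ]
}

# Flatten the related users into a top-level array of names so the
# search document needs no nested mapping.
def flatten_for_search(job)
  {
    id: job[:id],
    title: job[:title],
    users: job[:job_assignments].map { |assignment| assignment.dig(:user, :full_name) }
  }
end

p flatten_for_search(job)
```

The flattened document is only useful for searching; the API response would still be built from the normal resource.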
The other challenge, hooking into JSONAPI Resources, ended up as a hybrid of Railsy goodness and JSONAPI Resources goodness. The Railsy part is there because I wanted to make sure the ES document was indexed only after the DB committed.
Below is some redacted lib code. It gives you an idea of the approach I took.
class Job < ActiveRecord::Base
  after_commit :update_search

  def update_search
    DocumentIndexWorker.perform_async(self.class.to_s, id)
  end
end

class DocumentIndexWorker
  include Sidekiq::Worker
  sidekiq_options queue: "priority"

  def perform(record_klass, record_id)
    record = record_klass.constantize.find record_id
    Documents::JobDocument.new(record: record).index
  end
end

class Documents::JobDocument < Search::Document
  # various methods you can override
  def self.jsonapi_resource_klass
    JobResource
  end

  def after_build_document(resource, document)
    # modify the document to include flattened relationship data
    document[:users] = resource._model.users.map(&:full_name)
    document
  end
end
class Search::Document
  def initialize(record:, context: {})
    @context = context
    @record = record
  end

  # Send the document to be indexed.
  # @return [void]
  def index
    client = Elasticsearch::Client.new(url: ENV["ELASTICSEARCH_URL"] || "http://localhost:9200")
    client.index(build_payload)
  end

  private

  # Readers for the instance variables used by the builders below.
  attr_reader :context, :record

  # NOTE: build_payload and jsonapi_resource_context are redacted and not shown here.

  def build_document(resource)
    document = build_serializer.serialize_to_hash(resource).with_indifferent_access
    after_build_document(resource, document)
  end

  def build_resource
    opts = context.any? ? context : jsonapi_resource_context
    # jsonapi_resource_klass is defined at the class level by subclasses
    self.class.jsonapi_resource_klass.new(record, opts)
  end

  def build_serializer
    JSONAPI::ResourceSerializer.new(self.class.jsonapi_resource_klass, @jsonapi_serializer_options || {})
  end
end
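The redacted build_payload is not shown, so the following is only a guess at its general shape: the elasticsearch-ruby client's index call accepts a hash with index, id, and body keys (older Elasticsearch versions also expect a type). The function signature and the sample values are assumptions made for illustration.

```ruby
# Hypothetical stand-in for the redacted `build_payload`: returns the
# hash that would be handed to `client.index`.
def build_payload(index_name:, record_id:, document:)
  {
    index: index_name,  # target Elasticsearch index
    id: record_id,      # reuse the database id so re-indexing overwrites the old document
    body: document      # the serialized (and flattened) resource
  }
end

payload = build_payload(
  index_name: "jobs",
  record_id: 42,
  document: { title: "Roof repair", users: ["Ada Lovelace"] }
)
p payload
```

Reusing the database id as the document id means the after_commit hook can re-index the same record repeatedly without creating duplicates.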
For anyone that stumbles into this again, I found a solution.
The reason this happens is that the filter gets called twice: once for a count and once for a find. The problem is that the count does not include the paginator, since it wants a total count with no regard to pagination. My solution works like this:
filter :q, apply: ->(records, values, options) {
  params = apply_elastisearch_sort(options)
  pagination = apply_elastisearch_pagination(options)
  result = records.search(
    query_string: {
      query: values.first
    },
    **params,
    **pagination
  )
  # keep track of the total hits from elasticsearch and memoize this as count
  options[:context][:count] = result.response.dig('hits', 'total')
  result.records
}

# override find_count to use the memoized count instead
def find_count(_filter, options)
  options[:context][:count] || super
end
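The apply_elastisearch_sort and apply_elastisearch_pagination helpers are not shown in this comment. Here is a minimal sketch of what they might look like. It assumes sort criteria arrive as an array of { field:, direction: } hashes and that the paginator is page-based, exposing a 1-based number and a size; the Paginator struct is a stand-in, not a JR class.

```ruby
# Stand-in for a page-based paginator; assumed to expose `number` and `size`.
Paginator = Struct.new(:number, :size)

# Hypothetical version of `apply_elastisearch_sort`: turn sort criteria such as
# [{ field: "title", direction: :asc }] into an Elasticsearch `sort` clause.
def apply_elastisearch_sort(options)
  criteria = options[:sort_criteria] || []
  return {} if criteria.empty?

  { sort: criteria.map { |c| { c[:field].to_s => { order: c[:direction].to_s } } } }
end

# Hypothetical version of `apply_elastisearch_pagination`: turn a 1-based page
# number and page size into Elasticsearch's `from`/`size` parameters.
def apply_elastisearch_pagination(options)
  paginator = options[:paginator]
  return {} unless paginator

  { from: (paginator.number - 1) * paginator.size, size: paginator.size }
end

options = {
  sort_criteria: [{ field: "title", direction: :desc }],
  paginator: Paginator.new(3, 20)
}
p apply_elastisearch_sort(options)
p apply_elastisearch_pagination(options)
```

Both helpers return an empty hash when the data is missing, so the call without a paginator simply falls back to the search defaults.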
Probably related to #460.
Are there any plans to implement full-text search via ElasticSearch or Solr? Perhaps through some configuration settings + Resource method overwriting. I'm currently working on a JSONAPI prototype that uses ElasticSearch for queries and am wondering if a saner path is nigh.
Also, awesome work on this gem. It's fantastic and the source code is very readable :+1: