hannesfostie opened this issue 7 years ago
Hi folks,
The Algolia docs mention that you should try to split big records into multiple objects to be indexed. The Rails app that I work on has a couple of those, and I tried to find a way to do this with the algoliasearch-rails gem but it appears that is not currently possible.
Hi @hannesfostie,
that's correct. So far it's not something doable with this rails integration.
If I were to do this using the ruby library for Algolia, it would mean basically reinventing the wheel, and reimplementing much of the callbacks and what not that this gem conveniently adds for you.
Right :/
That made me think of an alternative, one that I think would be a good feature for this gem, even if it's an undocumented one. I created this issue to bounce my idea off of you, validate if it would work, ask for pointers, and finally ask if it would be accepted as a feature if it lives up to your standards.
Oh yes sure; that would be awesome :)
What I had in mind is basically extracting the code that transforms object attributes into JSON into a new method, let's call it #to_algolia_json (or hash).
I would maybe go for to_algolia_object?
The idea here is that if we're dealing with a single hash, we could index it like this gem already does. If it's an array, we could create multiple records in Algolia instead of a single one that is too big. Dealing with a return value that is an array might require a change in another place, possibly the Algolia ruby library.
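A minimal Ruby sketch of that idea, with a hypothetical Article model and a to_algolia_object method that returns an array of pieces (the class, attribute names, and 500-character chunk size are all illustrative, not part of the gem):

```ruby
# Hypothetical model: instead of returning one oversized record, return an
# array of smaller records, one per chunk of the big attribute.
class Article
  attr_reader :id, :title, :body

  def initialize(id:, title:, body:)
    @id = id
    @title = title
    @body = body
  end

  # Each piece gets its own objectID but shares parent_id, so the pieces
  # can later be grouped with Algolia's distinct feature.
  def to_algolia_object(chunk_size = 500)
    body.scan(/.{1,#{chunk_size}}/m).each_with_index.map do |chunk, i|
      { objectID: "#{id}_#{i}", parent_id: id, title: title, body: chunk }
    end
  end
end

article = Article.new(id: 42, title: 'Hello', body: 'x' * 1200)
records = article.to_algolia_object
records.map { |r| r[:objectID] }  # => ["42_0", "42_1", "42_2"]
```

The indexing layer would then check whether the return value is a hash or an array and push one or several records accordingly.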
I get that, and I see one potential issue we'll need to deal with: updates, and deletes as well. For example:

- X is created and split into 3 pieces, so we push X_0, X_1 and X_2 to Algolia.
- X is updated and is now split into 2 pieces; you need to override X_0 and X_1, and delete X_2 (doable, but you probably need to "diff" what changed since last time).
- X is removed; you need to remove all of its pieces (doable with delete_by_query).

Does that make sense?
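The update scenario above can be sketched as a plain diff between the pieces currently in the index and the freshly generated ones (plan_sync and the hash shape are hypothetical names, for illustration only):

```ruby
# old_ids: objectIDs currently in the Algolia index for this source object.
# new_records: the freshly generated pieces after an update.
def plan_sync(old_ids, new_records)
  new_ids = new_records.map { |r| r[:objectID] }
  {
    save:   new_records,        # overwrite the pieces that still exist
    delete: old_ids - new_ids   # pieces that disappeared, e.g. X_2
  }
end

plan = plan_sync(%w[X_0 X_1 X_2], [{ objectID: 'X_0' }, { objectID: 'X_1' }])
plan[:delete]  # => ["X_2"]
```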
The original code was written a looooong time ago and has been patched here and there since then, which makes the settings/options/replicas handling a little bit messy.
If you don't manage to make it work, let me know; happy to help!
@redox the "problem" you mention is one I had thought of as well (though in a different scenario), you make a very good point. What I was trying to figure out the other day is how Algolia is meant to keep track of "Algolia Objects" for each "ActiveRecord Object". Do the Algolia Objects all share the AR ID of the AR Object? The docs mention "distinct" queries, I suppose they'd use this ID?
If that is the case then it should be possible to delete them all and just regenerate them, so that none are left behind. The one thing we'd need to figure out is if this ID could ever change, so that no orphaned objects are left.
I've been going through the code a little yesterday and then this morning, and now feel kind of stuck. I don't feel confident making any changes to start adding this feature because a lot of the methods have different variations, instance vs class methods, and use a bunch of instance variables whose (possible) values are not entirely clear to me.
I was trying to refactor the code to a point where an AR class/model has an instance method to_algolia_object that returns the hash to be indexed, but didn't get very far. Is there any chance I could get some pointers, or for someone to give this a shot so that I can try to take it from there?
Thanks!
I'm gonna take a look at it, probably next week because of a packed WE /o\
Appreciate it @redox !
Have you been able to take a look at this by any chance, @redox ?
Sorry @hannesfostie; I didn't... I'll work on it next week!
I took a deeper look at the code @hannesfostie and we might have one issue with the deletion process.
For now, the Rails gem is in charge of deleting the objects once they are removed from the source DB. As soon as you start splitting the objects into multiple objects, any update of the source object could trigger deletions.
For instance, let's assume you have an object with a big text attribute that is ultimately split into 3 smaller objects. If you update this object and it's now only split into 2 objects (because the attribute is now shorter), you should remove 1 object from the index and update/override the 2 others.
Unfortunately, removing those objects is not that straightforward... I'm afraid the current architecture of the Rails gem is not well suited to such a use case, and I strongly think you should build something custom on top of the algoliasearch gem: because you'll be able to write it for your own needs, I believe it will be way easier (and less messy for the gem).
I can help you guys write a Concern doing that if you think this could be helpful. Let me know what you think @hannesfostie.
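A minimal sketch (plain Ruby, no Rails required) of what such a Concern might do. In a real app you would include it in an ActiveRecord model, call sync_algolia_pieces! from an after_save callback and delete the pieces from after_destroy. save_objects and delete_objects are real batch methods of the algoliasearch gem's index object; everything else here (module name, method names, FakeIndex) is hypothetical:

```ruby
module AlgoliaSplitIndexing
  # The including model must define #to_algolia_object (an array of hashes
  # with :objectID) and #algolia_index (an Algolia index client).
  def sync_algolia_pieces!(previous_ids)
    records = to_algolia_object
    algolia_index.save_objects(records)                # overwrite current pieces
    stale = previous_ids - records.map { |r| r[:objectID] }
    algolia_index.delete_objects(stale) if stale.any?  # drop leftover pieces
    records.map { |r| r[:objectID] }
  end
end

# A toy stand-in for the Algolia index so the sketch runs without credentials.
class FakeIndex
  attr_reader :saved, :deleted
  def initialize
    @saved = []
    @deleted = []
  end
  def save_objects(objs); @saved.concat(objs); end
  def delete_objects(ids); @deleted.concat(ids); end
end

class Doc
  include AlgoliaSplitIndexing
  attr_reader :algolia_index
  def initialize(index)
    @algolia_index = index
  end
  def to_algolia_object
    [{ objectID: 'X_0' }, { objectID: 'X_1' }]  # now only 2 pieces
  end
end

index = FakeIndex.new
Doc.new(index).sync_algolia_pieces!(%w[X_0 X_1 X_2])
index.deleted  # => ["X_2"]
```

The catch, as discussed above, is knowing previous_ids: you would have to store them on the model or fetch them from the index before syncing.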
That sounds good @redox - do you mind if I email you in the next couple days on the address in your profile?
I was afraid modifying the gem would be tricky, so this solution works for us. I do think that in the long term, refactoring the Rails gem so it supports this and is a little bit more modular would be a huge improvement, both for its users and for you and your colleagues, since it would make changes easier and allow for more customization.
I'm interested in this as well, but for a different use case: I have events with a start_date and an end_date. I'd like to be able to input a single date, and get all events that include this date.
I think the best way of doing that is to create 1 record per day for each event (i.e. a 5-day event will be stored as 5 records in the index). Then I'll use the distinct feature to de-duplicate the hits.
So I'd need a way to create several records for each event...
What do you think?
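That per-day splitting could look roughly like this (event_to_records and the attribute names are hypothetical; in practice you would also set attributeForDistinct to event_id and enable distinct on the index so a multi-day event only shows up once per search):

```ruby
require 'date'

# One record per day of the event; event_id is shared across the pieces so
# the hits can be de-duplicated with Algolia's distinct feature.
def event_to_records(id:, name:, start_date:, end_date:)
  (start_date..end_date).map do |day|
    { objectID: "#{id}_#{day}", event_id: id, name: name, day: day.to_s }
  end
end

records = event_to_records(id: 7, name: 'RubyConf',
                           start_date: Date.new(2017, 11, 15),
                           end_date:   Date.new(2017, 11, 19))
records.size  # => 5 (one record per day of the 5-day event)
```

A query filtered on day:2017-11-17 would then match every event spanning that date.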
Hi @redox, any idea?
Well, I am suffering a lot from this max 10kb/20kb limit as well. Can you please take a look at this problem and Rails gem support ASAP? I know for sure that I won't be able to custom-develop such a feature for my product, as I am the only developer, and this is now a must for me to be able to use Algolia.
Or you could just drop the 10kb-20kb rule like it used to be, and this all sorts itself out easily...
Any updates on this?
You can use ActiveSupport::JSON.encode(your_record).size in your function handling indexing to evaluate how big the record is.
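For instance, with the stdlib JSON module standing in for ActiveSupport::JSON (both measure roughly the payload Algolia receives; record_too_big? is a hypothetical helper, and the 10,000-byte limit mirrors the 10kb figure mentioned above):

```ruby
require 'json'

# Returns true when the serialized record would exceed the size limit,
# i.e. when it should be split into several smaller records.
def record_too_big?(record, limit_bytes: 10_000)
  JSON.generate(record).bytesize > limit_bytes
end

small = { objectID: '1', body: 'short' }
big   = { objectID: '2', body: 'x' * 20_000 }
record_too_big?(small)  # => false
record_too_big?(big)    # => true
```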
Hi folks,
The Algolia docs mention that you should try to split big records into multiple objects to be indexed. The Rails app that I work on has a couple of those, and I tried to find a way to do this with the algoliasearch-rails gem, but it appears that is not currently possible.
If I were to do this using the Ruby library for Algolia, it would mean basically reinventing the wheel, and reimplementing much of the callbacks and whatnot that this gem conveniently adds for you.
That made me think of an alternative, one that I think would be a good feature for this gem, even if it's an undocumented one. I created this issue to bounce my idea off of you, validate if it would work, ask for pointers, and finally ask if it would be accepted as a feature if it lives up to your standards.
What I had in mind is basically extracting the code that transforms object attributes into JSON into a new method, let's call it #to_algolia_json (or hash). Because this is now a method with a single responsibility, it would allow me to overwrite this method in our models and return a hash or an array (or something else that responds to #to_json).
The idea here is that if we're dealing with a single hash, we could index it like this gem already does. If it's an array, we could create multiple records in Algolia instead of a single one that is too big. Dealing with a return value that is an array might require a change in another place, possibly the Algolia Ruby library.
Does this make sense?
Thank you!