Closed · avigoldman closed this 6 years ago
cool, thanks! I think this should work, but the main reason I wanted atomic operations is to avoid doing a complete reindex on every build. You're right that this will be more correct (i.e. no downtime & no deleted data in the index), but it will still do as many operations as objects in the index.
It would be cool to save the hashes of all the objects in a second index, compare those to the hashes of the objects about to be pushed, and then delete or update only the objects whose hashes changed.
Does that make sense?
Since this is such a simple file, you could definitely use this solution in your own app for now; I was just wondering if you'd be interested in exploring it further.
Ah, yes. I think I follow now.
So just to outline the steps:

1. Fetch the stored hashes from a secondary "hashes" index.
2. Calculate a hash for each object that should be pushed.
3. Diff the two sets and push, update, or delete only what changed.

Sound right?
Note that we already have the hashes, since every node in Gatsby carries a content digest (`internal.contentDigest`). Storing these in a second index would indeed be my preferred approach.
So step 1 would be: fetch the hashes (each objectID needs an associated hash) from the hashes index. Then calculate the hash table for the objects to push. Finally, diff the hashes and push or delete accordingly. This last step can probably happen directly in the prod index, since Algolia treats batch operations as atomic (they are applied in the order they arrive in the index).
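A minimal sketch of that flow, assuming the algoliasearch v4 JavaScript client; the index names, the shape of the records in the hashes index (`{ objectID, hash }`), and the MD5-over-JSON hashing are illustrative assumptions, not the plugin's actual implementation:

```js
const algoliasearch = require('algoliasearch');
const crypto = require('crypto');

const client = algoliasearch('YourAppID', 'YourAdminAPIKey');
const prodIndex = client.initIndex('posts');
const hashesIndex = client.initIndex('posts_hashes');

// In Gatsby you could reuse each node's content digest instead of hashing here.
const hashObject = (obj) =>
  crypto.createHash('md5').update(JSON.stringify(obj)).digest('hex');

async function syncObjects(objects) {
  // 1. Fetch the stored hashes (objectID -> hash) from the hashes index.
  const stored = {};
  await hashesIndex.browseObjects({
    batch: (hits) => hits.forEach((hit) => (stored[hit.objectID] = hit.hash)),
  });

  // 2. Diff: anything whose hash changed (or is new) gets pushed;
  //    anything stored that no longer exists gets deleted.
  const toPush = objects.filter(
    (obj) => stored[obj.objectID] !== hashObject(obj)
  );
  const currentIds = new Set(objects.map((obj) => obj.objectID));
  const toDelete = Object.keys(stored).filter((id) => !currentIds.has(id));

  // 3. Apply only the diff. Batched operations are applied in the order
  //    they arrive, so this can run directly against the prod index.
  if (toPush.length > 0) {
    await prodIndex.saveObjects(toPush);
    await hashesIndex.saveObjects(
      toPush.map((obj) => ({ objectID: obj.objectID, hash: hashObject(obj) }))
    );
  }
  if (toDelete.length > 0) {
    await prodIndex.deleteObjects(toDelete);
    await hashesIndex.deleteObjects(toDelete);
  }
}
```

With this, an unchanged build costs only the read of the hashes index instead of one write per object.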
This should be a good way of handling it.
Thanks again for picking this up (and sorry for not being able to clone and contribute right now; I'm only on my phone over the weekend).
Weighing in here in hopes of getting some additional attention on this issue. I'm scoping out the stack for a client project right now and, on account of this issue, I'll be using Lunr.js instead of Algolia. I believe this is the third separate project where I've had to make that call. Perhaps it doesn't seem like a big deal, but given the way development works in an organization (many people running local instances, lots of restarts to pick up new data, test builds, etc.) and the way Algolia prices per indexing operation, this ends up extremely expensive.
For example: with 500 blog posts on a website under active development where Gatsby gets booted 20 times per day on average (fairly conservative), you end up with 300,000 records in Algolia within 30 days (500 records × 20 builds/day × 30 days), costing over $300 a month on the Essential plan. And that number keeps growing every time Gatsby reboots.
By comparison, I can build this into a Lunr.js index, ship it to the client compressed at about 200 KB, and get reasonable search for free. I'd rather use Algolia for the additional features, but again, the cost, which traces directly back to this specific issue.
Hopefully Algolia can dedicate some resources to this issue, or otherwise make it possible to use this library, by the time my next client project involving search begins.
Hey @coreyward, I'm aware that this is definitely something to work on, but since I'm working on lots of other things at the moment, I haven't yet had time to fit this in.
Note that this PR was already tested by @avigoldman and he said it worked, but I was looking for a solution that does even fewer operations. The plan I had in mind is the hash-diff approach outlined above.
@coreyward, are you using unique objectIDs or not?
Thanks @avigoldman and sorry for the delay here.
@coreyward you probably just need to make sure you use unique objectIDs. Whether you update 1,000 times per day or once per day doesn't change how many records you have, as long as the objectIDs are stable. However, this will still cause as many operations as there are items on every build; that's a separate issue we can fix another time.
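For illustration, a sketch of what stable objectIDs look like in a gatsby-plugin-algolia style configuration; the GraphQL query, index name, and environment variable names are placeholders:

```js
// gatsby-config.js (sketch, assuming a gatsby-plugin-algolia style setup)
module.exports = {
  plugins: [
    {
      resolve: `gatsby-plugin-algolia`,
      options: {
        appId: process.env.ALGOLIA_APP_ID,
        apiKey: process.env.ALGOLIA_ADMIN_KEY,
        queries: [
          {
            query: `
              {
                allMarkdownRemark {
                  edges {
                    node {
                      id
                      frontmatter {
                        title
                      }
                    }
                  }
                }
              }
            `,
            transformer: ({ data }) =>
              data.allMarkdownRemark.edges.map(({ node }) => ({
                // A stable objectID means saveObjects overwrites the same
                // record on every build instead of adding a new one.
                objectID: node.id,
                title: node.frontmatter.title,
              })),
            indexName: 'posts',
          },
        ],
      },
    },
  ],
};
```

With a stable objectID, each rebuild overwrites the same 500 records rather than adding 500 new ones.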
Fixes #1