Haroenv closed this issue 4 years ago.
Good call. I would suggest using `algolia-indexing` instead of `atomic-algolia` (it is used in production on TalkSearch and has a better test suite).
`algolia-indexing` currently only implements what I call "full atomic" indexing. This makes sure to use as few operations as possible (by only applying a diff of the changes), but to do so atomically, it requires a plan that can hold twice the number of records actually used.
I planned to implement another mode, called "live diff", which will be similar. The only difference is that it won't be atomic (the diff is applied live to the production index): it still uses as few operations as possible but doesn't need a larger plan.
Both modes have their merits; it's all a question of trade-offs. Considering that your current implementation already requires a plan that can hold twice the number of records, I think going with full atomic can only be an improvement, and implementing the live diff can wait.
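To make the difference between the two modes concrete, here is a minimal sketch. It assumes each index is modeled as an in-memory `Map` from `objectID` to record and that records are compared by their JSON serialization; `computeDiff`, `applyFullAtomic` and `applyLiveDiff` are hypothetical names for illustration, not the `algolia-indexing` API. The full-atomic variant applies the diff to a complete temporary copy and then swaps it in (so two full copies exist at once), while the live variant mutates production directly.

```javascript
// Sketch only: indices are modeled as Map<objectID, record>, not a real Algolia client.

// Compare what is currently in an index with the records about to be pushed.
function computeDiff(current, nextRecords) {
  const nextById = new Map(nextRecords.map((r) => [r.objectID, r]));
  const toSave = []; // new or changed records
  const toDelete = []; // objectIDs no longer produced by the source
  for (const [id, record] of nextById) {
    const existing = current.get(id);
    // JSON.stringify is key-order sensitive; good enough for a sketch.
    if (!existing || JSON.stringify(existing) !== JSON.stringify(record)) {
      toSave.push(record);
    }
  }
  for (const id of current.keys()) {
    if (!nextById.has(id)) toDelete.push(id);
  }
  return { toSave, toDelete };
}

function applyDiff(index, { toSave, toDelete }) {
  for (const record of toSave) index.set(record.objectID, record);
  for (const id of toDelete) index.delete(id);
}

// "Full atomic": copy production, apply the diff to the copy, then swap it in.
// Both copies exist at the same time, hence the 2x record capacity.
function applyFullAtomic(indices, name, nextRecords) {
  const production = indices.get(name) ?? new Map();
  const diff = computeDiff(production, nextRecords);
  const tmp = new Map(production);
  applyDiff(tmp, diff);
  indices.set(name, tmp); // readers switch from the old copy to the new one at once
  return diff;
}

// "Live diff": apply the same diff straight to production. No extra copy,
// but searches may briefly see a partially updated index.
function applyLiveDiff(indices, name, nextRecords) {
  if (!indices.has(name)) indices.set(name, new Map());
  const production = indices.get(name);
  const diff = computeDiff(production, nextRecords);
  applyDiff(production, diff);
  return diff;
}
```

In both modes only `toSave` and `toDelete` are written, which is where the operation savings come from; the modes differ only in where the writes land.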
What I did now is already fairly close to "full atomic", I think, but it takes the generated objects as the source of truth (create a temp index and switch), so switching isn't really worth the effort. A live diff would be nice, though (since everything in GraphQL always has hashes, this could be leveraged). Is this something you have the bandwidth to collaborate on, @pixelastic?
What you have now still consumes a lot of operations (as you need to re-push all the records to a tmp index on each push). Switching to algolia-indexing would drastically reduce this usage.
I tried to make the package as easy as possible to use (there is one method to call with credentials, settings and records; everything else is automated) to reduce the effort needed to switch, but I'd be interested in knowing how I could make it even easier.
Or maybe we're talking about the same thing with different names. Maybe what you call live diff is what I call full atomic :)
Any update on this? I am using way too many operations between builds, and 99% of the indexed data is still the same. Any info on how I could manually implement the "algolia-indexing" approach you are talking about? Links or docs would be helpful :) Thanks for the plugin, though.
There was no update because nobody commented here in months, so I worked on other things. Are you interested in contributing here? I can give some pointers where to start.
Yes, sure. I would really like this to work, so I can help out :)
I searched for algolia-indexing and mainly ended up in the Algolia docs, but I wouldn't know how or where to start making changes in this plugin to accomplish what you mentioned. I am running my builds on Netlify, so the only thing I can use to keep track of indexed objects is the Netlify cache.
@Haroenv That approach will still try to index every object on every environment/machine this website is built on. For common deployment targets like Netlify, which periodically clear the build cache anyway, you're going to be making excessive calls routinely. Algolia ought to offer a way of making this easier.
@u12206050 For what it's worth, I just went with another approach that updates Algolia via an external process instead of using this plugin. In hindsight, trying to couple indexing with build didn't make sense for a structured object search like I have anyways; if you're in a similar situation, that may be much less work.
OK, so from the sound of it, I need some external key:hash storage that I can query before indexing objects, since Netlify's cache gets cleared. I'll first see if I can implement a fork that uses Netlify's cache, or an optional function through which anyone can supply the hash for a given object key.
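A key:hash store along these lines could look roughly like the following minimal Node.js sketch. It assumes the store is a JSON file kept in a directory that survives between builds (for example a restored CI cache directory); `hashRecord`, `loadHashes`, `saveHashes`, `filterChanged` and the file location are hypothetical. When the cache is cleared, the store is empty and every record counts as changed.

```javascript
const crypto = require("crypto");
const fs = require("fs");

// Hash a record's content. JSON.stringify is key-order sensitive,
// so the transformer output must be stable between builds.
function hashRecord(record) {
  return crypto.createHash("sha256").update(JSON.stringify(record)).digest("hex");
}

// Load the previous objectID -> hash map; a missing or corrupt file means "index everything".
function loadHashes(file) {
  try {
    return JSON.parse(fs.readFileSync(file, "utf8"));
  } catch {
    return {};
  }
}

function saveHashes(file, hashes) {
  fs.writeFileSync(file, JSON.stringify(hashes));
}

// Keep only records whose hash differs from the stored one, and report
// objectIDs that disappeared so they can be deleted from the index.
function filterChanged(records, previousHashes) {
  const nextHashes = {};
  const changed = [];
  for (const record of records) {
    const hash = hashRecord(record);
    nextHashes[record.objectID] = hash;
    if (previousHashes[record.objectID] !== hash) changed.push(record);
  }
  const removed = Object.keys(previousHashes).filter(
    (id) => !Object.prototype.hasOwnProperty.call(nextHashes, id)
  );
  return { changed, removed, nextHashes };
}
```

After a successful push, `saveHashes` would be called with `nextHashes` so the next build compares against what was actually indexed.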
Am I correct in assuming that, in the current state of the plugin, if I simply filter the objects down to those that have changed, it only adds those objects to `${indexName}_tmp` and then overwrites the existing index once done, meaning that only the changed objects will end up in Algolia and everything that didn't change will be lost?
Meaning I have to remove that piece of code and update the main index directly?
If you do it that way, there will be a flash of wrong results or no results.
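A small in-memory simulation of the question (hypothetical `Map`-based indices, not the plugin's code): pushing only the changed records into a fresh temporary index and then swapping it in drops every unchanged record, while writing them into the main index keeps the rest, at the cost of searches possibly seeing the index mid-update.

```javascript
// Sketch: indices modeled as Map<objectID, record>; not the plugin's code.

// Filter first, then push the result into a fresh temporary index that
// replaces the main one: unchanged records never make it across.
function swapWithChangedOnly(changed) {
  const tmp = new Map(); // the temporary index starts empty
  for (const r of changed) tmp.set(r.objectID, r);
  return tmp; // becomes the new main index
}

// Write the changed records straight into the main index instead.
function updateInPlace(main, changed) {
  for (const r of changed) main.set(r.objectID, r);
  return main; // unchanged records survive; readers may see a mix while this runs
}
```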
I've made a pull request. It now supports a generic hash version that will only update objects that have changed. Works well on Netlify as long as the cache persists. Once cache is removed it updates everything again.
@Haroenv I think the `algolia-indexing` project would be the best place to start. It is still in beta and a heavy work in progress, but it does solve a few of the issues mentioned in this thread. It uses the Algolia indices and records themselves to do a smart diff between what is already in the index and what is about to be pushed, to reduce the number of operations used.
In full disclosure, I no longer work at Algolia, but I intend to keep working on `algolia-indexing` when time permits, to improve it even further. The current version can be greatly improved (see the issues for an explanation).
Thanks, I have removed my previous pull request and made a new one using Algolia to check for updates. It compares specified fields to see if an object should be updated, inserted, removed or just ignored.
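The comparison described here can be sketched as a pure function. This is a minimal sketch, assuming the existing records have already been fetched from the index with only `objectID` and the match fields retrieved, and that a record counts as changed when any match field differs (compared with `!==`, so match fields should be scalars such as strings or dates serialized as strings). `classifyRecords` is a hypothetical name, not the pull request's actual function.

```javascript
// Sketch: decide per record whether to insert, update, remove or ignore,
// comparing only the configured match fields (e.g. ['slug', 'modified']).
function classifyRecords(existingRecords, newRecords, matchFields) {
  const existingById = new Map(existingRecords.map((r) => [r.objectID, r]));
  const result = { insert: [], update: [], remove: [], ignore: [] };
  const seen = new Set();

  for (const record of newRecords) {
    seen.add(record.objectID);
    const existing = existingById.get(record.objectID);
    if (!existing) {
      result.insert.push(record);
    } else if (matchFields.some((f) => existing[f] !== record[f])) {
      result.update.push(record);
    } else {
      result.ignore.push(record);
    }
  }
  // Anything in the index that the source query no longer produces.
  for (const id of existingById.keys()) {
    if (!seen.has(id)) result.remove.push(id);
  }
  return result;
}
```

Only `insert` and `update` would then be saved and `remove` deleted; records in `ignore` need no write at all.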
This would be great to get in! I also had to move to an external process due to excess records being indexed when they were basically all the same.
Did this get pushed?
It has not been published yet (sorry), but as far as I can tell @u12206050 has published his fork on npm: https://yarnpkg.com/en/package/gatsby-plugin-algolia-search
@Haroenv Thanks for the quick update. I followed the instructions, but I can see my operations increasing with every build. The idea was that it would only need to update changed records, right?
@danvernon Have you tried `gatsby-plugin-algolia-search`?
@u12206050 Yes, that's what I just implemented. It's doing about 800 actions per build, and I have 628 records. Here's my code:
```javascript
{
  resolve: `gatsby-plugin-algolia-search`,
  options: {
    appId: process.env.GATSBY_ALGOLIA_APP_ID,
    apiKey: process.env.ALGOLIA_ADMIN_KEY,
    queries,
    chunkSize: 10000, // default: 1000
    enablePartialUpdates: true, // default: false
    matchFields: ['slug', 'modified'], // Array<String> default: ['modified']
  },
}
```

```javascript
const productQuery = `{
  products: allShopifyProduct {
    edges {
      node {
        objectID: id
        title
        handle
        description
        images {
          originalSrc
        }
        variants {
          price
        }
      }
    }
  }
}`

const flatten = arr =>
  arr.map(({ node: { ...rest } }) => ({
    ...rest,
  }))

const settings = {
  attributesToSnippet: [`description:20`],
}

const queries = [
  {
    query: productQuery,
    transformer: ({ data }) => flatten(data.products.edges),
    indexName: `Products`,
    settings,
    matchFields: ['slug', 'modified'], // Array<String> overrides main match fields, optional
  },
]

module.exports = queries
```
It needs both the `slug` and `modified` fields for comparing. If you don't have those fields, change `matchFields` in the options to something like `updated`, and then fetch the `updated` field from your source:
```javascript
const productQuery = `{
  products: allShopifyProduct {
    edges {
      node {
        objectID: id
        title
        updated
        handle
        description
        images {
          originalSrc
        }
        variants {
          price
        }
      }
    }
  }
}`
```
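Since the comparison needs every configured match field to be present on each record, a small pre-flight check can catch a mismatch before a build spends operations. This is a hypothetical helper (not part of the plugin), meant to be run on the transformer output:

```javascript
// Sketch: report which configured match fields are absent from records,
// as { fieldName: numberOfRecordsWithoutIt }.
function findMissingMatchFields(records, matchFields) {
  const missing = new Map();
  for (const record of records) {
    for (const field of matchFields) {
      if (record[field] === undefined || record[field] === null) {
        missing.set(field, (missing.get(field) ?? 0) + 1);
      }
    }
  }
  return Object.fromEntries(missing);
}
```

With the first Shopify query and `matchFields: ['slug', 'modified']`, every record would be reported as missing both fields, since that query selects neither.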
@u12206050 I don't have `slug`, so I can just change it to `matchFields: ['handle', 'updatedAt']`, yeah?
Yeh, that looks good.
@u12206050 Hmm, I'm not sure this is working as intended. It seemed to work when I pushed a build from code, but when the hook fired after changing one product, it seemed to use up the 800 actions again.
Hmm, that is strange. I can assure you it should work, though, as we have been using this for months now without fail. We check a date field, `modified`, and if/when that value changes, only that post gets updated. One possible cause: if you are using the URL for the check, make sure it doesn't change between development and production environments, or just remove it from `matchFields`.
It could also be that you are modifying every object on build.
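If the URL has to stay in `matchFields`, one option is to compare on the path rather than the full URL. This is a sketch under the assumption that the URL differs between environments only by its origin (for example `http://localhost:8000` vs. the production domain); `toStableUrl` is a hypothetical helper applied in the transformer. Simply removing the field from `matchFields` remains the simpler fix.

```javascript
// Sketch: drop the origin so development and production builds produce the
// same value. Relative URLs are resolved against a placeholder base.
function toStableUrl(url) {
  const { pathname, search } = new URL(url, "http://localhost");
  return pathname + search;
}
```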
This has been implemented in 0.8.0 as `enablePartialUpdates`, thanks @u12206050 :)
A continuation of #5 and #1
What happens currently is:
We should preferably use `atomic-algolia` or `algolia-indexing` to turn this into a flow where no extra index needs to be created and we index only the changes since the last push, rather than everything. cc @pixelastic (when you're back from parental leave, of course)