Closed: u12206050 closed this pull request 5 years ago.
This looks pretty great! I'd say for now this looks safe to use; let us know if it works consistently without e.g. memory problems (I think it is possible this object will become too big to keep in memory indefinitely as implemented now).
Yeh, I have tested it with ±1300 records without issues, since only the key:hash object is kept in memory for the duration of indexing, and that is only around 100kb.
From a quick test I just did, generating 50,000 objects and storing their key:hash produced an internal object string of around 4280kb that took 143ms to save to file. I think the first issue will actually come up with the GraphQL results being too big, but I am guessing that happens in the hundreds of thousands of records.
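A minimal sketch of the idea (illustrative names only, not the exact code in this PR):

```js
const crypto = require('crypto');
const fs = require('fs');

// Illustrative records; in the plugin these come from the GraphQL queries.
const records = [
  { id: 'post-1', title: 'Hello' },
  { id: 'post-2', title: 'World' },
];

// Only a { id: md5(serialized object) } map is kept in memory: one short
// string pair per record, which keeps even large sites small in memory.
function buildHashMap(objects) {
  const map = {};
  for (const obj of objects) {
    map[obj.id] = crypto.createHash('md5').update(JSON.stringify(obj)).digest('hex');
  }
  return map;
}

// Persist so the next build can diff against it.
fs.mkdirSync('.cache', { recursive: true });
fs.writeFileSync('.cache/algolia-index.json', JSON.stringify(buildHashMap(records)));
```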
Great work! +1, would love to see this merged :)
@teamfphl, have you tried it out? Does it work as expected for you?
@Haroenv, tried this package, but always got `{"proj-0":{"undefined":"8d6fdbb4e6364528d9aa1b1cc2ca49cd"}}` in `algolia-index.json`.
@burning-code The fact that the key is "undefined" means there probably isn't an `id` field on your objects, which is required to match each object from one build to the next. Let me know if it is impossible for you to add an `id` field; I could possibly add an option to override which key is used as the ID field.
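To illustrate what goes wrong (a minimal sketch; field names are just examples):

```js
// Each synced object needs a stable identifier so builds can be compared.
const record = { title: 'Hello' }; // note: no id field

// The cache is keyed on record.id, so a missing id becomes the literal
// string "undefined" once the map is serialized to JSON:
const cache = {};
cache[record.id] = '8d6fdbb4e6364528d9aa1b1cc2ca49cd';
console.log(JSON.stringify(cache)); // {"undefined":"8d6fdbb4e6364528d9aa1b1cc2ca49cd"}
```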
Thanks @u12206050 , it works now.
> @teamfphl, have you tried it out? Does it work as expected for you?

Works for me :)
Yes I can. Will update the pull request when I am done
Ok, done. I'm not sure whether an option for choosing the unique identifier field is required, but I added support for both `.id` and `.objectID`.
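A rough sketch of such a lookup (illustrative; the preference order and the PR's actual code may differ):

```js
// Prefer Algolia's native objectID, falling back to id.
function getKey(obj) {
  const key = obj.objectID || obj.id;
  if (key === undefined) {
    throw new Error('Each object needs an `objectID` or `id` field for the cache');
  }
  return key;
}
```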
One gotcha 🗡 I have found though: when using `childImageSharp`, Gatsby's image scaling plugin, a newly generated string is added to the object on each build, which essentially changes the object's hash.
Simply not using it works though.
Maybe we can have an API to transform the object to strip those ids out before getting the hash?
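A rough sketch of what such a transform hook could look like (the callback shape here is hypothetical):

```js
const crypto = require('crypto');

// Hash after stripping volatile fields, so regenerated childImageSharp
// strings don't invalidate the cache entry.
function stableHash(obj, transform = (o) => o) {
  const clone = JSON.parse(JSON.stringify(obj)); // don't mutate the original
  return crypto.createHash('md5').update(JSON.stringify(transform(clone))).digest('hex');
}

const record = {
  id: 'post-1',
  title: 'Hello',
  childImageSharp: { fluid: { src: '/static/3a5b/photo.jpg' } },
};

// Usage: drop the childImageSharp subtree before computing the hash.
const hash = stableHash(record, (o) => {
  delete o.childImageSharp;
  return o;
});
console.log(hash); // stable across builds even if the generated image URL changes
```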
Thinking about this a bit more, I think the cache should be behind a flag that is disabled by default; it could be called `fileSystemCache: false` (unless you know a better name).
What about `enableCaching`? I've added it like so to the code.
What do you think about `enableCache`?
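For illustration, the flag could be wired up in `gatsby-config.js` like this (a sketch: `enableCache` is just the name under discussion, not necessarily what gets merged; the other options are the plugin's existing ones):

```js
// gatsby-config.js (sketch): the final option name was still being
// discussed at this point (enableCaching vs enableCache).
module.exports = {
  plugins: [
    {
      resolve: `gatsby-plugin-algolia`,
      options: {
        appId: process.env.ALGOLIA_APP_ID,
        apiKey: process.env.ALGOLIA_ADMIN_KEY,
        queries: [/* your queries */],
        enableCache: true, // opt in to .cache/algolia-index.json diffing
      },
    },
  ],
};
```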
Found an issue with the cache on Netlify. It isn't being persisted as much as I would have hoped. Will be spending some time on that today so just sit tight before merging.
Doesn't work on Netlify but works locally. :( I figure that when it is running on Netlify the cache object goes through JSON.parse twice. I am testing it now on Netlify; it just takes sooo long to build each time :P
So after talking a bit more and reading around, it seems there are a number of "strange-ish" issues regarding caching between builds on Netlify. Therefore I am removing the caching completely and instead doing a request to Algolia to get certain fields on all objects, which will then be compared to see if an object has changed.
This costs only one operation per 1000 records and seems more proven to work on every static builder.
An extra option and field will be required for comparing properties between the indexed and new objects.
So I will be scrapping this pull request.
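A rough sketch of that browse-based diff (assuming the algoliasearch v4 client; the `matchField` name here is hypothetical):

```js
const algoliasearch = require('algoliasearch');

const client = algoliasearch('YOUR_APP_ID', 'YOUR_ADMIN_KEY');
const index = client.initIndex('posts');

// Browse the whole index, retrieving only the comparison field.
// Browse costs one operation per 1000 records.
async function fetchExisting(matchField) {
  const existing = {};
  await index.browseObjects({
    query: '',
    attributesToRetrieve: [matchField],
    batch: (hits) => {
      for (const hit of hits) existing[hit.objectID] = hit[matchField];
    },
  });
  return existing;
}

// Only (re)index objects whose matchField changed; delete the ones
// that no longer exist in the new build.
async function diff(newObjects, matchField) {
  const existing = await fetchExisting(matchField);
  const toIndex = newObjects.filter((o) => existing[o.objectID] !== o[matchField]);
  const newIds = new Set(newObjects.map((o) => o.objectID));
  const toDelete = Object.keys(existing).filter((id) => !newIds.has(id));
  return { toIndex, toDelete };
}
```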
Thanks for your effort here! Will you be making a PR for the objectID diff with the browse operation?
Stores a JSON file called `algolia-index.json` in the `.cache` folder, with the id and hash of each object that gets synced to Algolia. On subsequent updates (if the cache has not been cleared) it will only update and delete the changed objects instead of inserting everything.
This reduces the number of operations used in Algolia.
Supports multiple queries and different indexes.
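For example, a healthy cache file for two records could look like this (hashes illustrative; `proj-0` is the per-query block seen in the thread above):

```json
{
  "proj-0": {
    "post-1": "8d6fdbb4e6364528d9aa1b1cc2ca49cd",
    "post-2": "0cc175b9c0f1b6a831c399e269772661"
  }
}
```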
Note: When using on Netlify, also install `gatsby-plugin-netlify-cache` so the cache will persist between builds.