Closed: u12206050 closed this pull request 5 years ago.
This looks pretty great! I'd say for now this looks safe to use; let us know if it works consistently without e.g. memory problems (I think it is possible this object will become too big to keep in memory indefinitely as implemented now).
Yeh, I have tested it with ±1300 records without issues, since only the key:hash object is kept in memory for the duration of indexing, and that is only around 100kb.
From a quick test I just did, generating 50,000 objects and storing their key:hash produced an internal object string of around 4280kb that took 143ms to save to file. I think the first issue will actually come up with the GraphQL results being too big, but I am guessing that happens in the hundreds of thousands of records.
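A minimal sketch of the idea (illustrative names only, not the exact code in this PR):

```js
const crypto = require('crypto');
const fs = require('fs');

// Illustrative records; in the plugin these come from the GraphQL queries.
const records = [
  { id: 'post-1', title: 'Hello' },
  { id: 'post-2', title: 'World' },
];

// Only a { id: md5(serialized object) } map is kept in memory: one short
// string pair per record, which keeps even large sites small in memory.
function buildHashMap(objects) {
  const map = {};
  for (const obj of objects) {
    map[obj.id] = crypto.createHash('md5').update(JSON.stringify(obj)).digest('hex');
  }
  return map;
}

// Persist so the next build can diff against it.
fs.mkdirSync('.cache', { recursive: true });
fs.writeFileSync('.cache/algolia-index.json', JSON.stringify(buildHashMap(records)));
```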
Great work! +1, would love to see this merged :)
@teamfphl, have you tried it out? Does it work as expected for you?
@Haroenv, tried this package, but always got `{"proj-0":{"undefined":"8d6fdbb4e6364528d9aa1b1cc2ca49cd"}}` in `algolia-index.json`.
@burning-code The fact that the key is "undefined" means there probably isn't an `id` field on your objects, which is required to match each object from one build to the next. Let me know if it is impossible for you to add an `id` field; I could possibly add an option to override which key is used as the ID field.
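To illustrate what goes wrong (a minimal sketch; field names are just examples):

```js
// Each synced object needs a stable identifier so builds can be compared.
const record = { title: 'Hello' }; // note: no id field

// The cache is keyed on record.id, so a missing id becomes the literal
// string "undefined" once the map is serialized to JSON:
const cache = {};
cache[record.id] = '8d6fdbb4e6364528d9aa1b1cc2ca49cd';
console.log(JSON.stringify(cache)); // {"undefined":"8d6fdbb4e6364528d9aa1b1cc2ca49cd"}
```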
Thanks @u12206050 , it works now.
> @teamfphl, have you tried it out? Does it work as expected for you?

Works for me :)
Yes I can. Will update the pull request when I am done
Ok, done. I'm not sure whether an option for choosing the unique identifier field is required, but I added support for both `.id` and `.objectID`.
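A rough sketch of such a lookup (illustrative; the preference order and the PR's actual code may differ):

```js
// Prefer Algolia's native objectID, falling back to id.
function getKey(obj) {
  const key = obj.objectID || obj.id;
  if (key === undefined) {
    throw new Error('Each object needs an `objectID` or `id` field for the cache');
  }
  return key;
}
```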
One gotcha 🗡 I have found though: when using `childImageSharp`, Gatsby's image scaling plugin, a newly generated string is added to the object on each build, which essentially changes the object's hash.
Simply not using it works though.
Maybe we can have an API to transform the object to strip those ids out before getting the hash?
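A rough sketch of what such a transform hook could look like (the callback shape here is hypothetical):

```js
const crypto = require('crypto');

// Hash after stripping volatile fields, so regenerated childImageSharp
// strings don't invalidate the cache entry.
function stableHash(obj, transform = (o) => o) {
  const clone = JSON.parse(JSON.stringify(obj)); // don't mutate the original
  return crypto.createHash('md5').update(JSON.stringify(transform(clone))).digest('hex');
}

const record = {
  id: 'post-1',
  title: 'Hello',
  childImageSharp: { fluid: { src: '/static/3a5b/photo.jpg' } },
};

// Usage: drop the childImageSharp subtree before computing the hash.
const hash = stableHash(record, (o) => {
  delete o.childImageSharp;
  return o;
});
console.log(hash); // stable across builds even if the generated image URL changes
```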
Thinking about this a bit more, I think the cache should be behind a flag that is disabled by default; it could be called `fileSystemCache: false` (unless you know a better name).
What about `enableCaching`? I've added it like so to the code.
What do you think about `enableCache`?
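For illustration, the flag could be wired up in `gatsby-config.js` like this (a sketch: `enableCache` is just the name under discussion, not necessarily what gets merged; the other options are the plugin's existing ones):

```js
// gatsby-config.js (sketch): the final option name was still being
// discussed at this point (enableCaching vs enableCache).
module.exports = {
  plugins: [
    {
      resolve: `gatsby-plugin-algolia`,
      options: {
        appId: process.env.ALGOLIA_APP_ID,
        apiKey: process.env.ALGOLIA_ADMIN_KEY,
        queries: [/* your queries */],
        enableCache: true, // opt in to .cache/algolia-index.json diffing
      },
    },
  ],
};
```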
Found an issue with the cache on Netlify. It isn't being persisted as much as I would have hoped. Will be spending some time on that today so just sit tight before merging.
Doesn't work on Netlify but works locally. :( I figure that when it is running on Netlify the cache object goes through JSON.parse twice. I am testing it now on Netlify; it just takes sooo long to build each time :P
So after talking a bit more and reading around, it seems there are a number of "strange-ish" issues regarding caching between builds on Netlify. Therefore I am removing the caching completely and instead doing a request to Algolia to get certain fields on all objects, which will then be compared to see if an object has changed.
This costs only one operation per 1000 records and seems more proven to work on every static builder.
An extra option and field will be required for comparing properties between the indexed and new objects.
So I will be scrapping this pull request.
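A rough sketch of that browse-based diff (assuming the algoliasearch v4 client; the `matchField` name here is hypothetical):

```js
const algoliasearch = require('algoliasearch');

const client = algoliasearch('YOUR_APP_ID', 'YOUR_ADMIN_KEY');
const index = client.initIndex('posts');

// Browse the whole index, retrieving only the comparison field.
// Browse costs one operation per 1000 records.
async function fetchExisting(matchField) {
  const existing = {};
  await index.browseObjects({
    query: '',
    attributesToRetrieve: [matchField],
    batch: (hits) => {
      for (const hit of hits) existing[hit.objectID] = hit[matchField];
    },
  });
  return existing;
}

// Only (re)index objects whose matchField changed; delete the ones
// that no longer exist in the new build.
async function diff(newObjects, matchField) {
  const existing = await fetchExisting(matchField);
  const toIndex = newObjects.filter((o) => existing[o.objectID] !== o[matchField]);
  const newIds = new Set(newObjects.map((o) => o.objectID));
  const toDelete = Object.keys(existing).filter((id) => !newIds.has(id));
  return { toIndex, toDelete };
}
```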
Thanks for your effort here! Will you be making a PR for the objectID diff with the browse operation?
Stores a JSON file called `algolia-index.json` in the `.cache` folder, with the id and hash of each object that gets synced to Algolia. On subsequent updates (if the cache has not been cleared) it will only update and delete the changed objects instead of inserting everything.
This reduces the number of operations used in Algolia.
Supports multiple queries and different indexes.
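For example, a healthy cache file for two records could look like this (hashes illustrative; `proj-0` is the per-query block seen in the thread above):

```json
{
  "proj-0": {
    "post-1": "8d6fdbb4e6364528d9aa1b1cc2ca49cd",
    "post-2": "0cc175b9c0f1b6a831c399e269772661"
  }
}
```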
Note: When using on Netlify, also install `gatsby-plugin-netlify-cache` so the cache will persist between builds.