algolia / gatsby-plugin-algolia

A plugin to push to Algolia based on graphQl queries
https://yarn.pm/gatsby-plugin-algolia
Apache License 2.0
177 stars 45 forks source link

Unexpected non-partialUpdate behavior #101

Closed prichey closed 3 years ago

prichey commented 3 years ago

As I'm working through a PR that addresses some potential issues with the enablePartialUpdate behavior (i.e.), I disabled that option and have been getting some strange behavior. I'm running my new update, so hopefully these issues weren't introduced by my PR but I'm not ruling it out 😬.

With enablePartialUpdate disabled (and concurrentQueries enabled), it seems as though queries with the same index end up overwriting each other, such that the last query with the corresponding index is the one actually reflected in the index after building. I think this happens here, whenever you move a (fresh) temporary index into the intended index.

My expected behavior would be that the results of the query would be appended to an index (updating the indices that exist), but wouldn't necessarily delete all other entries not in the corresponding query. @Haroenv Can you give me an idea of whether or not this behavior (overwriting the index as a whole, not preserving any entries outside of the current query) is expected or is a bug?

Thanks!

Haroenv commented 3 years ago

It's expected to delete other items in the index if partial updates are disabled yes, that does mean that you won't be able to index multiple times into the same index from different queries, but that likely wouldn't be needed, since you could consolidate those queries into a single one.

I think if I make a major version at some point, I might keep only the partial update mode, since it has way less operations

prichey commented 3 years ago

Hmm. In my case it's useful to use separate queries on the same index because I build them up programmatically, e.g.:

{
  resolve: `gatsby-plugin-algolia`,
  options: {
    appId: process.env.GATSBY_ALGOLIA_APP_ID,
    apiKey: process.env.ALGOLIA_ADMIN_API_KEY,
    indexName: process.env.GATSBY_ALGOLIA_INDEX_NAME,
    queries: [
      ...getQueries('foo'),
      ...getQueries('bar'),
    ],
  },
}

It seems unintuitive to allow different queries to the same index but end up overwriting all but the last. Would you entertain a PR to potentially batch queries by index, combining the queries before moving them? I can't immediately think of any downsides and it seems like it'd be useful in cases like mine where you do want to have multiple queries on the same index. Let me know what you think, thanks

Haroenv commented 3 years ago

If you think that's a valid use case, I'll gladly accept a PR for batching the indexing if it makes sense. I still don't really get why you wouldn't combine both in one query, but I see how it's confusing that only one query is taken in account

Alternatively we can also throw an error when multiple queries have the same index partial updates disabled?

prichey commented 3 years ago

Sounds good, I'll go ahead and work on something to see how tricky it'll be to batch

Another reason to allow multiple queries to the same index might be that you want different matchFields per query, which if you force one query per index wouldn't be possible.

Another related question, to make sure I don't break functionality: can you tell me the purpose of the toRemove object with partial updates? As I understand it, if you call index.addObjects, those objects are overwritten if they exist on the index currently, so 'cleaning up' those objects after the fact only results in valid (i.e. updated) records being removed from the index.

Thanks for the quick responses, I realize different time zones makes collaborating a bit more difficult

Haroenv commented 3 years ago

the toRemove objects is for partialUpdate enabled, to decide which updates were in the previous index, but should no longer be there, meaning the presence of matchFields for the objects in the index, and if it's there, it will get deleted or updated

prichey commented 3 years ago

Alright, that makes sense. I think initially I was confused about the need to clear out the index only for partial updates but I see that you start with a fresh index if partial updates are disabled so removing the old entries is not necessary.

I am curious, from a philosophical standpoint, how matchFields are intended to work. Is it intended to simply be a field that is checked for, and if it has an extant (truthy) value (e.g. modified: true), then the object is updated? Or are the matchFields intended to be used as a comparison between the new value and the old value, and if the value has changed (e.g. modified: new Date()), then the object is updated?

By my understanding, the code currently uses the prior notion to check for objects to remove (here) but the latter notion to check for objects to update (here, which can lead to inconsistent results

Haroenv commented 3 years ago

If an object needs to be removed, we can't check if it's supposed to change or not, since in the new objects, it doesn't exist. That's why we just check for presence. For update we only want to update items which aren't the same as an already existing item. Hope that makes sense!

prichey commented 3 years ago

Closing as this is fixed!