algolia / gatsby-plugin-algolia

A plugin to push to Algolia based on GraphQL queries
https://yarn.pm/gatsby-plugin-algolia
Apache License 2.0

gatsby-plugin-algolia is deleting my entire index before writing to it #93

Closed: miketheodorou closed this issue 3 years ago

miketheodorou commented 4 years ago

I am writing to the same index from two different sources: one uses the 'algoliasearch' package in a Node backend, and the other uses gatsby-plugin-algolia to read from Strapi and write to the index. The backend adds its objects first; then, after the Gatsby build completes, the plugin runs and deletes everything. Is there a way to use both methods to write to the same index without them colliding?

Haroenv commented 4 years ago

Hi @miketheo423, I've published 0.12.0, which solves this use case. However, it requires you to do the following:

  1. use enablePartialUpdates
  2. make sure the external objects do not have any of your matchFields
  3. verify that external objects don't get deleted
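Steps 2 and 3 can be checked mechanically. The sketch below assumes you can load the externally written records (for example, exported from the index) as plain objects and collect objectIDs before and after a build; both helper names are hypothetical and not part of gatsby-plugin-algolia.

```javascript
// Hypothetical helpers (not part of gatsby-plugin-algolia) for checking
// steps 2 and 3 of the partial-update setup.

// Step 2: return the external records that carry any of the matchFields.
// Such records could be treated as "controlled" by the plugin and removed.
function externalRecordsWithMatchFields(externalRecords, matchFields) {
  return externalRecords.filter(record =>
    matchFields.some(field => Object.prototype.hasOwnProperty.call(record, field))
  );
}

// Step 3: given the objectIDs of the external records and the objectIDs
// present in the index after a build, return the external IDs that vanished.
function missingAfterBuild(externalIDs, idsAfterBuild) {
  const present = new Set(idsAfterBuild);
  return externalIDs.filter(id => !present.has(id));
}
```

An empty result from both helpers is what the setup above aims for.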

miketheodorou commented 4 years ago

Hey @Haroenv, I am using the enablePartialUpdates flag and passing in a field that I know the other objects definitely do not have, but it still seems to wipe out the entire index.

Haroenv commented 4 years ago

Reopening to investigate

miketheodorou commented 4 years ago

Here's an example of what I'm attempting to pass in now:

algolia-queries.js

const eventsQuery = `{
  allStrapiEventsPage {
    edges {
      node {
        events {
          objectID: id
          title
          description
          availability
          timezone
          rating
          genres
          published
          image {
            publicURL
          }
        }
      }
    }
  }
}`;

const eventsReducer = ({ data }) => {
  return data.allStrapiEventsPage.edges.reduce((acc, { node }) => {
    acc = [...acc, ...node.events];
    return acc;
  }, []);
};

const queries = [
  {
    query: eventsQuery,
    transformer: eventsReducer,
    matchFields: ['publicURL'],
  },
];

module.exports = queries;

gatsby-config.js

...,
{
  resolve: `gatsby-plugin-algolia`,
  options: {
    appId: process.env.ALGOLIA_APP_ID,
    apiKey: process.env.ALGOLIA_ADMIN_KEY,
    indexName: process.env.ALGOLIA_INDEX_NAME,
    enablePartialUpdates: true,
    matchFields: ['publicURL'],
    queries: require('./src/utils/algolia-queries'),
  },
},
...

This is what happened when I ran my build:

Algolia: 1 queries to index
Algolia: query #1: executing query
Algolia: query 0: graphql resulted in 1 records
Algolia: query 0: starting Partial updates
Algolia: query 0: found 1511 existing records
Algolia: query 0: Partial updates – [insert/update: 0, total: 1]
Algolia: query 0: splitting in 0 jobs
Algolia: deleting 1510 objects from prod_SOD index
⠴ onPostBuild
miketheodorou commented 4 years ago

@Haroenv Does that need to be matchFields: ['image.publicURL'] instead?

UPDATE: Using that field above did not make a difference.

Haroenv commented 4 years ago

Ah, I think the plugin doesn't yet support dots in matchFields. Can you try it with a top-level attribute first? Then we can add support for dotted attribute paths.
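Following that suggestion, the nested image.publicURL can be lifted to a top-level attribute inside the transformer. This is a sketch assuming the query shape from algolia-queries.js above; the reducer name flattenedEventsReducer and the attribute name imagePublicURL are hypothetical.

```javascript
// Variant of eventsReducer that copies the nested image.publicURL onto a
// top-level attribute, since matchFields does not (yet) resolve dotted paths.
// `imagePublicURL` is a hypothetical attribute name chosen for this sketch.
const flattenedEventsReducer = ({ data }) =>
  data.allStrapiEventsPage.edges.flatMap(({ node }) =>
    node.events.map(event => ({
      ...event,
      // Optional chaining keeps events without an image from throwing.
      imagePublicURL: event.image?.publicURL ?? null,
    }))
  );
```

The query entry would then use `transformer: flattenedEventsReducer` and `matchFields: ['imagePublicURL']`, so the attribute exists only on the Gatsby records.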

miketheodorou commented 3 years ago

@Haroenv Yeah, it looks like the top-level attribute yields the same result, unfortunately.

Haroenv commented 3 years ago

Do you have a reproduction? With a top-level attribute that exists only on the Gatsby records, I don't see an issue.

JesusFdezDav commented 3 years ago

Hi @Haroenv, we are having the same problem as @miketheo423. However, if we do what you suggested, we would only be checking for updates in the objects from one of the two sources. Is that correct?

Haroenv commented 3 years ago

I'm not sure what you mean. Could you share a reproduction, or a script that creates this index together with a Gatsby configuration that wipes it? I've tried this multiple times, and as long as the Gatsby records have a top-level attribute that is used in matchFields and that the other records don't have, I see no issues...

prichey commented 3 years ago

After running into this issue myself, I think I have an idea why it happens, and I think it's just a misunderstanding of how matchFields should be used.

In the source, you check whether any of the matchFields has a truthy value on each record in algoliaObjects (the records fetched from the existing index) here:

Object.keys(algoliaObjects).forEach(objectID => {
  // if the object has one of the matchFields, it should be removed,
  // but objects without matchFields are considered "not controlled"
  // and stay in the index
  if (matchFields.some(field => algoliaObjects[objectID][field])) {
    currentIndexState.toRemove[objectID] = true;
  }
});

While this may work for a boolean flag such as a modified field, it doesn't work when you want to update a record if and only if a field's value has changed. In @miketheo423's example above, publicURL is the matchField; whenever it is truthy (i.e. a non-empty string), the record satisfies the matchFields.some check above and is therefore marked for removal.
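To see the effect, the quoted check can be lifted into a standalone function and run against a few records. This sketch mirrors only the quoted lines (the function name markForRemoval is hypothetical); what the plugin later does with toRemove is not modeled here.

```javascript
// Standalone copy of the quoted removal check: any existing record with a
// truthy value in one of the matchFields is marked for removal.
function markForRemoval(algoliaObjects, matchFields) {
  const toRemove = {};
  Object.keys(algoliaObjects).forEach(objectID => {
    if (matchFields.some(field => algoliaObjects[objectID][field])) {
      toRemove[objectID] = true;
    }
  });
  return toRemove;
}

const existing = {
  a: { objectID: 'a', publicURL: '/static/a.png' }, // truthy: marked
  b: { objectID: 'b', publicURL: '' },              // empty string: kept
  c: { objectID: 'c' },                             // no matchField: kept
};

console.log(markForRemoval(existing, ['publicURL'])); // { a: true }
```

The check looks only at whether the value is truthy, never at whether it changed.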

Before digging into the source, I made the same assumption about how the plugin works. (I also made the same assumption about dotted object.key paths.)
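Supporting dotted paths such as 'image.publicURL' in matchFields would need a path lookup wherever a field is read. A minimal sketch; getByPath and hasMatchField are hypothetical helpers, not part of the plugin.

```javascript
// Resolve a dotted attribute path ("image.publicURL") against a record.
// Returns undefined as soon as any segment is missing.
function getByPath(record, path) {
  return path.split('.').reduce(
    (value, key) => (value == null ? undefined : value[key]),
    record
  );
}

// The truthy check from the excerpt, rewritten to accept dotted paths.
const hasMatchField = (record, matchFields) =>
  matchFields.some(field => Boolean(getByPath(record, field)));
```

One design caveat: with this scheme, an attribute whose name itself contains a dot can no longer be addressed directly.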

@Haroenv, what would you think about adding a predicate function to the plugin config (and maybe even per query), which takes an object representing the previous value and returns true or false depending on whether the object should be updated in the index? Passing a function rather than an array of strings would support both the modified-flag behavior I believe matchFields is built around and more complex cases.
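The proposal could look roughly like this: the config takes a predicate that compares the previous record with the fresh one instead of a list of field names. The names selectUpdates and fieldsChanged are hypothetical; this sketches the idea under discussion, not the plugin's API.

```javascript
// Hypothetical predicate-based selection of records to (re)index.
// `previous` is the record currently in the index (undefined if new),
// `next` is the record produced by the GraphQL query and transformer.
function selectUpdates(existingByID, freshRecords, shouldUpdate) {
  return freshRecords.filter(next =>
    shouldUpdate(existingByID[next.objectID], next)
  );
}

// A value-based predicate: update when the record is new or any listed
// field changed. Covers a `modified` field as well as arbitrary fields.
const fieldsChanged = fields => (previous, next) =>
  previous === undefined || fields.some(f => previous[f] !== next[f]);
```

A custom predicate, for example one that compares a content hash, would slot in the same way.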

Haroenv commented 3 years ago

I think that makes sense, @prichey. Since this plugin is still at 0.x, if you find a cleaner way to express the API, don't hesitate to make breaking changes. Thanks!

prichey commented 3 years ago

Sounds good, I'll work on some changes and then open a PR.

I'm also interested in adding a disableConcurrentAccess option to the plugin as an attempt to fix https://github.com/algolia/gatsby-plugin-algolia/issues/20. All of my queries necessarily write to the same index, so I think indexing sequentially rather than concurrently might fix the cases where my builds hang because Algolia tasks get stalled.
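Running the per-query jobs one after another instead of all at once is straightforward with async functions. This is only a sketch: runJobs and its concurrent flag are hypothetical stand-ins for the proposed disableConcurrentAccess option, and whether sequential indexing actually avoids the stalled builds is the open question from the comment.

```javascript
// Hypothetical runner: each job is an async function that indexes one query.
// With concurrent=false, a job starts only after the previous one finished,
// so at most one indexing job touches the shared index at a time.
async function runJobs(jobs, { concurrent = true } = {}) {
  if (concurrent) {
    return Promise.all(jobs.map(job => job()));
  }
  const results = [];
  for (const job of jobs) {
    results.push(await job()); // wait before starting the next job
  }
  return results;
}
```

The trade-off is build time: sequential runs take roughly the sum of the job durations rather than the longest one.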

That being the case, @Haroenv, would you prefer two separate PRs, or are you fine with one that addresses both issues?

Haroenv commented 3 years ago

Separate PRs will be easier to review, thanks @prichey!

prichey commented 3 years ago

@miketheo423 Have you tried updating to the most recent version? This should be fixed now

Haroenv commented 3 years ago

Let's assume it's fixed :) If not, please open a new issue with reproduction