algolia / gatsby-plugin-algolia

A plugin to push to Algolia based on GraphQL queries
https://yarn.pm/gatsby-plugin-algolia
Apache License 2.0

Deleted records not being removed #82

Closed rmcsharry closed 4 years ago

rmcsharry commented 4 years ago

First I'd like to thank you for an awesome plugin!

I have read the previous closed issues about this topic on partial updates and deleting records, but cannot solve this.

You can see it in action on this site: www.bibliotech.ca

You will need to log in (just click the Google sign-in).

In the search box type 'asi'

You will see 7 results, including:

'ASI Canada' 'ASI Visual Display Products'

If you click the second one, it will give you a 404, clearly indicating that the data no longer exists in Gatsby.

This is the algolia.js file that builds the index:

const manufacturerQuery = `{
  manufacturers: allAirtableManufacturer {
    edges {
      node {
        recordId
        data {
          Manufacturer
          KEYWORDS
          MASTER_FORMAT_CLASSIFICATION {
            data {
              Section_Name
            }
          }
          Last_update
        }
      }
    }
  }
}`

const flatten = arr => {
  if (arr)
    return arr.map(({ node }) => ({
      objectID: node.recordId,
      manufacturer: node.data.Manufacturer,
      keywords: node.data.KEYWORDS,
      classification: flattenMFC(node.data.MASTER_FORMAT_CLASSIFICATION),
      lastUpdated: node.data.Last_update
    }))
}

const flattenMFC = arr => {
  if (arr)
    return arr.map(({ data }) => ({
      sectionName: data.Section_Name,
    }))
  else return null
}

const settings = {
  attributesToSnippet: ['keywords:20'],
  attributesToHighlight: ['manufacturer', 'classification'],
  customRanking: ['asc(manufacturer)'],
}

const queries = [
  {
    query: manufacturerQuery,
    transformer: ({ data }) => flatten(data.manufacturers.edges),
    indexName: `Manufacturers`,
    settings,
  },
]

module.exports = queries

And the config:

    {
      resolve: `gatsby-plugin-algolia`,
      options: {
        appId: process.env.GATSBY_ALGOLIA_APP_ID,
        apiKey: process.env.ALGOLIA_ADMIN_KEY,
        queries,
        chunkSize: 10000, // default: 1000
        enablePartialUpdates: true, // default: false
        matchFields: ['lastUpdated'],
      },
    },

Any insights greatly appreciated.

What is strange is that, since that record was deleted from the source data (in Airtable), every time I run gatsby build the build log says it is deleting the record from Algolia, but it never does:

Screenshot 2020-06-21 at 19 05 19

Haroenv commented 4 years ago

Could you add logging around the deleting code, and check your Algolia build logs in the dashboard to see if the delete operation was ever received? This is odd indeed.

rmcsharry commented 4 years ago

@Haroenv Sorry but I don't know where the deleting code is located. I thought the plugin did this automatically.

I certainly never wrote any code to specifically delete objects. The only file I had to write was the one I posted in the issue, that builds the index from the gatsby graphql query.

I did check the dashboard in Algolia and found this: Screenshot 2020-06-26 at 22 58 00

This seems to indicate that Gatsby is trying to delete the object, but of course it no longer has the id as it is gone from Airtable.

My guess is that I previously did not have the index built correctly, so it was not using the correct objectID. Once I changed that, the 'before' and 'after' comparison obviously won't work, and I think that's why I now see a POST delete with an empty objectID.

I am going to rebuild the index from scratch, delete a record again, and see if it's now working. I suspect it probably is. I will post back within the next week.

rmcsharry commented 4 years ago

@Haroenv I deleted the index, then rebuilt my Gatsby project, thus causing the index to be rebuilt with the data from Airtable: 849 records, all with correct objectIDs.

Then I deleted a record in Airtable and rebuilt the project. I see the exact same behaviour: the build log says it is deleting 1 object from the index, and the API log shows the same as I posted above, where the action is deleteObject but the id is undefined.

So the question then is why is the objectID undefined?

rmcsharry commented 4 years ago

@Haroenv Ok I found the deletion code and put some logging around it.

At line 145: currentIndexState is used to set isRemove to true for those objects that need deleting.

So I logged the object there:

        Object.keys(algoliaObjects).forEach(
          (o) => {
            console.log('object is', o);
            return (currentIndexState.toRemove[o.objectID] = true)
          }
        );

and it logs the correct objectID to delete.

But the actual delete operation starts at line 200, which does NOT use currentIndexState:

      const cleanup = Object.keys(indexState).map(async function (indexName) {
        const state = indexState[indexName];
        const isRemoved = Object.keys(state.toRemove);

        if (isRemoved.length) {
          setStatus(
            activity,
            `deleting ${isRemoved.length} objects from ${indexName} index`
          );
          console.log('object is ', isRemoved)
          const { taskID } = await state.index.deleteObjects(isRemoved);
          return state.index.waitTask(taskID);
        }
      });

and here isRemoved is an array with one item: [ 'undefined' ]
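The two logged snippets above are consistent with a key/value mix-up. Object.keys returns the keys as strings, so reading .objectID from one of them gives undefined, which is then coerced to the property name 'undefined'. The following is a minimal sketch of that behaviour, not the plugin's actual code: the algoliaObjects contents and record IDs are made up, the assumption that the map is keyed by objectID is inferred from the log output described above, and the "fixed" variant is only one possible correction, not necessarily the change that was eventually released.

```javascript
// Hypothetical stand-in for a map of objectID -> record (IDs are made up).
const algoliaObjects = {
  recA: { objectID: 'recA', manufacturer: 'ASI Canada' },
  recB: { objectID: 'recB', manufacturer: 'ASI Visual Display Products' },
};

// Pattern from the logged snippet: `o` is a key (a string), so
// `o.objectID` is undefined and becomes the property name 'undefined'.
const buggy = {};
Object.keys(algoliaObjects).forEach((o) => {
  buggy[o.objectID] = true;
});

// One possible correction (an assumption): use the key itself,
// since the map is assumed to be keyed by objectID.
const fixed = {};
Object.keys(algoliaObjects).forEach((objectID) => {
  fixed[objectID] = true;
});

console.log(Object.keys(buggy)); // [ 'undefined' ]
console.log(Object.keys(fixed)); // [ 'recA', 'recB' ]
```

This would explain why logging o shows the right ID while the later Object.keys(state.toRemove) yields a single 'undefined' entry, no matter how many records are marked for removal.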

Haroenv commented 4 years ago

Sorry for only getting back to you now; I was on holiday. You do seem to have found an interesting bug, but I haven't yet had the time to investigate its cause. I'll let you know when I do.

matrix4123 commented 4 years ago

I think I am also encountering the same issue... I am using partialUpdates and it seems like stale records are not being removed. Is there anything special we have to do to mark a record as deleted?

Haroenv commented 4 years ago

Thanks @thecodingwizard for looking and finding a fix, I have released it in 0.11.2

rmcsharry commented 4 years ago

@thecodingwizard and @Haroenv Thanks for the fix and the fast release of it. I can confirm that it has indeed resolved the issue! 👍