algolia / jekyll-algolia

Add fast and relevant search to your Jekyll site
https://community.algolia.com/jekyll-algolia/
MIT License

Unwanted attributes appearing in index #172

Open gazconroy opened 3 years ago

gazconroy commented 3 years ago

I want to report a bug:

What is the current behavior?

I've defined an algolia_hooks.rb in the _plugins directory and filled it with classes/attributes that I don't want indexed. However, the dashboard tells me that they still have been.

What is your expected behavior?

None of these classes should turn up on the site's index.

Git repository to reproduce the issue:

https://github.com/gazconroy/digital-comma/tree/gh-pages

Ruby version used:

2.5

Jekyll version used:

3.9

Haroenv commented 3 years ago

Is it possible that your CI hasn't run since you updated the configuration to exclude those nodes? It looks to me as if it's failing for another reason: https://travis-ci.org/github/gazconroy/digital-comma/builds/767322353

gazconroy commented 3 years ago

Cheers. Travis has never worked, so I have been updating the index manually from the command line as required. Any other ideas about why the Algolia update is failing to remove those classes?

Haroenv commented 3 years ago

Unfortunately I have no further idea what could cause it. If you manually add logging to the plugin, do you see whether your method is called?
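One low-tech way to check whether the hook runs at all is to print from inside it. A minimal sketch, assuming the hook lives in `_plugins/algolia_hooks.rb` and uses the `before_indexing_each(record, node, context)` signature described in the jekyll-algolia hooks documentation. It relies only on Ruby's own `Kernel#warn`, not on any plugin logging flag, and the call counter is a hypothetical addition for illustration.

```ruby
# _plugins/algolia_hooks.rb (debugging sketch)
module Jekyll
  module Algolia
    module Hooks
      # Hypothetical counter, only used to confirm how often the hook runs.
      @calls = 0

      class << self
        attr_reader :calls
      end

      def self.before_indexing_each(record, node, context)
        @calls += 1
        # Kernel#warn writes to stderr, so the line appears in the build output.
        # Keys are stringified before sorting in case symbols and strings are mixed.
        warn "[algolia hook] call ##{@calls}: #{record.keys.map(&:to_s).sort.join(', ')}"
        record # pass the record through unchanged
      end
    end
  end
end
```

If no `[algolia hook]` lines appear when running `jekyll algolia`, the hooks file is probably not being loaded, and the problem lies before any filtering logic.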

gazconroy commented 3 years ago

What flag adds logging?

I have a bit more insight into this, though. It has partially worked, in that it has reduced the number of classes from over 140 down to 25. I suspect some of them may be 'protected'. Here are those I attempted to remove but could not:

Some of them may well be required for Algolia to work but it would be nice to have a list of such 'protected' items.

Haroenv commented 3 years ago

objectID is required, so perhaps trying to remove it is indeed what's making the index inconsistent. The others aren't required.
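Since objectID is the one attribute Algolia requires, any code that strips attributes should leave it alone. A minimal sketch of such a guard; `remove_attributes` and `PROTECTED_KEYS` are hypothetical names, not part of jekyll-algolia, and it assumes records are plain Ruby hashes that may use either symbol or string keys.

```ruby
# Hypothetical helper, not part of jekyll-algolia: strips unwanted attributes
# from one record while refusing to drop objectID.
PROTECTED_KEYS = %w[objectID].freeze

def remove_attributes(record, unwanted)
  unwanted.each do |key|
    name = key.to_s
    next if PROTECTED_KEYS.include?(name) # objectID must stay on every record

    # Records may use symbol or string keys, so delete both forms.
    record.delete(name.to_sym)
    record.delete(name)
  end
  record
end

# Usage: objectID is kept even though it was requested; layout is removed.
record = { objectID: 'abc', title: 'Hello', layout: 'splashplace' }
remove_attributes(record, %w[layout objectID])
```

Skipping protected keys silently (rather than raising) keeps a long exclusion list from breaking the build; raising instead would make an accidental `objectID` entry easier to spot.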

gazconroy commented 3 years ago

Mmm. Doesn't seem to make a difference. Perhaps those protections are within the jekyll-algolia code?

pixelastic commented 3 years ago

Hello @gazconroy,

The Algolia API does not force any specific attribute, except the objectID. All other attributes are generated by jekyll-algolia.

The JSON example on this page lists the base keys added by the plugin (which it needs to sort your results correctly): https://community.algolia.com/jekyll-algolia/how-it-works.html

I don't remember if those keys are added after the hook or before, though. If they are added after you won't be able to remove them. If they are added before you can remove them, but you might then break the relevance of the plugin.

gazconroy commented 3 years ago

Thank you for the update @pixelastic . It does look like those keys are added after the hook. However, I also can't remove other keys outside that 'how it works' list:

The algolia_hooks.rb code is successfully removing some unwanted keys. Just not those...

pixelastic commented 3 years ago

Wait, I had a look at your hook code and I think there might be some confusion here.

You're talking about removing keys from Algolia records, but the hook you shared seems to remove entire records based on their CSS classes. So I'm thinking we might not be talking about the same thing here.

The way the plugin works is by creating one Algolia record (the items you see in your dashboard) per HTML node (the things you're matching against in your hook). If the hook returns nil, no record is created for that node. If the record is created, it contains a bunch of keys (the default ones I shared earlier, plus any custom keys you choose to add through the hook).
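The two operations described above (dropping a whole record versus dropping keys from a record that is kept) can be sketched in one hook. A minimal sketch, assuming the `before_indexing_each(record, node, context)` signature from the jekyll-algolia docs, that `node` is a Nokogiri element (so `node['class']` returns its class attribute or nil), and that records are Ruby hashes. `SKIPPED_CLASSES` and `UNWANTED_KEYS` and their contents are hypothetical placeholders.

```ruby
# _plugins/algolia_hooks.rb (sketch)
module Jekyll
  module Algolia
    module Hooks
      # Hypothetical lists: replace with your own CSS classes and record keys.
      SKIPPED_CLASSES = %w[no-index skip-search].freeze
      UNWANTED_KEYS   = %w[excerpt header].freeze

      def self.before_indexing_each(record, node, context)
        # 1. Node level: returning nil means no record is created for this node.
        classes = node['class'].to_s.split
        return nil if (classes & SKIPPED_CLASSES).any?

        # 2. Record level: the record is kept, but these keys are removed from it.
        UNWANTED_KEYS.each do |key|
          record.delete(key.to_sym)
          record.delete(key)
        end
        record
      end
    end
  end
end
```

Matching on CSS classes only ever affects which records exist; it never changes which keys a surviving record carries, so unwanted attributes need the second step.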

Does that help? If not, could you share a screenshot of what you see when you mentioned "However, the dashboard tells me that they still have been [indexed]."?

gazconroy commented 3 years ago

Sure. Here's the dashboard display of those keys.

[Screenshot: Algolia index]

As you can imagine, my interpretation of this is that the plugin converts CSS classes to Algolia keys (which seems like a mighty fine idea to me).

pixelastic commented 3 years ago

@gazconroy Could you also share the frontmatter of the matching post?

gazconroy commented 3 years ago

```yaml
layout: splashplace
title: Writing human-readable JavaScript for APIs
categories:
  - Javascript
header:
  overlay_color: "#000"
  overlay_filter: "0.5"
  overlay_image: assets/images/javascript.jpg
  teaser: assets/images/javascript.jpg
excerpt: URLSearchParams allows you to compose easy-to-understand API calls
```

gazconroy commented 3 years ago

This uses the Minimal Mistakes theme with a customised layout for this post, but all other content (standard Minimal Mistakes posts/pages) exhibits the same behaviour.
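The attribute names in the dashboard may come from front matter (such as `header` or `excerpt` above) rather than from CSS classes. One way to check is to list every key that reaches Algolia and compare it with the front matter. A sketch assuming the `before_indexing_all(records, context)` hook from the jekyll-algolia docs, whose return value is taken as the final list of records; `key_counts` is a hypothetical helper.

```ruby
# _plugins/algolia_hooks.rb (diagnostic sketch)
module Jekyll
  module Algolia
    module Hooks
      # Hypothetical helper: how many records carry each attribute key.
      def self.key_counts(records)
        counts = Hash.new(0)
        records.each do |record|
          record.each_key { |key| counts[key.to_s] += 1 }
        end
        counts
      end

      def self.before_indexing_all(records, context)
        key_counts(records).sort.each do |key, n|
          # Printed to stderr so it appears in the `jekyll algolia` output.
          warn "[algolia keys] #{key}: #{n} record(s)"
        end
        records # hand the records back unchanged
      end
    end
  end
end
```

Any key that matches a front matter field can then be deleted per record in `before_indexing_each`, keeping in mind that the plugin's own base keys may be added after the hook runs.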