baltimorecounty / BCPL-assets

Client side assets for the Baltimore County Public Library website

Google Custom Search results show results from navigation #605

Open danfox01 opened 6 years ago

danfox01 commented 6 years ago

As reported by @jdomasky:

This morning, I tried a search for “meeting room” and I received 140 results.

That high number seemed unlikely to me, so I bypassed the top results and started browsing some of the lesser results.

I’m concerned that Google might be indexing the nav menu links, such as “Meeting and Study Rooms” in the Services menu.

For example, one search result is the Summer Teen Workshops blog post from June. The content area doesn’t contain “meeting” or “room.”

Recommendation: Years ago, we faced this issue on the old BCPL website (prior to Swiftype) and I was able to adjust the indexing settings to ignore repeated links that appear on all pages, such as the content in the template. It’s been so long, I’m sure the settings have changed and the solution is different than I (barely) remember.

I have looked into this a bit and haven't found a legitimate way to prevent Google from indexing the nav. It's not a huge problem, because Google still prioritizes the more relevant results over the ones with just a nav hit, but if there's anything that can be done, I'd like to know.

jdomasky commented 5 years ago

@danfox01 I did some research and uncovered two possible strategies:

1. As recently as 2017, Google CSE had a documented option that dealt with this issue:

If your pages have regions containing boilerplate content that's not relevant to the main content of the page, you can identify it using the nocontent class attribute. When Google Custom Search sees this tag, we'll ignore any keywords it contains and won't take them into account when calculating ranking for your Custom Search engine. (We'll still follow and crawl any links contained in the text marked nocontent.) To use the nocontent class attribute, include the boilerplate content in a tag (for example, span or div) like this: <div class="nocontent"> <!-- The area to exclude --> </div>

To activate this option, we can try adding this attribute to our cse.xml: enable_nocontent_tag="true". However, because this option is no longer documented in CSE support, it may no longer work.

2. I came across advice suggesting that CSE returns fewer, more accurate results if we put quotes around each individual query term. This technique seems to work for "meeting room": compare a BCPL site search for "meeting room" with one for "meeting" "room". It doesn't seem to work for "digital movies", though. If additional testing shows that quoting individual query terms generally improves the accuracy of search results, we could consider post-processing search queries to add quotes around all terms in BCPL site searches.
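
To make the post-processing idea concrete, here is a minimal sketch of what quoting each term could look like. The function name quoteQueryTerms is hypothetical and not part of the existing BCPL code, and it assumes the query reaches us as a plain string before it is handed to CSE.

```typescript
// Hypothetical helper (not in the BCPL codebase): wrap every
// whitespace-separated term of a search query in double quotes.
function quoteQueryTerms(query: string): string {
  return query
    // Drop quotes the user typed (straight or curly) so terms are not double-quoted.
    .replace(/["“”]/g, " ")
    // Split on any run of whitespace.
    .split(/\s+/)
    // Discard empty strings left by leading or trailing whitespace.
    .filter((term) => term.length > 0)
    // Quote each remaining term individually.
    .map((term) => `"${term}"`)
    .join(" ");
}

// Example: the query from this issue, quoted term by term.
const quoted = quoteQueryTerms("meeting room");
console.log(quoted);
```

One trade-off with this sketch: because it strips quotes the user typed, a deliberate phrase search is broken into separate quoted terms, so we would want to decide whether to leave already-quoted queries alone.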