Closed damithc closed 5 years ago
Ideally, it should be possible to reach such content via search results (in which case there is no need to exclude them from the search index), but probably that's too hard to do?
i.e., the page detects the target anchor is inside a hidden element and automatically triggers that element to become visible.
Probably would need a bit of digging around the codebase, but the most-likely algorithm seems to be do-able given time:
- ... _include_.html files), and its associated panel (if the panel is nested deeply, then it will be a list of panels starting from the root to the eventual panel with the heading).
- In the page's script, if the heading id matches an entry in the map, start calling the open() method of each associated panel (from top to bottom if nested), then jump to the heading.
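The two steps in that list could be sketched roughly as follows. This is only an illustration under assumptions: the PanelNode shape and the names buildHeadingMap and openPanelsFor are hypothetical, not MarkBind's actual API, and real panels would be Vue components rather than plain objects.

```typescript
// Hypothetical description of a panel tree as it might be known at build time.
interface PanelNode {
  id: string;
  headingIds: string[];  // headings directly inside this panel
  children: PanelNode[]; // nested panels
}

// Step 1: map each heading id to the chain of enclosing panels, root first.
function buildHeadingMap(
  panels: PanelNode[],
  ancestors: string[] = [],
  map: Map<string, string[]> = new Map(),
): Map<string, string[]> {
  for (const panel of panels) {
    const chain = [...ancestors, panel.id];
    for (const headingId of panel.headingIds) {
      map.set(headingId, chain);
    }
    buildHeadingMap(panel.children, chain, map);
  }
  return map;
}

// Step 2: open every enclosing panel from top to bottom.
// `open` stands in for whatever call actually expands a panel.
function openPanelsFor(
  headingId: string,
  map: Map<string, string[]>,
  open: (panelId: string) => void,
): boolean {
  const chain = map.get(headingId);
  if (chain === undefined) {
    return false; // the heading is not inside any panel
  }
  for (const panelId of chain) {
    open(panelId);
  }
  return true;
}
```

The caller would jump to the heading only after openPanelsFor returns true; a heading that is not in the map needs no special handling.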
Good to hear that.
- In the page's script, if the heading id matches an entry in the map, start calling the open() method of each associated panel (from top to bottom if nested), then jump to the heading.
Would this work for tabs too?
We also need to consider the scrolling to the target position. If not done right, scrolling could happen before opening, ending up in the wrong position. This is a problem in some existing pages already, where scrolling happens before the page has assumed its final height.
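One way to guard against that ordering problem is to make the scroll wait until every panel has reported that it finished opening. The sketch below assumes a hypothetical open function that returns a promise resolved once the panel has reached its final height; revealThenScroll and that contract are illustrative names, not something MarkBind is known to provide.

```typescript
// Opens the given panels one after another (outermost first) and scrolls
// only after the last one has finished opening.
async function revealThenScroll(
  panelChain: string[],
  open: (panelId: string) => Promise<void>, // hypothetical: resolves once layout has settled
  scrollToTarget: () => void,               // e.g. a wrapper around element.scrollIntoView()
): Promise<void> {
  for (const panelId of panelChain) {
    await open(panelId);
  }
  scrollToTarget();
}
```

The same wait would also be needed on pages without hidden content if their height changes after load, which is the existing problem mentioned above.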
Also, we can do this in two steps.
... hidden-tab, hidden-panel etc.
@marvinchin see if we can do at least item 1 for V2. Without it, Algolia search is pretty much unusable for our main use case, the CS2103 website, as some of the search results are unreachable by clicking on the search result.
Sure, I'll take a look at this soon!
Some preliminary thoughts:
What the user sees should be identical to what the scraper "sees". However, in our case it seems that some content is hidden from the user but is visible to the scraper, and is hence indexed.
The problem seems to be that MarkBind sites are client rendered. The DOM in the original HTML contains all the content (including hidden content) before the client-side JavaScript hides some of it.
The DocSearch crawler, by default, assumes sites are server rendered and thus indexes everything in the original HTML. We can update the configuration to indicate that the website is client rendered, so that the crawler executes the client-side JavaScript before indexing the content. Perhaps we should include this in the documentation for the Algolia plugin after verifying that this works.
I believe this might be a way to solve the issue of hidden content being indexed without the tedium and brittleness of tagging all hidden content with a unique class.
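For reference, the configuration change described above might look like the following. This assumes the js_render and js_wait options of the legacy DocSearch scraper; the index name, URL, selectors, and the 2-second wait are placeholder values, not taken from any real MarkBind site.

```typescript
// Sketch of a DocSearch scraper configuration for a client-rendered site.
// All values are placeholders; only js_render and js_wait matter here.
const renderedSiteConfig = {
  index_name: "example-site",           // placeholder
  start_urls: ["https://example.com/"], // placeholder
  selectors: {
    lvl0: "h1",
    lvl1: "h2",
    text: "p, li",
  },
  js_render: true, // ask the crawler to run the page's JavaScript before indexing
  js_wait: 2,      // seconds to wait for rendering; this value is a guess
};

// The scraper reads JSON, so the object would be serialized like this.
const renderedSiteJson = JSON.stringify(renderedSiteConfig, null, 2);
console.log(renderedSiteJson);
```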
Thanks for investigating @marvinchin I'll try that option to see if that gives us the intended outcome. Yes, we should include it in our documentation, if the option indeed works.
Further thoughts: Eventually, we want contents of hidden tabs (and possibly some collapsed panels) to be searchable as their content may not be repeated anywhere else in the site. But this requires the support of step 2 above, and possibly step 1 too.
Looks like Algolia doesn't like the client-rendering option: https://github.com/algolia/docsearch-configs/pull/780. I also assume indexing based on client-side rendering is not exactly reliable, as it is hard to predict how long a page would take to load completely?
Yes, there is some variability involved with client side rendering, unfortunately 🙁. However, that can be mitigated by setting a long enough delay.
I suppose we will need to resort to adding identifying classes to avoid this. I will investigate how this can be done automatically for vue-strap elements.
@damithc I've updated the Algolia plugin to add the algolia-no-index class to content that will be hidden by VueStrap components. Unfortunately, I do not have access to any Algolia-enabled sites, so I am not able to test this independently.
Would it be possible to test if this works by:
- Adding .algolia-no-index to the selectors_exclude attribute in the DocSearch configuration

Thanks! 🙂
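As a rough sketch of that test step, the exclusion could be added to an existing scraper configuration like this. The ScraperConfig type and the excludeSelector helper are hypothetical, and the index name and URL are placeholders; only selectors_exclude and the .algolia-no-index class come from this thread.

```typescript
// Minimal, hypothetical subset of a DocSearch scraper configuration.
interface ScraperConfig {
  index_name: string;
  start_urls: string[];
  selectors_exclude?: string[];
}

// Returns a copy of the config with `selector` in selectors_exclude (no duplicates).
function excludeSelector(config: ScraperConfig, selector: string): ScraperConfig {
  const existing = config.selectors_exclude || [];
  const alreadyListed = existing.indexOf(selector) !== -1;
  return {
    ...config,
    selectors_exclude: alreadyListed ? existing : [...existing, selector],
  };
}

const baseConfig: ScraperConfig = {
  index_name: "example-site",           // placeholder
  start_urls: ["https://example.com/"], // placeholder
};

const excludingConfig = excludeSelector(baseConfig, ".algolia-no-index");
console.log(JSON.stringify(excludingConfig, null, 2));
```

After re-running the crawl with such a configuration, searching for text that only appears inside a collapsed panel or an unselected tab should return no hits if the class is being applied as intended.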
Should we make the classname more general? e.g., hidden-content
Testing this is going to be tricky though. I don't have the dev environment set up as so far I have only used the production version, and at the moment I'm stuck with an older version because of the href bug in the latest version.
I prefixed the class name with algolia since Algolia should be the only use case for it (the functionality is also implemented in the Algolia plugin), and I was thinking of avoiding unnecessary coupling with the rest of the MarkBind behaviour. Is there any other use case where we might need to use these classes outside of Algolia?
Perhaps I could catch you for a short while tomorrow to see how we might be able to test this?
I have managed to try Algolia integration with a non-trivial site: https://nus-te3201.github.io/2019/index.html
After some tweaks to the config file, with the help of the Algolia team, the search results are nicely categorized into Admin Info, SE Textbook, Programming Textbook.
The next challenge is to prevent hidden content (e.g., modals, popovers, unselected tabs, unexpanded panels etc.) from being indexed, as it cannot be reached by clicking on a search result. Algolia provides a selectors_exclude mechanism for that, but we need a way to specify such hidden content.
Is there anything we can do to make it easier? e.g., add a unique class to all such content so that it can be specified in the Algolia config file easily?