cylc / cylc-doc

Documentation (User Guide, Cheat Sheets, etc.) for the Cylc Workflow Engine.
https://cylc.github.io/cylc-doc/
GNU General Public License v3.0

sphinx doc search limitations #7

Open hjoliver opened 5 years ago

hjoliver commented 5 years ago

Unfortunately the sphinx doc "search engine" is very basic: it only searches individually indexed words, there's no way to search for patterns or phrases. This means searching for any multi-word phrase that has one or more common words in it will result in loads of spurious results.
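A minimal sketch of why a word-level index behaves this way. This is a toy model, not Sphinx's actual search code: it assumes the index maps each word to the set of pages containing it and that a query matches any page containing all of its words, in any position. The page names and texts are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical pages, invented for this illustration.
DOCS = {
    "running": "The task is submitted to the batch system and the job runs.",
    "config": "Set the job submission method to choose a batch system.",
    "intro": "The system can run a job in the background; "
             "a batch of tasks is submitted later.",
}


def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each individual word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def word_search(index, query):
    """Return pages containing every query word, wherever it appears."""
    words = tokenize(query)
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))


def phrase_search(docs, query):
    """Return pages where the query words appear next to each other, in order."""
    needle = " " + " ".join(tokenize(query)) + " "
    return {
        name
        for name, text in docs.items()
        if needle in " " + " ".join(tokenize(text)) + " "
    }


print(sorted(word_search(build_index(DOCS), "batch system")))
print(sorted(phrase_search(DOCS, "batch system")))
```

The word-level search returns all three pages, including "intro", where "batch" and "system" are unrelated; the phrase search returns only the two pages that actually contain "batch system".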

[UPDATE: what's addressed so far on this issue]

hjoliver commented 5 years ago

Appears to be a fundamental limitation: https://github.com/sphinx-doc/sphinx/issues/3301

kinow commented 5 years ago

I have used it only once in the Cylc documentation, and it worked because I needed to find a specific token. We can switch to Algolia if a more powerful search engine is necessary. It is free for non-commercial use: https://www.algolia.com/pricing. It is used by a bunch of Open Source projects, as well as for small experiments (it is easier to start with Algolia before adopting something like ElasticSearch).

Just in case that's a valid alternative too :+1: looks like someone thought about adding it to Sphinx Doc too.

hjoliver commented 5 years ago

We could just add a Google "site:" search box for the online version (with some obvious limitations, e.g. broken for a while when new versions go up...)
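A rough sketch of what such a search box would send, assuming the standard Google search URL with a "site:" operator in the query. The function name and the site path (taken from the repository's documentation URL) are illustrative only; quoting the query asks Google for an exact phrase.

```python
from urllib.parse import urlencode


def site_search_url(site, query, exact_phrase=True):
    """Build a Google search URL restricted to one site.

    `site` is a host plus optional path; `query` is the user's search text.
    Both names are hypothetical, chosen for this sketch.
    """
    terms = f'"{query}"' if exact_phrase else query
    return "https://www.google.com/search?" + urlencode(
        {"q": f"site:{site} {terms}"}
    )


print(site_search_url("cylc.github.io/cylc-doc", "batch system"))
```

As noted above, results would depend on Google having crawled the current version of the docs.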

hjoliver commented 5 years ago

Looks like Sphinx itself would need some modifications to allow use with Algolia.

hjoliver commented 5 years ago

(This limitation is quite surprising. Sphinx has been around for a while - surely almost anyone would want exact phrase search at the least!)

hjoliver commented 5 years ago

@sadielbartholomew - can we still generate a single-page version to allow in-browser Ctrl-F search, as an interim measure?

hjoliver commented 5 years ago

Ah yes, I see "make singlehtml" in the Makefile!

hjoliver commented 5 years ago

(Motivation for this issue: @jarich reports users at BOM lamenting the loss of the old single-page version, ugly as it was!).

sadielbartholomew commented 5 years ago

@hjoliver, yes, that is in theory fairly easy to do with Sphinx, using the singlehtml builder as you saw referenced. We would need to set that builder up from within the conf.py & then the single-page view can be built where appropriate via sphinx-build -b singlehtml. Saying that, the docs for this builder as linked state "obviously this only works with smaller projects", suggesting there may be scaling or structural issues to do this with our significantly-sized docs.
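A small sketch of driving that build from Python, assuming sphinx-build is on the PATH. The helper names and the output directory are hypothetical; only the -b singlehtml invocation comes from the comment above, and ./src is the source directory mentioned elsewhere in this thread.

```python
from __future__ import annotations

import shutil
import subprocess


def singlehtml_command(source_dir: str, output_dir: str) -> list[str]:
    """Return the sphinx-build command line for a single-page HTML build."""
    return ["sphinx-build", "-b", "singlehtml", source_dir, output_dir]


def build_single_page(source_dir: str, output_dir: str) -> bool:
    """Run the build if sphinx-build is installed; return whether it ran."""
    if shutil.which("sphinx-build") is None:
        return False
    subprocess.run(singlehtml_command(source_dir, output_dir), check=True)
    return True


# Hypothetical output directory name.
print(" ".join(singlehtml_command("./src", "built-singlehtml")))
```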

We have, in fact, also had users here at the Met Office express desire for the single page version just for its ease & power of search. In light of this, perhaps we should escalate getting a single page HTML or even PDF version created in the build?

sadielbartholomew commented 5 years ago

(This limitation is quite surprising. Sphinx has been around for a while - surely almost anyone would want exact phrase search at the least!)

Agreed! @oliver-sanders & I were also very puzzled about the poor nature of the search functionality in Sphinx, as noticed during the development of the Rose docs, given the tool is otherwise so powerful. Looking at the GitHub repo, they actually have an entire label dedicated to html search! Hopefully this means the search feature will improve in newer versions.

Other notable issues with Sphinx in-built searching that we observed in the Rose docs, which uses the same Sphinx version as the Cylc docs at least at the moment:

sadielbartholomew commented 5 years ago

... and another notable issue with the Cylc docs that we can't blame on Sphinx (& which instead relates to some issue with my setup from the conversion #2910):

oliver-sanders commented 5 years ago

I think there is a Sphinx plugin for nicer searching.

hjoliver commented 5 years ago

PRs up to generate single-page User Guide, for Ctrl-F browser search, until this issue is resolved. #2970 #2971

hjoliver commented 5 years ago

@sadielbartholomew

a consistently useless search result text preview, ...

This appears to be some weird interaction with GitHub pages, because the search works fine locally for me, with the exact same generated docs.

I'm not sure what the cause is ... the Rose Docs don't have this problem on GitHub Pages (as you noted above). As an interim measure I've added a warning to the web site documentation page that search result summaries are currently broken.

oliver-sanders commented 5 years ago

I think the reason that search is broken on gh-pages is that we are only uploading the html directory; search will require the doctrees directory as well. When you build locally this is still available?

This should go away when we copy across the rose make-docs infrastructure into Cylc. rose make-docs generates the following file structure:

doc -> 2019.01.01  # symlink to latest version (named doc for legacy reasons)
index.html         # redirect to doc/html/index.html
2019.01.01/
    index.html     # redirect to html/index.html
    html/
        index.html
        ...
    doctrees/
        index.rst
        ...
    slides/
        index.html
        ...
    pdf/
        cylc-tutorial.pdf
        rose-tutorial.pdf
        rose-documentation.pdf

So http://metomi.github.io/rose/ redirects to http://metomi.github.io/rose/doc/html/index.html.
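For illustration, one way a static redirect page like the index.html files above could be generated. This is an assumption: the thread does not say how rose make-docs implements its redirects, and the meta-refresh approach and function name here are just one common option on static hosting.

```python
from html import escape


def redirect_page(target):
    """Return a tiny HTML page that forwards the browser to `target`.

    Uses a meta refresh plus a fallback link (an assumed technique,
    not necessarily what rose make-docs does).
    """
    safe = escape(target, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f'  <meta http-equiv="refresh" content="0; url={safe}">\n'
        "</head>\n<body>\n"
        f'  <p>Redirecting to <a href="{safe}">{safe}</a>.</p>\n'
        "</body>\n</html>\n"
    )


# Top-level index.html pointing at the latest version via the doc symlink.
print(redirect_page("doc/html/index.html"))
```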

sadielbartholomew commented 5 years ago

This should go away when we copy across the rose make-docs infrastructure into Cylc

Just FYI, I'm not sure this is the plan; I think @kinow was planning to get the docs built via the setup.py as described in https://github.com/cylc/cylc/pull/2910#issuecomment-453884314, though I could be wrong or have missed some discussion on it.

sadielbartholomew commented 5 years ago

This appears to be some weird interaction with GitHub pages, because the search works fine locally for me, with the exact same generated docs.

Ah yes, I have just had a look & the search result text preview is as it should be locally, for me also. The 'Show Source' sidebar function also has always worked locally but not when set up under GitHub Pages, so that may be related.

oliver-sanders commented 5 years ago

Just FYI, I'm not sure this is the plan

Copying across the rose make-docs functionality and building via setup.py aren't mutually exclusive.

'Show Source' sidebar function also has always worked locally

Again likely caused by stripping out the doctrees directory

hjoliver commented 5 years ago

I'm not stripping out the doctrees directory, as far as I'm aware. I just checked, and there is a built-sphinx/.doctrees directory included in the gh-pages branch - presumably that's what you mean? Not sure why it's different from the Rose case (doctrees, no leading dot).

oliver-sanders commented 5 years ago

In bin/cylc-make-docs we call sphinx-build directly:

sphinx-build -n -b html ./src built-sphinx/

Normally you would use the standard Sphinx Makefile which would make the following call

sphinx-build -b html -d _build/doctrees . _build/html

I might be barking up the wrong tree here but I'm pretty sure that doctrees have something to do with it.

hjoliver commented 5 years ago

sphinx-build --help says the -d option value defaults to OUTPUTDIR/.doctrees, which is what we've got. (And which isn't being stripped out).
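To make the comparison concrete, here is a small sketch that works out where the doctrees end up for the two commands quoted above, using only the default described here (OUTPUTDIR/.doctrees when -d is absent). The parser is a simplification that only knows the options appearing in this thread; it is not sphinx-build's real argument handling, and the function name is made up.

```python
import shlex
from pathlib import PurePosixPath

# Options used in this thread that are followed by a value.
OPTIONS_WITH_VALUE = {"-b", "-d"}


def effective_doctree_dir(command):
    """Return the doctree directory a sphinx-build command line would use."""
    args = shlex.split(command)[1:]  # drop the program name
    explicit = None
    positional = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in OPTIONS_WITH_VALUE:
            if arg == "-d":
                explicit = args[i + 1]
            i += 2
        elif arg.startswith("-"):
            i += 1  # value-less flag such as -n
        else:
            positional.append(arg)
            i += 1
    output_dir = positional[1]  # positional arguments: SOURCEDIR OUTPUTDIR
    if explicit is not None:
        return explicit
    return str(PurePosixPath(output_dir) / ".doctrees")


print(effective_doctree_dir("sphinx-build -n -b html ./src built-sphinx/"))
print(effective_doctree_dir("sphinx-build -b html -d _build/doctrees . _build/html"))
```

The first command resolves to built-sphinx/.doctrees and the second to _build/doctrees, which matches the difference between the Cylc and Rose layouts noted earlier.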

kinow commented 5 years ago

I have very little knowledge of Sphinx and no idea what the doctrees folder is for, but I think it exists in our GitHub pages: https://github.com/cylc/cylc/tree/gh-pages/doc/built-sphinx/.doctrees

kinow commented 5 years ago

When searching for something like "batch", there are no JS errors in the browser console, but in the network tab it is possible to see several 404s. Maybe that could be contributing to the bad results layout?

hjoliver commented 5 years ago

Hmm, yes, interesting. The browser console is impressively useful...

kinow commented 5 years ago

This one https://github.com/cylc/cylc.github.io/pull/2 fixed how search results are displayed. But it doesn't fix the main issue reported here, which is the limitations of search in Sphinx :+1: