Open hjoliver opened 5 years ago
Appears to be a fundamental limitation: https://github.com/sphinx-doc/sphinx/issues/3301
I used it only once in the Cylc documentation, and it worked because I only needed to find a specific token. We can switch to Algolia if a more powerful search engine is necessary. It is free for non-commercial use: https://www.algolia.com/pricing. It is also used by a number of open source projects, as well as small experiments (it is easier to start with Algolia before adopting something like ElasticSearch).
Just in case that's a valid alternative too :+1: looks like someone thought about adding it to Sphinx Doc too.
We could just add a Google "site:" search box for the online version (with some obvious limitations, e.g. broken for a while when new versions go up...)
Looks like Sphinx itself would need some modifications to allow use with Algolia.
(This limitation is quite surprising. Sphinx has been around for a while - surely almost anyone would want exact phrase search at the least!)
@sadielbartholomew - can we still generate a single-page version to allow in-browser Ctrl-F search, as an interim measure?
Ah yes, I see "make singlehtml" in the Makefile!
(Motivation for this issue: @jarich reports users at BOM lamenting the loss of the old single-page version, ugly as it was!).
@hjoliver, yes, that is in theory fairly easy to do with Sphinx, using the singlehtml builder as you saw referenced. We would need to set that builder up from within the conf.py & then the single-page view can be built where appropriate via sphinx-build -b singlehtml. Saying that, the docs for this builder as linked state "obviously this only works with smaller projects", suggesting there may be scaling or structural issues to do this with our significantly-sized docs.
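For reference, a minimal sketch of driving the singlehtml builder from Python rather than from the Makefile. The helper names and the paths (src, built-sphinx-single) are assumptions made up for this example, not what bin/cylc-make-docs does; the only Sphinx-specific parts are the sphinx-build command and its -b singlehtml option.

```python
import subprocess
from pathlib import Path


def singlehtml_command(src_dir, out_dir):
    """Return the sphinx-build argv for a single-page HTML build."""
    return ["sphinx-build", "-b", "singlehtml",
            str(Path(src_dir)), str(Path(out_dir))]


def build_singlehtml(src_dir, out_dir):
    """Run the build; raises CalledProcessError if sphinx-build fails."""
    subprocess.run(singlehtml_command(src_dir, out_dir), check=True)


# Hypothetical paths, chosen only for illustration. This just shows the
# command; call build_singlehtml(...) to actually run it.
print(" ".join(singlehtml_command("src", "built-sphinx-single")))
```

Whether the resulting single page stays usable at the size of our docs is exactly the open question raised above; the sketch says nothing about that.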
We have, in fact, also had users here at the Met Office express a desire for the single-page version just for its ease & power of search. In light of this, perhaps we should escalate getting a single-page HTML or even PDF version created in the build?
(This limitation is quite surprising. Sphinx has been around for a while - surely almost anyone would want exact phrase search at the least!)
Agreed! @oliver-sanders & I were also very puzzled about the poor nature of the search functionality in Sphinx, as noticed during the development of the Rose docs, given the tool is otherwise so powerful. Looking at the GitHub repo, they actually have an entire label dedicated to html search! Hopefully this means the search feature will improve in newer versions.
Other notable issues with Sphinx in-built searching that we observed in the Rose docs, which uses the same Sphinx version as the Cylc docs, at least at the moment:
.. <name>:: & title underline e.g. ========== characters in, & also making internal notes e.g. TODO visible by search:
... and another notable issue with the Cylc docs that we can't blame on Sphinx (& that instead relates to some issue with my setup from the conversion #2910):
I think there is a Sphinx plugin for nicer searching.
PRs up to generate single-page User Guide, for Ctrl-F browser search, until this issue is resolved. #2970 #2971
@sadielbartholomew
a consistently useless search result text preview, ...
This appears to be some weird interaction with GitHub pages, because the search works fine locally for me, with the exact same generated docs.
I'm not sure what the cause is ... the Rose Docs don't have this problem on GitHub Pages (as you noted above). As an interim measure I've added a warning to the web site documentation page that search result summaries are currently broken.
I think the reason that search is broken on gh-pages is that we are only uploading the html directory; search will require the doctree directory as well. When you build locally this is still available?
This should go away when we copy across the rose make-docs infrastructure into Cylc. rose make-docs generates the following file structure:
doc -> 2019.01.01  # symlink to latest version (named doc for legacy reasons)
index.html  # redirect to doc/html/index.html
2019.01.01/
    index.html  # redirect to html/index.html
    html/
        index.html
        ...
    doctrees/
        index.rst
        ....
    slides/
        index.html
        ...
    pdf/
        cylc-tutorial.pdf
        rose-tutorial.pdf
        rose-documentation.pdf
So http://metomi.github.io/rose/ redirects to http://metomi.github.io/rose/doc/html/index.html.
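To make the redirect and symlink arrangement concrete, here is a rough Python sketch that creates a skeleton with the same shape in a temporary directory. It is not how rose make-docs is implemented; the function names and the meta-refresh redirect page are assumptions for illustration only.

```python
import os
import tempfile
from pathlib import Path

# Assumed redirect mechanism: a meta-refresh page (illustrative only).
REDIRECT = '<meta http-equiv="refresh" content="0; url={target}">\n'


def write_redirect(path, target):
    """Write a tiny HTML page that redirects the browser to target."""
    path.write_text(REDIRECT.format(target=target))


def make_layout(root, version):
    """Create the versioned skeleton: redirects plus a 'doc' symlink."""
    root = Path(root)
    version_dir = root / version
    (version_dir / "html").mkdir(parents=True)
    # Placeholder standing in for the real Sphinx HTML output.
    (version_dir / "html" / "index.html").write_text("<html></html>\n")
    write_redirect(version_dir / "index.html", "html/index.html")
    write_redirect(root / "index.html", "doc/html/index.html")
    # Relative symlink: 'doc' always points at the latest version.
    os.symlink(version, root / "doc", target_is_directory=True)
    return root


with tempfile.TemporaryDirectory() as tmp:
    site = make_layout(tmp, "2019.01.01")
    print(os.readlink(site / "doc"))
    print((site / "doc" / "html" / "index.html").exists())
```

Publishing a new version would then only mean adding another version directory and repointing the doc symlink.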
This should go away when we copy across the rose make-docs infrastructure into Cylc
Just FYI, I'm not sure this is the plan; I think @kinow was planning to get the docs built via the setup.py as described in https://github.com/cylc/cylc/pull/2910#issuecomment-453884314, though I could be wrong or have missed some discussion on it.
This appears to be some weird interaction with GitHub pages, because the search works fine locally for me, with the exact same generated docs.
Ah yes, I have just had a look & the search result text preview is as it should be locally, for me also. The 'Show Source' sidebar function also has always worked locally but not when setup under GitHub pages, so that may be related.
Just FYI, I'm not sure this is the plan
Copying across the rose make-docs functionality and building via setup.py aren't mutually exclusive.
'Show Source' sidebar function also has always worked locally
Again, likely caused by stripping out the doctrees directory.
I'm not stripping out the doctrees directory, as far as I'm aware. I just checked, and there is a built-sphinx/.doctrees directory included in the gh-pages branch - presumably that's what you mean? Not sure why it's different from the Rose case (doctrees, no leading dot).
In bin/cylc-make-docs we call sphinx-build directly:
sphinx-build -n -b html ./src built-sphinx/
Normally you would use the standard Sphinx Makefile, which would make the following call:
sphinx-build -b html -d _build/doctrees . _build/html
I might be barking up the wrong tree here but I'm pretty sure that doctrees have something to do with it.
sphinx-build --help says the -d option value defaults to OUTPUTDIR/.doctrees, which is what we've got. (And which isn't being stripped out).
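The difference between the two invocations above can be sketched in a few lines of Python. The function below is a hypothetical helper that simply mirrors the default described by sphinx-build --help (OUTPUTDIR/.doctrees unless -d is given); it is not part of Sphinx.

```python
from pathlib import Path


def doctree_dir(out_dir, d_option=None):
    """Return the doctree cache directory a sphinx-build call would use.

    Mirrors the documented default: OUTPUTDIR/.doctrees unless -d is given.
    """
    if d_option is not None:
        return Path(d_option)
    return Path(out_dir) / ".doctrees"


# bin/cylc-make-docs call: no -d, so the cache lands inside the output dir.
print(doctree_dir("built-sphinx"))

# Standard Makefile call: -d puts the cache beside the HTML, not inside it.
print(doctree_dir("_build/html", "_build/doctrees"))
```

This would account for the leading-dot directory in our gh-pages branch versus the plain doctrees directory in the Rose layout.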
I have very little knowledge of Sphinx and no idea what the doctrees folder is for, but I think it exists in our GitHub pages: https://github.com/cylc/cylc/tree/gh-pages/doc/built-sphinx/.doctrees
Searching for something like "batch", there are no JS errors in the browser console, but in the network tab it is possible to see several 404s. Maybe that could be contributing to the bad results layout?
Hmm, yes, interesting. The browser console is impressively useful...
This one https://github.com/cylc/cylc.github.io/pull/2 fixed how search results are displayed, but it doesn't fix the main issue reported here, which is the limitations of search in Sphinx :+1:
Unfortunately the sphinx doc "search engine" is very basic: it only searches individually indexed words, there's no way to search for patterns or phrases. This means searching for any multi-word phrase that has one or more common words in it will result in loads of spurious results.
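To illustrate the limitation, here is a simplified Python model of a word-level index compared with an exact phrase match. It is not Sphinx's actual search code, and the document names and text are invented for the example; it only shows why matching individually indexed words returns spurious hits for a multi-word phrase.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def word_search(index, query):
    """Word-level search: docs containing every query word, anywhere."""
    words = tokenize(query)
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


def phrase_search(docs, query):
    """Exact phrase search: the query words must appear consecutively."""
    words = tokenize(query)
    if not words:
        return set()
    n = len(words)
    found = set()
    for name, text in docs.items():
        tokens = tokenize(text)
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            found.add(name)
    return found


# Invented sample pages, for illustration only.
docs = {
    "suite-config": "The task pool is the set of active tasks.",
    "job-submission": "Each task is submitted to a batch system; "
                      "the pool of hosts is configurable.",
    "install": "Install with pip.",
}
index = build_index(docs)
print(sorted(word_search(index, "the task pool")))
print(sorted(phrase_search(docs, "the task pool")))
```

The word-level search returns both suite-config and job-submission, because each contains all three words somewhere; only the phrase search narrows the result to suite-config.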
[UPDATE: what's addressed so far on this issue]