craigfowler opened this issue 1 week ago
Just as a point of reference, what is the size of the ~~`_site/docfx.json`~~ `_site/index.json` file?
I think the current lunr.js-based search backend in docfx has several issues:
- … (when `index.json` is larger than 5MB)
- The search index is dynamically created from `index.json` on first page load, which consumes CPU time.

I've tested switching the search engine from lunr.js to Pagefind before, and it seems to work almost without problems (though it requires some additional tasks, e.g. handling `<mark>` tags and supporting UI customization).
- Demo site: https://filzrev.github.io/docfx.samples.pagefind/
- Custom template that changes the search backend to Pagefind: https://github.com/filzrev/docfx.samples.pagefind/tree/main/docs/templates/pagefind
The advantages of using Pagefind include:
- … (`npx pagefind --site _site` command)
- Debounced search by default

> Just as a point of reference, what is the size of the `_site/docfx.json` file?
Are you sure you meant `_site/docfx.json` and not something else? `_site/manifest.json`, perhaps?
There is no `docfx.json` file in `_site` (and I checked: we're using v2.76 to build our docco, so it's not as if we're on an outdated version). Our `_site/manifest.json` is around 4.5MB in size. We've got a little over 13,700 documents in `_site/api`, where our C# type documentation is built.
I'll see if I can find some time to try out that Pagefind-based search template. If it's "just generally superior" to Lunr without any downsides, then perhaps DocFx could adopt it officially as the default.
That said, for at least one concern listed:

> Search index is dynamically created from docfx.json on first page load timing. It consume CPU times.

Apparently this can already be solved in Lunr (it supports pre-building the index ahead of time); it just needs to be configured.
Sorry for the confusion.
I originally intended to refer to the `_site/index.json` file. (I've modified the comment above.)
Hmm, we don't have an `index.json` in the root of `_site` either.
The only files generated into the root of `_site` which aren't whole HTML pages or obvious non-logic assets (like favicons) are:
- `manifest.json` (approx 4.5MB)
- `toc.json` (2KB)
- `xrefmap.yml` (huge, approx 72MB)

I've had a look at some of the subfolders and there's no `index.json` (or any other JSON files at all) in any of those either. We're using pretty vanilla DocFx, although I think a while ago we switched to the bundled modern template in order to activate Mermaid diagram syntax. I think that might not have been the default template when we first installed DocFx into the project.
I've tried both the `default` and `modern` templates, and in both cases the `index.json` file is generated when running the `docfx build` command.
This file is generated by the `ExtractSearchIndex` post-processor (which is automatically added when `_enableSearch: true` is set). Without this `index.json` file, lunr.js-based search does not work (as far as I know).
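To make the dependency on `_enableSearch` concrete, here is a hypothetical TypeScript check over a parsed `docfx.json` object. It is a sketch, not part of DocFX: the `DocfxConfig` interface and `searchIndexEnabled` function are invented for illustration, and it assumes `_enableSearch` lives under `build.globalMetadata`, which is where we understand a config generated by a recent `docfx init` puts it. It only inspects that one location.

```typescript
// Hypothetical, partial shape of docfx.json: only the fields relevant here.
interface DocfxConfig {
  build?: {
    globalMetadata?: Record<string, unknown>;
    postProcessors?: string[];
  };
}

/**
 * Returns true when build.globalMetadata._enableSearch is exactly `true`,
 * the setting that causes ExtractSearchIndex (and so _site/index.json)
 * to be produced. Assumption: no other location is consulted.
 */
function searchIndexEnabled(config: DocfxConfig): boolean {
  return config.build?.globalMetadata?.["_enableSearch"] === true;
}
```

An older config that declares an empty `postProcessors` array but never sets `_enableSearch` would return `false` here, which is consistent with `index.json` not being generated.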
Could you test the following steps?
1. Run the `docfx init --yes` command.
2. Run the `docfx build` command.
3. Check whether the `_site/index.json` file is generated.

Interesting: on a blank project it is generated, but on our main solution's docco project it is not.
I compared the `docfx.json` files from our solution and a freshly created project, and there are some structural differences, including in areas that we have never edited. We would have generated an empty config file way back when we started using docfx and never did anything with those areas. For example, our main solution has the following sections explicitly declared with empty arrays, whereas a freshly-generated docfx config doesn't include these sections at all:
- `globalMetadataFiles`
- `fileMetadataFiles`
- `postProcessors` <-- I guess this is why the `ExtractSearchIndex` post-processor isn't running for us

I suspect that this is because our config file was generated by an older version of docfx. I guess the default behaviour changed over time; we upgraded versions and left our config file unchanged. If there was a docco/release note saying "Please review/regenerate your config because things changed", then we didn't spot it.
Anyway, I'll review that `docfx.json` file today and try to make it (structurally) look a little more like a current fresh one. Then I'll put a docco build through CI and see what it comes up with.
@filzrev I have updated our config; thanks for leading me to discover that it was outdated/malformed. Our docco site has gone through a full CI build and been re-published internally. An `index.json` has appeared in the root of `_site`, and it is 32.5MB in size.
Thanks for confirming.
I also tried to reproduce the problem in my local environment, using an `index.json` file of about 64.8MB, and obtained the following results:
- … (… `index.json` and save the JSON to IndexedDB)
- `search-worker.min.js` consumes about 800MB of memory.
Is your feature request related to a problem? Please describe.
I use DocFX to generate the docs for a very large solution with a huge API. I like the API search feature in the template, but when I type a search string, the size of the searched code causes high web browser CPU usage and freezes my browser UI for a few seconds. I appreciate that it's probably not possible to simply make the search quicker, but I noticed that the search seems to begin the very moment I type the first character of my search string. That wastes compute resources on a search whose results are going to be discarded. It also means I can't see the rest of the characters I'm typing into the search text box until the search has completed. Sometimes that means I have to use backspace and correct typos, triggering more wasteful search operations.
Describe the solution you'd like
I'd like a template option, likely alongside `_enableSearch`, of type nullable integer. If `null`, the behaviour is as the template currently functions: the client-side logic begins searching as soon as a character is received in the textbox. If a positive integer is specified, it is the number of milliseconds that the input is debounced before a search begins. Obviously a negative integer here is nonsense; it should either be treated the same as `null` or raise an error, as appropriate to DocFx's conventions.

A default value of `null` seems sensible, to maintain the current behaviour for small-to-medium sized solutions. In my solution, I would try an initial debounce timer of around 350ms. That seems long enough that someone who is typing (and knows what they are typing) can likely type it all without triggering wasteful searches, yet short enough that they aren't frustrated waiting for it.

Describe alternatives you've considered
In truly large projects I imagine that the entire search functionality should be moved server-side with a completely custom implementation that is outside the scope of DocFX. We don't have the capacity/motivation to do that.
I suppose I could work around the problem by copy-pasting my search term into the search text box, but that is not as convenient.
Additional context
For reference, this is the API search tool I'm referring to.
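A minimal TypeScript sketch of the debounce option requested under "Describe the solution you'd like". This is not DocFX's template code: `SearchDebounceMs`, `resolveDebounceMs` and `makeSearchTrigger` are hypothetical names, and `runSearch` stands in for whatever function actually queries the index. Anything that is not a positive integer is treated like `null`, which is one of the two behaviours the request allows.

```typescript
// Hypothetical option value: null = search on every keystroke (current
// behaviour); a positive integer = debounce delay in milliseconds.
type SearchDebounceMs = number | null;

/** Normalises the option: null, zero, negative or non-integer values mean "no debounce". */
function resolveDebounceMs(option: SearchDebounceMs): number | null {
  if (option === null || !Number.isInteger(option) || option <= 0) {
    return null;
  }
  return option;
}

/**
 * Wraps a search function so that it only runs once typing has paused
 * for the configured delay. With no delay, it is returned unchanged.
 */
function makeSearchTrigger(
  runSearch: (query: string) => void,
  option: SearchDebounceMs,
): (query: string) => void {
  const delayMs = resolveDebounceMs(option);
  if (delayMs === null) {
    return runSearch; // current behaviour: every keystroke searches immediately
  }
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (query: string) => {
    // Each keystroke cancels the pending search and restarts the countdown,
    // so only the last query typed within the window is searched.
    if (timer !== undefined) {
      clearTimeout(timer);
    }
    timer = setTimeout(() => {
      timer = undefined;
      runSearch(query);
    }, delayMs);
  };
}
```

In a template, the returned function would be called from the search box's `input` handler with the box's current value. With a 350ms delay, typing "Serializer" quickly triggers one search instead of ten, at the cost of every search starting 350ms later, which is why `null` remains a sensible default for smaller sites.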