dotnet / docfx

Static site generator for .NET API documentation.
https://dotnet.github.io/docfx/
MIT License

Full-search implementation concerns #661

Open evil-shrike opened 7 years ago

evil-shrike commented 7 years ago

First of all, full-search is awesome. Really cool. But let me criticize a bit.

  1. Why do we need search-stopwords.json? lunr already contains built-in stopwords for English. Yet you remove the default stopWordFilter, then load a separate stopwords file and generate a filter from it. Why? That search-stopwords.json contains the same stopwords as the default built-in filter! Moreover, the lunr language addons (from https://github.com/MihaiValentin/lunr-languages) contain their own stop words.
  2. Why not build the index at build time? Why do you instead load JSON at run time and then add items one by one to the index? This can be done (and usually is done) at build time. Then at run time we can just load an index file: $.getJSON("index.json", function (data) { engine = lunr.Index.load(data); }) That's all. I understand that you enrich search results with title and keywords, which are absent from lunr's search results. But that can be done via an additional index file.
  3. No i18n. The index should be built with support for other languages. lunr natively supports only English. To support additional languages we need to add addons (from https://github.com/MihaiValentin/lunr-languages):
    At build time:
var lunr = require('lunr');
require('./lunr.stemmer.support.js')(lunr);
require('./lunr.ru.js')(lunr);
require('./lunr.multi.js')(lunr);
var lunrIdx = lunr(function() {
  this.use(lunr.multiLanguage('en', 'ru'));
  // config ref/fields
});

At run time:

lunr.multiLanguage('en', 'ru');
engine = lunr.Index.load(data);

I can create a template to customize index building, but I think it should be possible without template customization. Also, please see #650: there are problems with the encoding of extracted keywords for indexing.
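
Point 2 above proposes keeping titles and keywords in a separate file, because lunr search results carry only a ref and a score. A minimal sketch of that idea follows. `buildLookup` and `enrichResults` are hypothetical helper names, and the `href`/`title`/`keywords` fields are an assumption about the shape of the entries in the generated index.json.

```javascript
// Build time: derive a small lookup table (ref -> display data) from the documents.
// The field names href/title/keywords are assumptions about the document shape.
function buildLookup(docs) {
  const lookup = {};
  for (const doc of docs) {
    lookup[doc.href] = { title: doc.title, keywords: doc.keywords };
  }
  return lookup;
}

// Run time: search results look like [{ ref, score }]; join them with the lookup.
// Results whose ref is missing from the lookup are skipped.
function enrichResults(results, lookup) {
  return results
    .filter((r) => Object.prototype.hasOwnProperty.call(lookup, r.ref))
    .map((r) => ({ href: r.ref, score: r.score, ...lookup[r.ref] }));
}

// Hand-written data standing in for real documents and real search output.
const docs = [
  { href: "api/a.html", title: "Class A", keywords: "class a members" },
  { href: "api/b.html", title: "Class B", keywords: "class b members" },
];
const lookup = buildLookup(docs);
const hits = enrichResults([{ ref: "api/b.html", score: 0.8 }], lookup);
console.log(hits);
```

With this split, the serialized index only has to answer "which refs match", and the lookup file supplies what is shown to the user.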

qinezh commented 7 years ago

Thanks for the comments, @evil-shrike; they are quite reasonable and insightful. I'm glad to share some background:

  1. For the first question: it is actually a way to solve issue #279, so that users can customize the stop-words to avoid the problem.
  2. For the second one: the index.json generated by DocFX is not the kind of serialized data that lunr.Index.load needs. And wouldn't an additional index file make the process more complicated?
  3. For the third one: I agree with you, we should support other languages.

Thanks @evil-shrike . Feel free to share it here if you have more concerns.

evil-shrike commented 7 years ago

the index.json generated by DocFX is not kind of serialised data which lunr.Index.load need.

Sure, we have to build it, much as it is built at run time today, and then call index.toJSON at the end to get the JSON that Index.load expects. The cost of an additional index file would be offset by the fact that we would no longer need stopwords.json at run time (it would be used at build time and embedded into the generated index).
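
The build step described here could look roughly like the sketch below. It is only an illustration: `buildSerializedIndex` is a hypothetical name, the field names are assumed, and the builder-style calls follow the lunr 2.x API (lunr 0.x adds documents with idx.add(doc) after construction instead). `lunr` is passed in as an argument so the function itself has no hard dependency.

```javascript
// Build-time sketch (Node.js). In a real script `lunr` would be require("lunr").
function buildSerializedIndex(lunr, docs) {
  const idx = lunr(function () {
    // Assumed document shape: { href, title, keywords }.
    this.ref("href");
    this.field("title");
    this.field("keywords");
    for (const doc of docs) {
      this.add(doc);
    }
  });
  // JSON.stringify calls idx.toJSON(), producing the payload for lunr.Index.load.
  return JSON.stringify(idx);
}

// Hypothetical usage in a build script (file names are placeholders):
//   const fs = require("fs");
//   const docs = Object.values(JSON.parse(fs.readFileSync("index.json", "utf8")));
//   fs.writeFileSync("search-index.json", buildSerializedIndex(require("lunr"), docs));
```

The browser would then fetch the serialized file and pass the parsed data to lunr.Index.load, skipping the per-document add loop entirely.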

qinezh commented 7 years ago

If no stopwords.json exists, how can users customize the stop-words? For example, what if a user wants to search for the word 'let', which is included in the default lunr.js stopwords?

evil-shrike commented 7 years ago

I understand. I meant that we don't need it at run time (loading a file from the server) if the index is built at build time (with custom stopwords).
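
To make the 'let' case concrete, here is a small sketch of customizing stop words before the index is built. The helper names and the shortened word list are made up for illustration. The filter follows the pipeline convention of returning the token to keep it and undefined to drop it; lunr also ships lunr.generateStopWordFilter for building such a filter from a word list.

```javascript
// Returns the default list minus the words the site wants to stay searchable.
function customizeStopWords(defaults, keepSearchable) {
  const keep = new Set(keepSearchable.map((w) => w.toLowerCase()));
  return defaults.filter((w) => !keep.has(w.toLowerCase()));
}

// Pipeline-style filter: returns the token to keep it, undefined to drop it.
function makeStopWordFilter(stopWords) {
  const stop = new Set(stopWords);
  return (token) => (stop.has(String(token)) ? undefined : token);
}

// Shortened stand-in for the real default list.
const defaults = ["a", "let", "the"];
const filter = makeStopWordFilter(customizeStopWords(defaults, ["let"]));

// 'let' survives because it was removed from the stop list; 'the' is dropped.
const kept = ["let", "the", "index"].filter((t) => filter(t) !== undefined);
console.log(kept);
```

Because the filter runs while the index is being built, the customized list never has to be downloaded by the browser.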

oyshan commented 6 years ago

We're experiencing issues with the second point. It seems the search index is rebuilt every time you load or navigate to a page. This causes problems for docs sites with a medium or large index.json file. It takes almost 10 seconds for the lunr search index to be built, i.e. 10 seconds during which search does not work. I agree that the lunr index should be built at build time and only loaded at run time.

nonno commented 6 years ago

I'm trying to customize search-stopwords.json so I can filter Italian stopwords (we are writing documentation in Italian), but without success. I tried to override the file inside my custom template and also to set the array directly inside search-worker.js, but apparently nothing happens and the index.json in the _site root is always huge. Could anybody explain to me how I can do it?

Shazwazza commented 6 years ago

The performance of this runtime index processing isn't so good. The doc site I have has an index.json file of about 8.5 MB. This means that search isn't available for a minute or two while it's being processed.

It's mentioned above that it might be possible to do this processing at build time instead of at run time in the browser. If that is possible, does anyone know how I can achieve that?

scionwest commented 6 years ago

I would like to know how to go about this as well. We are concerned about processing large index.json files at runtime. Any updates on this? @qinezh

scottcurrie commented 5 years ago

Checking back in on build-time indexes. Our search takes over a minute for the index to be built on most desktops. It looks like search is just broken, because users give up before the index is created.

Unnvaldr commented 1 year ago

If somebody is still eager for a solution, I wrote one for DocFx v2 where the index generation is moved to build-time. https://github.com/Unnvaldr/DocFx.Plugins.ExtractSearchIndex

paulushub commented 1 year ago

@Unnvaldr It is a year old now; how about a readme file explaining the features, limitations, etc., and, importantly for many, license information?

Unnvaldr commented 1 year ago

@Unnvaldr It is a year old now, how about a readme file explaining the features, limitations, etc, and for many, a license information?

The project was private during that time; I only recently decided to share it. Most of the things you mentioned will be added in the coming days.

yufeih commented 11 months ago

Added a browser cache to speed up page loads on subsequent visits. The first page view still builds the index in the browser and is slow.

Multi-language support requires building the index at build time, since some languages such as zh have native dependencies that are not available in the browser.
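
The caching idea mentioned above can be pictured with the sketch below. This is not DocFX's actual implementation: `loadOrBuildIndex`, the `version` key, and the storage interface are all assumptions made for illustration. `storage` is anything with getItem/setItem, such as window.localStorage in a browser.

```javascript
// `version` is a hypothetical cache key, for example a hash of index.json,
// so that a new site build invalidates the previously cached index.
function loadOrBuildIndex(storage, version, buildIndex) {
  const key = "search-index:" + version;
  const cached = storage.getItem(key);
  if (cached !== null) {
    return { fromCache: true, data: JSON.parse(cached) };
  }
  const data = buildIndex(); // the slow path, taken on the first visit only
  try {
    storage.setItem(key, JSON.stringify(data));
  } catch (e) {
    // Quota exceeded or storage disabled: carry on without caching.
  }
  return { fromCache: false, data };
}

// In-memory stand-in for browser storage, so the sketch runs anywhere.
const memory = new Map();
const storage = {
  getItem: (k) => (memory.has(k) ? memory.get(k) : null),
  setItem: (k, v) => { memory.set(k, v); },
};

const first = loadOrBuildIndex(storage, "v1", () => ({ terms: ["docfx"] }));
const second = loadOrBuildIndex(storage, "v1", () => ({ terms: ["docfx"] }));
console.log(first.fromCache, second.fromCache); // false true
```

Storage quotas vary by browser and are often only a few megabytes for localStorage, so an index as large as the ones reported in this thread may need a different store or may not be cacheable this way at all.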