apache / pekko

Build highly concurrent, distributed, and resilient message-driven applications using Java/Scala
https://pekko.apache.org/
Apache License 2.0

super-high cost when searching for documents #1097

Open Roiocam opened 9 months ago

Roiocam commented 9 months ago

Reproduce

Just run a search on https://pekko.apache.org/docs/pekko/current//index.html

Happened on:

I think this may be an issue with Paradox.

Self-diagnosis

I did a quick investigation myself: the profiler shows that most of the time is spent in a reduce function.

I am not a frontend developer, so I cannot dig deeper into it.

Screenshot 2024-02-04 00 43 51, Screenshot 2024-02-04 00 49 34, Screenshot 2024-02-04 00 50 09

Performance Profile

This is my profiler result.

Trace-20240204T004355.json

Roiocam commented 9 months ago

After some digging, I found that the issue is caused by a very large search_index.json, which is loaded when the user clicks the search input box.

Screenshot 2024-02-04 02 34 49

The JavaScript file that does this comes from the upstream repo: https://github.com/squidfunk/mkdocs-material. On their latest documentation website, the search index is preloaded, and the index file is only 200 KB.

Then I checked the content of the Pekko search index file, which is a very large JSON array.

Screenshot 2024-02-04 02 48 11
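To put numbers on an index like this, a small script can report how many entries it holds and how large they are. This is only a sketch: the helper name `summarize_index` and the `location`/`title`/`text` fields in the sample are assumptions for illustration, not the confirmed layout of the Pekko search_index.json.

```python
import json


def summarize_index(raw: str) -> dict:
    """Report size statistics for a JSON search index passed as text."""
    data = json.loads(raw)
    # Assumption: entries are the top-level array, or live under a "docs" key.
    entries = data if isinstance(data, list) else data.get("docs", [])
    # Serialized size of each entry, in UTF-8 bytes.
    sizes = [len(json.dumps(e, ensure_ascii=False).encode("utf-8")) for e in entries]
    return {
        "total_bytes": len(raw.encode("utf-8")),
        "entries": len(entries),
        "largest_entry_bytes": max(sizes, default=0),
    }


# Hypothetical sample data; a real run would read the downloaded index file.
sample = json.dumps([
    {"location": "index.html", "title": "Home", "text": "Apache Pekko"},
    {"location": "actors.html", "title": "Actors", "text": "actor " * 100},
])
stats = summarize_index(sample)
print(stats)
```

For a real check, the string passed in would be the content of the downloaded index file rather than the sample above.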

Finally, I decided to look up the answer in the upstream repo, and I found this: https://github.com/squidfunk/mkdocs-material/issues/904

pjfanning commented 9 months ago

It is Paradox that builds this JSON file. I'm not sure what we can do in the short term. Maybe we should look at offloading the search to Google instead of using our own search with its own JSON file.

See https://poi.apache.org/ - another Apache project site - its search is Google based.

pjfanning commented 9 months ago

An example Google search

actor site:pekko.apache.org

https://www.google.com/search?q=actor+site%3Apekko.apache.org
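A link like the one above can be generated for any query by URL-encoding a `site:`-restricted search string. This is a minimal sketch; the function name `site_search_url` is made up for illustration, and only the Python standard library is used.

```python
from urllib.parse import urlencode


def site_search_url(query: str, site: str = "pekko.apache.org") -> str:
    """Build a Google search URL restricted to one site."""
    # urlencode turns spaces into "+" and ":" into "%3A".
    return "https://www.google.com/search?" + urlencode({"q": f"{query} site:{site}"})


print(site_search_url("actor"))
```

A search box on the docs site could redirect to such a URL instead of loading a local index.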

He-Pin commented 9 months ago

IIRC, it was using Algolia for indexing and searching.

Roiocam commented 9 months ago

> IIRC, it was using Algolia for indexing and searching.

I don't think so. Only Akka uses Algolia; we use paradox-material-theme, which depends on mkdocs-material.

> Maybe we should look at offloading the search to Google instead of using our own search with its own JSON file.

I am using Docusaurus at work. It implements offline search the same way, but it does not block the UI thread.

Roiocam commented 9 months ago

I will try to handle this issue by updating paradox-material-theme.

mdedetrich commented 9 months ago

> I will try to handle this issue by updating paradox-material-theme.

Note that sbt-paradox-material-theme was just transferred to the sbt org/community, and we are currently in the process of making the necessary changes, so it may take a bit of time before we can deploy the change.

Roiocam commented 9 months ago

> I will try to handle this issue by updating paradox-material-theme.

I tried to upgrade to the latest mkdocs-material and found that the way it is built changed after the 5.x version. Only upgrading to 5.x or later keeps search from blocking the main thread, and it seems to achieve this by requiring a precompiled search index file.

I think this upgrade is no less difficult than rewriting upstream https://github.com/sbt/sbt-paradox-material-theme. We should consider replacing the search implementation to solve this issue.
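The general idea behind a precompiled index can be sketched as follows: the token-to-document mapping is built once at docs build time and shipped as JSON, so a query becomes a lookup rather than a pass over every document. This is a generic illustration, not how mkdocs-material or Paradox implement it; `build_index`, `search`, and the document fields are hypothetical names.

```python
import json
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the sorted positions of the documents containing it."""
    postings = defaultdict(set)
    for i, doc in enumerate(docs):
        for token in tokenize(doc["title"] + " " + doc["text"]):
            postings[token].add(i)
    return {token: sorted(ids) for token, ids in postings.items()}


def search(index, docs, query):
    """Return the locations of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set(index.get(tokens[0], []))
    for token in tokens[1:]:
        hits &= set(index.get(token, []))
    return [docs[i]["location"] for i in sorted(hits)]


# Hypothetical documents standing in for rendered docs pages.
docs = [
    {"location": "actors.html", "title": "Actors", "text": "An actor processes messages."},
    {"location": "streams.html", "title": "Streams", "text": "Streams process data with backpressure."},
]
# Build step: serialize the index once, ahead of time.
prebuilt = json.dumps(build_index(docs))
# Query step: load the prebuilt index and look tokens up directly.
index = json.loads(prebuilt)
print(search(index, docs, "actor messages"))
```

In a browser the loading and lookup would typically also run off the main thread, which this sketch does not model.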

squidfunk commented 9 months ago

Author of Material for MkDocs here. v5.x is from 2020, so pretty old. We made significant improvements on search in 9.x, which should be twice as fast and significantly cut down on index size, and will be replacing it with an entirely new implementation that will be much faster and more powerful in the near future. Related:

mdedetrich commented 9 months ago

@Roiocam sbt-paradox-theme has been ported over to the sbt org/package and 0.7.0 has just been published so you are now free to make changes against sbt-paradox-theme.