Open GaurangTandon opened 6 years ago
@GaurangTandon Thank you for reporting, will investigate later when I get some time.
@GaurangTandon Hi, the reason is actually quite simple. Because the search result snippet tries to summarize a document in a short paragraph, it has to skip some content and show as many highlighted words as possible. This leads to the problem you have seen: in every case where this problem occurs, the skipped content (which is replaced by a `...` string) is in the middle of a LaTeX expression, and cutting there very likely invalidates the expression.
The current content-skipping strategy is simple: given a number of keywords in the document (up to a threshold limit `MAX_HIGHLIGHT_OCCURS`), pad the left and right sides of each keyword. Content that is not padded is skipped, and the keywords along with their "padding" are displayed.
The related logic is here:
https://github.com/approach0/search-engine/blob/4780e499519677433543cb92ba8baa04b56f959a/search/snippet.c#L124-L125
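To make the padding strategy concrete, here is a minimal Python sketch of the idea. It is not the code in `snippet.c`: the function name `build_snippet`, the character-based `pad` width, and the `max_occurs` parameter (standing in for `MAX_HIGHLIGHT_OCCURS`) are all assumptions made for illustration.

```python
def build_snippet(text, keyword_spans, pad=20, max_occurs=3):
    """Keep each keyword plus `pad` characters on both sides; skip the rest.

    `keyword_spans` is a list of (start, end) character offsets.
    All names and defaults here are hypothetical, not taken from snippet.c.
    """
    windows = []
    for start, end in sorted(keyword_spans)[:max_occurs]:
        lo = max(0, start - pad)
        hi = min(len(text), end + pad)
        if windows and lo <= windows[-1][1]:
            # Overlapping or touching windows are merged into one.
            windows[-1][1] = max(windows[-1][1], hi)
        else:
            windows.append([lo, hi])

    # Skipped content between windows is replaced by "...".
    snippet = " ... ".join(text[lo:hi] for lo, hi in windows)
    if windows and windows[0][0] > 0:
        snippet = "... " + snippet
    if windows and windows[-1][1] < len(text):
        snippet = snippet + " ..."
    return snippet
```

Because the window boundaries are computed purely from character offsets, nothing stops a boundary from landing inside a LaTeX expression, which is the failure described above.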
One way to fix this issue is to never skip any LaTeX content, but some LaTeX content is very long, and this strategy would make some snippets unacceptably long. So a smarter algorithm is needed that either includes a complete LaTeX clip or, if the clip is too long, does not include any part of it.
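One possible shape for such an algorithm, sketched in Python under stated assumptions: math is taken to be delimited by single `$` characters (the real index may mark math differently), and `snap_window`, `MATH_RE`, and `max_math_len` are hypothetical names. A window boundary that falls inside a math span is moved outward to include the whole span when the span is short, and inward to exclude it when the span is too long.

```python
import re

# Assumption: inline math is delimited by single dollar signs.
MATH_RE = re.compile(r"\$[^$]+\$")

def snap_window(text, lo, hi, max_math_len=40):
    """Adjust the window [lo, hi) so it never cuts through a math span.

    Returns the adjusted (lo, hi), or None if nothing is left to show.
    """
    for m in MATH_RE.finditer(text):
        s, e = m.span()
        if e <= lo or s >= hi:
            continue  # math span lies entirely outside the window
        if lo <= s and e <= hi:
            continue  # math span is already fully included
        short = (e - s) <= max_math_len
        if s < lo:
            # Left boundary is inside the span: grow to its start,
            # or shrink to its end if the span is too long.
            lo = s if short else e
        if e > hi:
            # Right boundary is inside the span: grow to its end,
            # or shrink to its start if the span is too long.
            hi = e if short else s
    return (lo, hi) if lo < hi else None
```

A window that lies entirely inside an over-long expression collapses to `None` and would be dropped from the snippet; whether that is acceptable, or whether the keyword should then be shown without its math context, is a design decision this sketch leaves open.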
We can leave this issue open until a better skipping strategy is implemented.
Search results page
Broken Mathjax of entry 8 copy-pasted for reference: