Epinova / Epinova.Elasticsearch

A search-plugin for Episerver CMS and Commerce
MIT License

Highlight returns broken HTML tags #129

Closed camlcase closed 3 years ago

camlcase commented 3 years ago

The built-in Elasticsearch highlight function used by the search plugin sometimes returns broken HTML for indexed XhtmlString properties (like MainBody). I've seen discussions about this "issue", and one suggested solution is to apply the HTML strip character filter during analysis.

Is there anything we can do in the plugin to solve this?

Episerver CMS version 11.20.0.0
Epinova.Elasticsearch version 11.7.3.139
Elasticsearch version 7.9.3

camlcase commented 3 years ago

We've managed to solve this by copying MainBody into a new string property that decodes and strips the HTML. Indexing has been disabled for MainBody itself.

[Searchable]
// The null-conditional operator avoids an exception when MainBody is empty.
public string MainBodySearchable => (this["MainBody"] as XhtmlString)?.StripHtml();
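
The StripHtml() helper used above is not shown in this thread. Below is a minimal sketch of what such a helper might look like, assuming it is a custom method rather than part of Epinova.Elasticsearch or Episerver. The class name HtmlText and the regex-based approach are my assumptions; it works on a plain string, so an XhtmlString would first need to be converted to its HTML string.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: not part of Epinova.Elasticsearch or Episerver.
public static class HtmlText
{
    // Crude tag matcher; adequate for producing index text, not for full HTML parsing.
    private static readonly Regex Tags = new Regex("<[^>]*>", RegexOptions.Compiled);
    private static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    public static string StripHtml(string html)
    {
        if (string.IsNullOrWhiteSpace(html))
            return string.Empty;

        // Replace each tag with a space so words in adjacent elements do not merge.
        string withoutTags = Tags.Replace(html, " ");

        // Decode entities after removing tags, so an encoded "&lt;b&gt;" stays as literal text.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // Collapse the runs of whitespace left behind by the removed tags.
        return Whitespace.Replace(decoded, " ").Trim();
    }
}
```

For example, HtmlText.StripHtml("<p>Hello <b>world</b> &amp; more</p>") returns "Hello world & more". Because every tag becomes a space, punctuation that directly follows an inline tag ends up preceded by a space; that is usually harmless for search and highlighting, but an HTML parser would be a more robust choice.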

As in Episerver Search & Navigation, may I suggest an attribute that decodes and strips HTML:

[RemoveHtmlTagsWhenIndexing]
public virtual XhtmlString MainBody { get; set; }

or by convention:

Indexing.Instance
    .ForType<MyPage>().StripHtml(x => x.MainBody);