When indexing content in a content area, it seems to only gather content from the master language

AThraen commented 1 year ago

In a multi-language site (on CMS 11), where pages and blocks exist in multiple languages, I've configured it to include a ContentArea property on all pages which contains a lot of blocks with the actual text on the site. It looks like it only takes the contents of these blocks from the master language version of them - and not the other language versions.

I'm guessing that this is because it's using ContentAreaItem.Get() to get the content for each element in a content area, and this in turn uses the ContentLoader to fetch the content. The content loader will default to the current context culture to figure out which language version to load if it's not specified. However, since the indexing happens in a scheduled job, the context won't matter and it will resort to the master language. But that's just a guess. If this is indeed the problem, I would suggest an alternate approach where the proper language is resolved directly instead of relying on ContentAreaItem.Get().
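
To make the guess above concrete, here is a minimal sketch of the suspected fallback, written in Python as a toy model rather than against the real CMS API. Everything in it (`ToyLoader`, `Block`, `MASTER_LANGUAGE`, `index_content_area`) is a hypothetical name invented for illustration; it only shows how a loader that falls back to the master language when no culture is available would index the wrong version, and how passing the page's language explicitly avoids that.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MASTER_LANGUAGE = "en"  # hypothetical master language for this toy site


@dataclass
class Block:
    # Maps a language code to the text of that language version.
    versions: Dict[str, str]


class ToyLoader:
    """Hypothetical stand-in for a content loader; NOT the real CMS API."""

    def __init__(self, blocks: Dict[int, Block], context_culture: Optional[str] = None):
        self.blocks = blocks
        # None models a scheduled job that has no request culture.
        self.context_culture = context_culture

    def get(self, block_id: int, language: Optional[str] = None) -> str:
        # Explicit language wins, then the context culture, then the master language.
        lang = language or self.context_culture or MASTER_LANGUAGE
        versions = self.blocks[block_id].versions
        return versions.get(lang, versions[MASTER_LANGUAGE])


def index_content_area(loader: ToyLoader, block_ids: List[int],
                       page_language: str, explicit: bool) -> List[str]:
    # With explicit=False the loader has to guess the language on its own.
    return [loader.get(b, page_language if explicit else None) for b in block_ids]


blocks = {1: Block({"en": "Hello", "is": "Halló"})}
loader = ToyLoader(blocks)  # no context culture, as assumed for a scheduled job

implicit = index_content_area(loader, [1], "is", explicit=False)
explicit = index_content_area(loader, [1], "is", explicit=True)
print(implicit)
print(explicit)
```

In this model the implicit call picks up the master-language text for the Icelandic page, while the explicit call picks up the Icelandic version. Whether the library behaves this way internally is an assumption.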

otanum commented 1 year ago

Could you please verify package version 11.8.8-Release-198-contentarea-language-bug?

AThraen commented 1 year ago

I can verify that it now seems to index the correct language content for blocks in content areas. However, there is a new problem: it indexes the HTML-encoded content into the parent page's MainContent. Here is an example in Icelandic: "MainContent": "Á þremur vefkynningum færðu innsýn í nýjustu uppfærsluna á skýþjónustu "

otanum commented 1 year ago

My changes are only related to the handling of ContentArea. Are you sure that nothing in your code indexes the HTML content? If not, could you give us a few more details?

AThraen commented 1 year ago

I see now that my comment above got correctly encoded by GitHub :-) My point was that in the indexing of the blocks in the content area, I'm guessing there is a .ToWebString() or something similar that HTML-encodes special characters, so they end up in the index as & .... ; instead of the UTF-8 character, making them impossible to search for.
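
A small Python sketch of why entity-encoded text is hard to search. The encoded string below is an assumption: it is reconstructed from the Icelandic example earlier in the thread, since the actual indexed value was not preserved. It uses only the standard library's `html.unescape` and does not reflect how the package itself processes text.

```python
import html

# Assumed shape of the indexed value: named entities instead of UTF-8 characters.
encoded = "&Aacute; &thorn;remur vefkynningum f&aelig;r&eth;u inns&yacute;n"

# Decoding the entities restores the original characters.
decoded = html.unescape(encoded)

query = "þremur"
found_in_encoded = query in encoded
found_in_decoded = query in decoded

print(decoded)
print(found_in_encoded, found_in_decoded)
```

A plain substring check is a much cruder test than a real analyzer, but the point carries over: the query term contains "þ", while the encoded text only contains "&thorn;", so the two cannot match until the entities are decoded.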

otanum commented 1 year ago

Try 11.8.21-Release-198-contentarea-language-bug

AThraen commented 1 year ago

That seems to work, thanks!

otanum commented 1 year ago

11.8.26

Epinova / Epinova.Elasticsearch

When indexing content in a content area, it seems to only gather content from the master language #198