Closed: miura closed this issue 4 years ago.
The Stopwords processor should come after the Tokenizer. With this processing order, there is no failure in parsing URL paths.
I will push this soon.
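For anyone checking the fix, here is a minimal sketch of how the reordering might look in the exported Search API index configuration. The file name, machine name, field settings, and exact weight values are assumptions for illustration; what matters is the relative order, since processors within a stage run in ascending weight order.

```yaml
# Hypothetical excerpt of a Search API index config export
# (names and weight values assumed for illustration).
processor_settings:
  tokenizer:
    weights:
      preprocess_index: -6   # smaller weight: runs first, splits URL paths into words
      preprocess_query: -6
    all_fields: true
    spaces: ''
    overlap_cjk: 1
    minimum_word_size: '3'
  stopwords:
    weights:
      preprocess_index: -5   # larger weight: runs after the Tokenizer
      preprocess_query: -2
    stopwords:
      - a
      - an
      - the
```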
Great, thanks. Kota, Chong's suggestion is a good one, but what do you mean by "we cannot add content anymore"? Why can't we put this version live and let people start adding things, now that it is correctly merged with this database?
Deployed. For this issue, see commits 67a03d9ad41b6be25d44f40354b9fc1fa0174c47 and f92b7f0a4670aee5ff4105a2b51b926aa948ad9b.
While indexing for the Search API, there are many warnings such as:
> [warning] An overlong word (more than 50 characters) was encountered while indexing: sampledriftcorrectionfollowing4dconfocaltimelapseimaging.
> Since database search servers currently cannot index words of more than 50 characters, the word was truncated for indexing. If this should not be a single word, please make sure the "Tokenizer" processor is enabled and configured correctly for index Default content index.
It seems that HTML paths are also being indexed, producing index keys that are very long and unusable.
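To illustrate the symptom (hypothetical path and tokens, not taken from the actual index): when the Tokenizer runs before the Stopwords processor, a path is split into indexable words instead of being flattened into a single overlong string.

```text
# Before the fix: the path is indexed as one word and truncated at 50 characters
/sample-drift-correction-following-4d-confocal-time-lapse-imaging
  -> sampledriftcorrectionfollowing4dconfocaltimelapseimaging  (overlong, truncated)

# After the fix: the Tokenizer splits on punctuation first
  -> sample drift correction following 4d confocal time lapse imaging
```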