Closed: miura closed this issue 4 years ago.
The Stopwords processor should come after the Tokenizer. With this processing order, there is no failure in parsing URL paths.
I will push this soon.
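For anyone checking the fix, here is a minimal sketch of how the reordering might look in the exported Search API index configuration. The file name, machine name, field settings, and exact weight values are assumptions for illustration; what matters is the relative order, since processors within a stage run in ascending weight order.

```yaml
# Hypothetical excerpt of a Search API index config export
# (names and weight values assumed for illustration).
processor_settings:
  tokenizer:
    weights:
      preprocess_index: -6   # smaller weight: runs first, splits URL paths into words
      preprocess_query: -6
    all_fields: true
    spaces: ''
    overlap_cjk: 1
    minimum_word_size: '3'
  stopwords:
    weights:
      preprocess_index: -5   # larger weight: runs after the Tokenizer
      preprocess_query: -2
    stopwords:
      - a
      - an
      - the
```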
Great, thanks. Kota, Chong's suggestion is a good one, but what do you mean by "we cannot add content anymore"? Why can't we put this version live and let people start adding things, now that it is correctly merged with this database?
Deployed. For this issue, see commits 67a03d9ad41b6be25d44f40354b9fc1fa0174c47 and f92b7f0a4670aee5ff4105a2b51b926aa948ad9b.
While indexing for the Search API, there are many warnings such as:
> [warning] An overlong word (more than 50 characters) was encountered while indexing: sampledriftcorrectionfollowing4dconfocaltimelapseimaging.
> Since database search servers currently cannot index words of more than 50 characters, the word was truncated for indexing. If this should not be a single word, please make sure the "Tokenizer" processor is enabled and configured correctly for index Default content index.
It seems that HTML paths are also being indexed, producing index keys that are very long and unusable.
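To illustrate the symptom (hypothetical path and tokens, not taken from the actual index): when the Tokenizer runs before the Stopwords processor, a path is split into indexable words instead of being flattened into a single overlong string.

```text
# Before the fix: the path is indexed as one word and truncated at 50 characters
/sample-drift-correction-following-4d-confocal-time-lapse-imaging
  -> sampledriftcorrectionfollowing4dconfocaltimelapseimaging  (overlong, truncated)

# After the fix: the Tokenizer splits on punctuation first
  -> sample drift correction following 4d confocal time lapse imaging
```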