Let me see... This PR disables adding custom token filters, but in the main branch version that works perfectly:
"OrchardCore_Elasticsearch": {
"ConnectionType": "SingleNodeConnectionPool",
"Url": "http://localhost",
"Ports": [ 9200 ],
"Analyzers": {
"default": {
"type": "standard",
"filter": [
"lowercase",
"english_stop",
"english_stemmer"
]
},
"custom_analyzer": {
"type": "standard",
"filter": [
"lowercase",
"english_stop",
"english_stemmer",
"english_synonyms"
]
}
},
"Filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
},
"english_stemmer": {
"type": "stemmer",
"language": "english"
},
"english_synonyms": {
"type": "synonym_graph",
"synonyms": ["page,cage"]
}
}
}
With this PR's version, the analyzer is created without token filters when it is specified as in the configuration above.
If I use your configuration from https://github.com/OrchardCMS/OrchardCore/issues/17012, then yes, it works. But if I only add my own custom token filter, the Elasticsearch index is not created:
"OrchardCore_Elasticsearch": {
"ConnectionType": "SingleNodeConnectionPool",
"Url": "http://localhost",
"Ports": [ 9200 ],
"Analyzers": {
"standard": {
"type": "standard"
},
"keyword": {
"type": "keyword"
},
"whitespace": {
"type": "whitespace"
},
"knowledgebase_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"stop",
"english_synonyms"
],
"char_filter": [
"html_strip"
]
}
},
"Filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
},
"english_stemmer": {
"type": "stemmer",
"language": "english"
},
"english_synonyms": {
"type": "synonym_graph",
"synonyms": [ "page,cage" ]
}
}
}
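For what it's worth, a quick way to check whether the index was actually created and which analysis settings it ended up with is to query Elasticsearch directly from Kibana Dev Tools (the index name below is just a placeholder for whatever name OrchardCore generates):

```
GET _cat/indices?v

GET <your_index_name>/_settings
```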
I need to understand what exactly no longer works, because everything in the main branch seems to work for me.
Anyway, we need to be able to add custom token filters that are listed in the "Filter" section to any analyzer.
I think I found the reason; I will suggest changes soon.
@denispetrische can you check out this branch and test the filters? Note that I renamed "Filter" to "TokenFilters", so in your appsettings use "TokenFilters" instead of "Filter" to define the custom token filters.
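For example, the "Filter" section from your earlier configuration would become something like:

```json
"TokenFilters": {
  "english_stop": {
    "type": "stop",
    "stopwords": "_english_"
  },
  "english_stemmer": {
    "type": "stemmer",
    "language": "english"
  },
  "english_synonyms": {
    "type": "synonym_graph",
    "synonyms": [ "page,cage" ]
  }
}
```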
I tested the "fix logs message" branch locally and made my own pull request based on it: https://github.com/OrchardCMS/OrchardCore/pull/17019. Please check it; it works with every analyzer and with custom token filters.
I don't think it's a good idea to rename the "Filter" section to "TokenFilters", because that is how it is called in Elastic terminology.
@denispetrische

> I tested the "fix logs message" branch locally

Did you test this branch?

> made my own pull request based on it

It's better to suggest a fix here instead of creating a new PR, to avoid redundant work.

> I don't think it's a good idea to rename the "Filter" section to "TokenFilters", because that is how it is called in Elastic terminology.

The Elastic docs refer to them as token filters: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenfilters.html. It's better not to name them "Filters", so they are not confused with the `filter` property used when defining a custom analyzer.
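To illustrate the distinction with a small sketch based on the configurations above: the top-level section defines named token filters, while the `filter` array inside a custom analyzer only references those names:

```json
"TokenFilters": {
  "english_stop": {
    "type": "stop",
    "stopwords": "_english_"
  }
},
"Analyzers": {
  "knowledgebase_analyzer": {
    "type": "custom",
    "tokenizer": "standard",
    "filter": [ "lowercase", "english_stop" ]
  }
}
```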
> Did you test this branch?

Yeah, it still doesn't work when I create my own analyzer with custom token filters.

OK, I'll suggest the changes here.
@denispetrische I have cleaned up the code quite a bit. Please revisit your suggestions and see if you still have any. The code in its current state is much cleaner.
I just tested creating a new token filter and using it in my custom analyzer.
This is what I specified in the appsettings:
"Analyzers": {
"standard": {
"type": "standard"
},
"keyword": {
"type": "keyword"
},
"whitespace": {
"type": "whitespace"
},
"knowledgebase_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"stop",
"english_stop"
],
"char_filter": [
"html_strip"
]
}
},
"TokenFilters": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
}
}
Then I created an index called `tf_test` using my custom `knowledgebase_analyzer` analyzer. When I query the index settings directly using Kibana, I get the expected structure:
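For reference, the settings can be fetched from Kibana Dev Tools with something along these lines:

```
GET ptest_blog2_tf_test/_settings
```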
```json
{
  "ptest_blog2_tf_test": {
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        },
        "number_of_shards": "1",
        "provided_name": "ptest_blog2_tf_test",
        "creation_date": "1731606374163",
        "analysis": {
          "filter": {
            "english_stop": {
              "type": "stop",
              "stopwords": "_english_"
            }
          },
          "analyzer": {
            "knowledgebase_analyzer": {
              "filter": [
                "lowercase",
                "stop",
                "english_stop"
              ],
              "char_filter": [
                "html_strip"
              ],
              "type": "custom",
              "tokenizer": "standard"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "lDrk5TeTTaiD9lU27YUumg",
        "version": {
          "created": "8060199"
        }
      }
    }
  }
}
```
So the changes work with no problem.
"OrchardCore_Elasticsearch": { "ConnectionType": "SingleNodeConnectionPool", "Url": "http://localhost", "Ports": [ 9200 ], "Analyzers": { "standard": { "type": "standard" }, "keyword": { "type": "keyword" }, "whitespace": { "type": "whitespace" }, "knowledgebase_analyzer": { "type": "custom", "tokenizer": "standard", "filter": [ "lowercase", "stop", "english_synonyms" ], "char_filter": [ "html_strip" ] } }, "Filter": { "english_stop": { "type": "stop", "stopwords": "_english_" }, "english_stemmer": { "type": "stemmer", "language": "english" }, "english_synonyms": { "type": "synonym_graph", "synonyms": [ "page,cage" ] } } }
Yes, now it works for me as well. And the code quality is much better now. Good job!
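As a quick sanity check (just a sketch; the index name is a placeholder for whatever name your site generated), the `_analyze` API can confirm that the synonym filter is actually applied; analyzing "page" with the custom analyzer should also emit a "cage" token:

```
GET <your_index_name>/_analyze
{
  "analyzer": "knowledgebase_analyzer",
  "text": "page"
}
```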
Fix #17012
/cc: @denispetrische