-
Hi medcl,
Is the "pinyin" filter supported in a custom normalizer?
The official site says, "Custom normalizers take a list of character filters and a list of token filters." But when I tr…
-
{"settings":{"max_result_window":1000000000,"analysis":{"analyzer":{"pinyin_analyzer":{"tokenizer":"my_pinyin"}},"tokenizer":{"my_pinyin":{"type":"pinyin","keep_separate_first_letter":true,"keep_full_…
-
#### Index settings
```json
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"analysis": {
"analyzer": {
"default": {
…
```
-
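Version 6.x: the pull request history shows that this issue was fixed, but in my tests the problem still occurs.
Whenever "ignore_pinyin_offset": false is configured, writing data fails with an error.
See the pull request: https://github.com/medcl/elasticsearch-analysis-pinyin/pull/206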
-
(related to #37)
The pinyin of the different entries could possibly be merged using word separation based on an analysis of the characters using CoreNLP. In most cases, the pinyin would be merged, …
-
For example:
```
PUT /medcl/
{
"index" : {
"analysis" : {
"analyzer" : {
"pinyin_analyzer" : {
"tokenizer" : "my_pinyin"
…
```
-
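I have recently been working on pinyin analysis and found that bulk-indexing data into ES fails when hanlp is combined with pinyin. At first I assumed that hanlp and pinyin could not be integrated, so I tried IK + pinyin instead. The same problem occurs with ik_max_word, but not with ik_smart. What is the cause? Any help would be appreciated.
Mapping configuration: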
{
"order":0,
"index_patterns":[
…
-
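When searching by first letters, I found that retroflex initials (z/c/s followed by h) are kept together, which causes incorrect search results.
For example, the index contains "中华人民共和国":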
curl -XGET 'localhost:9200/news/_search' -d '{"query":{"match_phrase":{"name":"zhonghua"}}}'
curl -XGET 'localhost:9200/news/_search' -d '{"quer…
-
# My configuration:
## index
```
PUT /secom/
{
"index" : {
"analysis" : {
"analyzer" : {
"pinyin_analyzer" : {
"tokenizer" : "my_pi…
-
Error message:
startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=1,endOffset=2,lastStartOffset=4 for field 'nickname.pinyin'
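
For readers unfamiliar with this error, the message describes an ordering rule on token offsets that is checked when a document is indexed. Below is a minimal Python sketch of that rule, reconstructed only from the wording of the message above. The function name `find_offset_violation` and the sample tokens are hypothetical, and this is not the actual Lucene or plugin code.

```python
from typing import Iterable, Optional, Tuple

# (term, start_offset, end_offset); an illustrative shape, not a real API type
Token = Tuple[str, int, int]


def find_offset_violation(tokens: Iterable[Token]) -> Optional[int]:
    """Return the index of the first token that breaks the offset rule, or None."""
    last_start = 0
    for i, (_term, start, end) in enumerate(tokens):
        # Rule as worded in the error message: start must be non-negative,
        # end must be >= start, and start must not go backwards.
        if start < 0 or end < start or start < last_start:
            return i
        last_start = start
    return None


# Hypothetical token stream that reproduces the numbers in the message:
# the third token starts at 1 after a previous token started at 4.
tokens = [("a", 0, 1), ("b", 4, 5), ("c", 1, 2)]
print(find_offset_violation(tokens))  # 2
```

Running the same check over the output of the `_analyze` API for the failing text may help locate the token whose start offset goes backwards.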
Command used to create the index:
``…