jollycar closed this issue 5 years ago
Hi,
Thanks for reporting this issue. After investigating, this is due to the fact that ElasticSearch analyzes `text` fields with the Standard Analyzer by default (Standard Tokenizer, Standard Token Filter, Lower Case Token Filter, Stop Token Filter). So when you index a document `{ "title": "my title 1" }`, the index creates three references to this document, one per term (`my`, `title` and `1`), to allow full-text search natively.
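The tokenization step can be approximated in a few lines of Python. This is a simplified sketch, not Elasticsearch's implementation: the real Standard Tokenizer follows Unicode word-boundary rules (UAX #29), whereas this one just splits on runs of word characters and lowercases them, and it skips stop-word removal. The function name is hypothetical.

```python
import re


def analyze_standard_like(text: str) -> list[str]:
    """Approximate the Standard Analyzer: tokenize, then lowercase.

    Simplification: split on runs of word characters instead of the
    Unicode word-boundary rules used by the real Standard Tokenizer,
    and do not remove any stop words.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


if __name__ == "__main__":
    print(analyze_standard_like("my title 1"))  # ['my', 'title', '1']
```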
But when sorting, it sorts all of these references alphabetically and removes duplicate documents afterwards.
That's how ElasticSearch works by default; this can be tweaked a little, but not really through IPFS-Store.
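A toy Python model of this sorting behaviour (hypothetical helper names, simplified tokenization) shows why the order looks wrong: sorting all (term, document) references and then dropping duplicate documents is the same as ordering each document by its alphabetically smallest term. For ascending sorts on multi-valued fields, this matches Elasticsearch's default `min` sort mode.

```python
import re


def terms(text: str) -> list[str]:
    # Simplified analysis: runs of word characters, lowercased.
    return [token.lower() for token in re.findall(r"\w+", text)]


def sort_by_terms(docs: dict[str, str]) -> list[str]:
    """Toy model of sorting documents on an analyzed text field."""
    # One (term, doc_id) reference per term, sorted alphabetically.
    references = sorted(
        (term, doc_id) for doc_id, title in docs.items() for term in terms(title)
    )
    # Walk the sorted references, keeping each document at its first reference.
    seen: set[str] = set()
    order: list[str] = []
    for _term, doc_id in references:
        if doc_id not in seen:
            seen.add(doc_id)
            order.append(doc_id)
    return order


docs = {"doc1": "zebra 1", "doc2": "apple 2"}
print(sort_by_terms(docs))                 # ['doc1', 'doc2']
print(sorted(docs, key=docs.__getitem__))  # ['doc2', 'doc1'] (whole-title order)
```

Here `"1"` sorts before `"2"` and before any letter, so `doc1` ("zebra 1") comes first even though "apple 2" is first when the titles are compared as whole strings. Sorting on a `keyword` field avoids this, because the whole title is indexed as a single unanalyzed term.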
As a short-term solution, you can manually configure the index field mapping in ElasticSearch like this:
```
POST http://127.0.0.1:9200/documents/documents/_mapping
{
  "documents": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
```
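If you would rather script this than use a REST client, the same request can be built with Python's standard library. A minimal sketch: the function names are hypothetical, the URL and body are copied from the request above, and the `Content-Type` header is set because Elasticsearch 6.x rejects JSON bodies sent without it.

```python
import json
import urllib.request

# Same body as the _mapping request above: "title" stays a text field and
# gains a "raw" keyword sub-field that can be used for sorting.
MAPPING = {
    "documents": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {"raw": {"type": "keyword"}},
            }
        }
    }
}


def build_mapping_request(base_url: str = "http://127.0.0.1:9200") -> urllib.request.Request:
    """Build (but do not send) the POST .../documents/documents/_mapping request."""
    return urllib.request.Request(
        f"{base_url}/documents/documents/_mapping",
        data=json.dumps(MAPPING).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def apply_mapping(base_url: str = "http://127.0.0.1:9200") -> str:
    """Send the request; needs a reachable Elasticsearch node."""
    with urllib.request.urlopen(build_mapping_request(base_url)) as response:
        return response.read().decode("utf-8")
```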
Then run the query with `sort=title.raw`:
```
GET query/search?index=documents&page=0&size=1000&sort=title.raw&dir=ASC
```
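The query string can also be built programmatically, which avoids quoting mistakes. A sketch: `build_search_url` is a hypothetical helper, and `base_url` stands for wherever your IPFS-Store API is served (not specified in this thread). Because `title.raw` holds each whole title as one unanalyzed term, results should come back in whole-string order.

```python
from urllib.parse import urlencode


def build_search_url(
    base_url: str,
    sort_field: str = "title.raw",
    direction: str = "ASC",
    page: int = 0,
    size: int = 1000,
) -> str:
    """Build the IPFS-Store search URL used above."""
    # dict order is preserved, so the parameters appear in this order.
    params = {
        "index": "documents",
        "page": page,
        "size": size,
        "sort": sort_field,
        "dir": direction,
    }
    return f"{base_url}/query/search?{urlencode(params)}"
```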
In the future, I will try to make IPFS-Store more configurable in this respect.
Thanks again for raising this issue!
Greg
Following the recent refactoring, it is now possible to pre-configure an ElasticSearch index mapping in the API, so that the index is created on startup with the required field mappings.
In your case, you could create a mapping file `index_mapping.json` like this:
```json
{
  "mappings": {
    "_doc": {
      "properties": {
        "__hash": {
          "type": "keyword"
        },
        "__content_type": {
          "type": "keyword"
        },
        "title": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
```
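The field `title` is indexed in two ways: as `text` (analyzed, for full-text search) and as the `keyword` sub-field `title.keyword` (a single unanalyzed term, usable for sorting).
(`__hash` and `__content_type` are two fields required by Mahuta.)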
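Before deploying the mapping file, a quick sanity check can catch typos. This is a hypothetical helper, not part of Mahuta: it assumes the `_doc` type used in the example, and requiring `__hash` and `__content_type` to be `keyword` is an assumption based on that example rather than a documented Mahuta rule.

```python
import json

# Fields Mahuta requires; mapping them as keyword mirrors the example file.
REQUIRED_FIELDS = ("__hash", "__content_type")


def check_mapping(mapping: dict, doc_type: str = "_doc", sortable=("title",)) -> list[str]:
    """Return a list of problems found in an index mapping (empty if none)."""
    properties = mapping.get("mappings", {}).get(doc_type, {}).get("properties", {})
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in properties:
            problems.append(f"missing required field {field}")
        elif properties[field].get("type") != "keyword":
            problems.append(f"{field} is not mapped as keyword")
    # Each sortable field needs a keyword sub-field such as "title.keyword".
    for field in sortable:
        subfields = properties.get(field, {}).get("fields", {})
        if not any(sub.get("type") == "keyword" for sub in subfields.values()):
            problems.append(f"{field} has no keyword sub-field to sort on")
    return problems


def load_and_check(path: str = "index_mapping.json") -> list[str]:
    """Read a mapping file from disk and check it."""
    with open(path, encoding="utf-8") as f:
        return check_mapping(json.load(f))
```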
Once the file is in place on the server, pass the following arguments to the API:
```
-Dspring-boot.run.arguments=--mahuta.elasticsearch.host=localhost, --mahuta.elasticsearch.port=9300, --mahuta.elasticsearch.clusterName=docker-cluster, --mahuta.elasticsearch.indexConfigs={"name":"document", "map":"index_mapping"}
```
`indexConfigs` takes an array of index name/config pairs, and the API creates each of these indices with its config on startup.
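Conceptually, that startup step amounts to one Create Index call (`PUT /<name>` with the mapping as body) per configured entry. The Python sketch below illustrates this; it is not Mahuta's actual (Java) code. The helper names are hypothetical, treating `map` as a file name without the `.json` extension is an assumption, and it targets the REST API on port 9200 rather than the transport port 9300 passed to Mahuta above.

```python
import json
import urllib.request
from pathlib import Path


def load_mappings(index_configs: list[dict], mapping_dir: str) -> dict[str, dict]:
    """Read <map>.json from mapping_dir for every configured index."""
    # Assumption: "map" names a JSON file without its .json extension.
    return {
        config["map"]: json.loads(Path(mapping_dir, f"{config['map']}.json").read_text("utf-8"))
        for config in index_configs
    }


def build_create_index_requests(
    index_configs: list[dict],
    mappings: dict[str, dict],
    es_url: str = "http://localhost:9200",
) -> list[urllib.request.Request]:
    """Build (but do not send) one PUT /<name> request per configured index."""
    return [
        urllib.request.Request(
            f"{es_url}/{config['name']}",
            data=json.dumps(mappings[config["map"]]).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        for config in index_configs
    ]
```

A real implementation would presumably also skip indices that already exist, since Elasticsearch rejects a Create Index call for an existing index.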