tuoruu closed 1 year ago
There are no immediate plans to support ES 8+. At the moment we have a couple of customer projects using Raudikko, and neither of them is going to migrate to ES 8+ soon. I also doubt that anyone is going to do this just for fun.
But of course, PRs will be gladly accepted. So if you end up doing this, we'll be happy to incorporate the changes.
If it's possible to do this with the same code, or at least to isolate the changes cleanly using some build logic, that would be optimal. But if that feels too hacky, having separate branches for 7.x and 8.x is also a possibility.
Made initial changes required for ES8 here: https://github.com/ssaarinen/elasticsearch-analysis-raudikko/tree/es-8
It basically requires Java 17 and Gradle 7.3+, after which the project compiles and the tests pass. Testing whether this actually works with ES 8 is left as an exercise ;)
OK, after a bit of Docker/WSL shenanigans I was able to get it running (Docker Compose setup added to my branch) and tested it with etc/test-analyzer.http, which returned this:
```
HTTP/1.1 200 OK
X-elastic-product: Elasticsearch
content-type: application/json

{
  "tokens": [
    {
      "token": "testata",
      "start_offset": 0,
      "end_offset": 9,
      "type": "word",
      "position": 0
    },
    {
      "token": "raudikko",
      "start_offset": 10,
      "end_offset": 18,
      "type": "word",
      "position": 1
    },
    {
      "token": "analyysi",
      "start_offset": 19,
      "end_offset": 28,
      "type": "word",
      "position": 2
    },
    {
      "token": "tämä",
      "start_offset": 29,
      "end_offset": 34,
      "type": "word",
      "position": 3
    },
    {
      "token": "tapa",
      "start_offset": 35,
      "end_offset": 42,
      "type": "word",
      "position": 4
    },
    {
      "token": "yksinkertainen",
      "start_offset": 43,
      "end_offset": 59,
      "type": "word",
      "position": 5
    }
  ]
}
```
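For anyone who wants to repeat a check like this without an HTTP client file, here is a rough Java 17 sketch that builds a request against Elasticsearch's `_analyze` endpoint. The contents of etc/test-analyzer.http are not shown in this thread, so everything specific here is an assumption: the base URL `http://localhost:9200`, the tokenizer name `finnish`, the filter name `raudikko`, and the sample text are placeholders. It also assumes a local node with security disabled (ES 8 enables TLS and authentication by default).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class AnalyzeRequestSketch {

    // Minimal JSON string escaping, enough for this sketch.
    public static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Body for POST /_analyze with an ad hoc tokenizer + filter chain.
    // The tokenizer and filter names are supplied by the caller (assumed names).
    public static String buildBody(String tokenizer, String filter, String text) {
        return "{\"tokenizer\":" + jsonString(tokenizer)
                + ",\"filter\":[" + jsonString(filter) + "]"
                + ",\"text\":" + jsonString(text) + "}";
    }

    public static HttpRequest buildRequest(String baseUrl, String tokenizer,
                                           String filter, String text) {
        String body = buildBody(tokenizer, filter, text);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values; adjust to match your own setup.
        String text = "Tämä on testi";
        HttpRequest request =
                buildRequest("http://localhost:9200", "finnish", "raudikko", text);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(buildBody("finnish", "raudikko", text));

        // Only contact the server when explicitly asked to.
        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        }
    }
}
```

Run it with `--send` once the Docker setup is up; without the flag it only prints the request it would send.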
Great! We have also made some progress on our side. ES is moving so rapidly that our target now seems to be 8.5.2, but we haven't tested that with the cloud version yet, only locally.
We are looking into Elasticsearch version 8.4.3 which requires Java 17. Do you have plans to support ES 8+ versions in the future?
It seems that at least FinnishTokenizer / CharTokenizer would require some work - either by changing the Tokenizer or by making it work with a newer Java version.
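To make it clearer what a character-class tokenizer is responsible for, here is a small stand-alone Java 17 sketch. It is not Lucene's CharTokenizer and not the plugin's FinnishTokenizer; the class name, the `Token` record and the "letters and digits are token characters" rule are all assumptions made for illustration. It only shows the general idea: walk the input by code point, group token characters together, and record start and end offsets plus a position for each token.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetTokenizerSketch {

    // Hypothetical token shape: text, [startOffset, endOffset) and position.
    public record Token(String text, int startOffset, int endOffset, int position) {}

    // Assumed rule for this sketch: letters and digits belong to tokens.
    public static boolean isTokenChar(int codePoint) {
        return Character.isLetterOrDigit(codePoint);
    }

    public static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            if (isTokenChar(cp)) {
                if (start < 0) {
                    start = i; // a new token begins here
                }
            } else if (start >= 0) {
                // A non-token character closes the current token.
                tokens.add(new Token(input.substring(start, i), start, i, tokens.size()));
                start = -1;
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        if (start >= 0) {
            // Flush a token that runs to the end of the input.
            tokens.add(new Token(input.substring(start), start, input.length(), tokens.size()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Hei maailma")) {
            System.out.println(t);
        }
    }
}
```

In an analysis chain the offsets refer to the original text, so they stay the same even when a later filter replaces the token text with its base form.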