EvidentSolutions / elasticsearch-analysis-raudikko

Finnish language analysis for Elasticsearch using Raudikko
GNU General Public License v3.0

Request - Any plans for Elasticsearch 8.4 support? #5

Closed tuoruu closed 1 year ago

tuoruu commented 1 year ago

We are looking into Elasticsearch version 8.4.3 which requires Java 17. Do you have plans to support ES 8+ versions in the future?

It seems that at least FinnishTokenizer / CharTokenizer would need some work: either changing the tokenizer or making it work with a newer Java version.

komu commented 1 year ago

There are no immediate plans to support ES 8+. At the moment we have a couple of customer projects using Raudikko, and neither of them is going to migrate to ES 8+ soon. I also doubt that anyone is going to do this just for fun.

But of course, PRs will be gladly accepted. So if you end up doing this, we'll be happy to incorporate the changes.

If it's possible to do this with the same code, or at least to isolate the changes cleanly using some build logic, that would be optimal. But if that feels too hacky, having separate branches for 7.x and 8.x is also a possibility.

ssaarinen commented 1 year ago

Made initial changes required for ES8 here: https://github.com/ssaarinen/elasticsearch-analysis-raudikko/tree/es-8

Basically it requires Java 17 and Gradle 7.3+, after which the project compiles and the tests pass. Testing whether this actually works with ES 8 is left as an exercise ;)

ssaarinen commented 1 year ago

OK, after a bit of Docker/WSL shenanigans I was able to get it running (Docker Compose setup added to my branch) and tested it with etc/test-analyzer.http, which returned this:

HTTP/1.1 200 OK
X-elastic-product: Elasticsearch
content-type: application/json

{
  "tokens": [
    {
      "token": "testata",
      "start_offset": 0,
      "end_offset": 9,
      "type": "word",
      "position": 0
    },
    {
      "token": "raudikko",
      "start_offset": 10,
      "end_offset": 18,
      "type": "word",
      "position": 1
    },
    {
      "token": "analyysi",
      "start_offset": 19,
      "end_offset": 28,
      "type": "word",
      "position": 2
    },
    {
      "token": "tämä",
      "start_offset": 29,
      "end_offset": 34,
      "type": "word",
      "position": 3
    },
    {
      "token": "tapa",
      "start_offset": 35,
      "end_offset": 42,
      "type": "word",
      "position": 4
    },
    {
      "token": "yksinkertainen",
      "start_offset": 43,
      "end_offset": 59,
      "type": "word",
      "position": 5
    }
  ]
}
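
For anyone who wants to repeat a check like this without an HTTP client file, here is a rough Python sketch against Elasticsearch's standard `_analyze` endpoint. It is not the contents of etc/test-analyzer.http: the base URL, index name, and analyzer name are all hypothetical placeholders, and it assumes a local node reachable over plain HTTP without authentication (ES 8 enables security by default, so a real setup may need HTTPS and credentials).

```python
import json
import urllib.request


def build_analyze_request(base_url, index, analyzer, text):
    """Build (but do not send) a POST request for the _analyze API."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_from_response(payload):
    """Return just the token strings from an _analyze JSON response."""
    return [t["token"] for t in json.loads(payload)["tokens"]]


def run_analyze(base_url, index, analyzer, text):
    """Send the request and return the tokens (needs a running node)."""
    req = build_analyze_request(base_url, index, analyzer, text)
    with urllib.request.urlopen(req) as resp:
        return tokens_from_response(resp.read())
```

Calling something like `run_analyze("http://localhost:9200", "my-index", "my_finnish_analyzer", "<some Finnish text>")` should then return a list of base forms similar to the `token` values above, provided the index defines an analyzer with that name.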

tuoruu commented 1 year ago

Great! We have also made some progress on our side. ES is moving so rapidly that our target now seems to be 8.5.2, but we haven't tested that with the cloud version yet, only locally.