Open rmannibucau opened 7 years ago
Hi @rmannibucau: Please excuse my late answer. Grab a coffee , this is going to be long.
We need to admit that this is an awesome feature. It gives user the ability to create custom analyzers just like elastic. We definitely want this feature included in the project without any doubts.
There are some things we want you to change before merging this pr:
You have used the master branch as the starting point to develop this feature but that branch is not upgraded with last changes for a long time. Can you rebase it to branch-3.0.12 please?
We don't really like how you have solved the json parsing part. We have always strived to mantain clean and well validated jsons. I would generate a new package named 'com.stratio.cassandra.lucene.schema.analysis.tokenizer' with an abstract class called TokenizerBuilder and one inherited class for every type of tokenizer. I would repeat this process for character filters and token filters with the corespongding unit tests for all of them(correct json parsing, invalid json parsing producing an exception with the correct error message).
I attach two pictures that would explain this better:
Also, parameters validation is not very helpful, error messages does not help user to know exactly where the error is. Also, if any user wants to use this project, it would be great if that user does not need to dig into lucene documentation to be able to know which are a certain filter parameter types. So, a new section id doc/documentation.rst must be added.
What about including tests for an invalid (tokenizer, char_filter..) type?
Also, there is a submodule named builder used by clients to easily generate CQL queries. This functionality must be included in that. It also has unit tests
I have started to develop my own version (what images shows) of this feature in branch feature/build_custom_analyzer. I will upload it
soon.
I can continue with this but it is your decision. Do you want to change your code to meet our requirements with me as the reviewer or do you want me to develop this with you as reviewer?
Looking forward your answer.
Hmm, have to admit I dont know how you can achieve json validation and opening of the instantiated instance cause one of the goal was to ensure we have shortcuts for well know instances/types - where this fully makes sense to use inheritance - but also customs or evolutive ones - where you cant validate it anyway.
What would be the plan for such support?
Also not sure I get the versioning (that's surely why i used master): why 3.0 and not 3.10?
If you are on that I guess you'll be faster than me so happy to close that otherwise I can give it a try in a few weeks probably (can't at the moment).
Idea is to enable to configure through the json payload passed when creating the index/analyzers some more advanced and open configuration. Here is a sample:
And here is the equivalent java code: