DataCleaner is a complex tool. As its lead developer for years, I'm sorry to say that I don't think it's maintainable in its current state. I'd like to propose making DataCleaner maintainable by retaining what it is at its core for 99% of its users, and ditching the complexity that isn't really used anymore anyway. This is specifically about making it easy to build and develop on DC, but also about making it easy to run on modern JVMs.
Remove the dynamic classloading / extensions / drivers and such. This has a huge technical complexity cost and makes the tool incompatible with newer JDKs.
Remove the Spark engine - nobody uses DC for that sort of thing anymore.
Remove the Scala components - too much build complexity for the value they bring. This would mean getting rid of the "Visualizations" components, though.
I'm going to make a branch for this, if nothing else for my own benefit of being able to build and run DC. But I think it should be considered the next major version of DC.