datacleaner / DataCleaner

The premier open source Data Quality solution
GNU Lesser General Public License v3.0

Make DataCleaner simple - remove complexity of Spark, Scala and dynamic extensions #1968

Closed kaspersorensen closed 1 week ago

kaspersorensen commented 2 weeks ago

DataCleaner is a complex tool. As the lead developer on it for years, I'm sorry to say that I don't think it's maintainable in its current state. I'd like to propose making DataCleaner maintainable by retaining what it is at its core for 99% of its users, and ditching the complexity that is not really used anymore anyway. This is specifically about making it easy to build and develop on DC, but also about making it easy to run on modern JVMs.

I'm going to make a branch for this, if nothing else for my own benefit of being able to build and run DC. But I think it should be considered the next major version of DC.