kaspersorensen opened 2 years ago
I have prepared a branch to illustrate what would be removed...
https://github.com/datacleaner/DataCleaner/compare/remove-scala?expand=1
The branch looks more or less good to me, although it's been so long since I last looked at Java that I'm not even sure I'd take my own word for it. 😊
Buuuut it's mostly code removal, so as long as it builds and all the runners still work, I guess we're good. I can't test right now, but I'll try to see if I can get some time for it later today or tomorrow.
However, I don't see deletions of the .scala files themselves? But okay, it's just an example, so I guess it doesn't matter for now.
Regarding pulling it into its own extension: I guess it would take quite a bit of refactoring to allow runners, especially ones that need to change the system in such a major way, to live in extensions? If I remember correctly (which is not a given), we/you tried something like that originally, but ended up with the current design because such a fundamental change to running just got too hard without quite a bit of coupling. But maybe the Scala parts themselves could be kept in the extension, while the base of the runner stayed here? Admittedly that DOES sound like a bit of a strange design, but if we think the Spark runner still has value to users, it would be a shame to lose it.
I just realized that the branch is in no way ready to go :) I mean, there are a bunch of components that are just no longer included, and I guess we just didn't have integration tests for those, but they would disappear from the product if we merged that branch. But I think they're not too hard to reproduce, so that's definitely the next step if we want to complete this issue.
Regarding the Spark runner: I agree it's probably not going to be easy to make it a proper extension. I was more thinking that we could make it a separate distribution, a bit like datacleaner-docker or whatever. A distribution that would include its own Main class and would only be built to work with Spark.
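To make the "separate distribution" idea concrete, a Spark-only artifact could boil down to a tiny entry point like the sketch below. This is just an illustration; the class and argument names (`SparkJobLauncher`, the file arguments) are assumptions, not anything from the actual codebase:

```java
// Hypothetical entry point for a Spark-only DataCleaner distribution.
// The launcher class it would delegate to (SparkJobLauncher) is illustrative.
public final class Main {

    public static void main(String[] args) {
        System.exit(run(args));
    }

    // Kept separate from main() so the argument handling is testable.
    static int run(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: Main <configuration-file> <job-file>");
            return 1;
        }
        final String configurationPath = args[0];
        final String jobPath = args[1];
        // In a real build this would hand off to the Spark-specific runner,
        // e.g. new SparkJobLauncher(configurationPath, jobPath).launch();
        System.out.println("Would launch Spark job: " + jobPath
                + " with configuration: " + configurationPath);
        return 0;
    }
}
```

The point being that the distribution's jar would only ever be invoked via spark-submit, so the Main class stays trivial and all the Spark coupling lives in this one module.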
I mean, the other thing is that Spark has moved on massively since this was built. I think everything will break and have to be partially rewritten if we just upgrade Spark to the latest version. But I think it's time that it does get upgraded somehow.
Ah yeah, I remember most of the Spark components being reasonably simple.
How about I finally get back to contributing (and re-jiggle my Java experience a bit) by taking at least some of them on? But it might be a few days before I get started.
Yeah give it a shot! I'm ready to cheer you on! For my part I'm gonna then look into some of the more simple-n-stupid Scala-to-Java conversions in the non-Spark areas of the code. Like the HTML rendering module and more.
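For the "simple-n-stupid" conversions, most of the work is mechanical: a typical Scala collection pipeline maps fairly directly onto Java streams. A made-up example (not actual DataCleaner code) of the kind of translation involved:

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class HtmlRenderExample {

    // Scala original (illustrative):
    //   values.filter(_ != null).map(v => "<td>" + v + "</td>").mkString
    // Java equivalent using the Stream API:
    static String renderCells(List<String> values) {
        return values.stream()
                .filter(Objects::nonNull)            // filter(_ != null)
                .map(v -> "<td>" + v + "</td>")      // map(v => ...)
                .collect(Collectors.joining());      // mkString
    }
}
```

Nothing clever; the main cost is just the volume of files to walk through.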
Hi all,
I am picking up DC development for a bit, after a long hiatus. And coming back to this project is making me realize how long and complex a build we have. I would like to make the DC build (and thereby the overall developer experience) much nicer by simplifying it. Right now I am spending a lot of time just getting it to compile on a fresh installation. And the main culprit is something that I've noticed before: Scala, and to some extent also the Spark module. So I would suggest simplifying the developer experience by: