Overall design questions

salvalcantara commented 2 years ago

Hi @ariskk,

I am learning Scala 3 and considering the transition from Scala 2 in the context of Flink. Since this is yet a nascent (but exciting!) project I would like to ask a few questions about the overall design at this early stage:

Is it really necessary to wrap the Java classes? Why not using the Java API classes directly? For example, I took a quick look at [1] and overall it looks good (without any scala wrappers whatsoever if I'm not mistaken). I guess the idea is to enhance ergonomics/make things more Scala idiomatic (similar to [2]) but at first sight for this case in particular I cannot see a huge win. I think the advantages should be clearly stressed/demonstrated in the readme, so that justification is out of the way.
Have you considered other design options (other than composition)? Namely [3]:
- Inheritance/Subtyping
- Extension Methods/Type Classes

REFERENCES:

ariskk commented 2 years ago

Hi @salvalcantara,

The motivations behind the very existence of flink4s are twofold:

Drop in replacement for flink-streaming-scala, so that existing projects can migrate to Scala 3.x without much refactoring.
Provide more idiomatic Scala methods eg combine, unorderdMapAsync and others.

[3] is the default option if you want to extend a third party library. biMapWithState and combine have been the most used methods in my past projects and were implemented using this pattern. It is true that this library could just provide an implicit class wrapper (or Scala 3.x extension methods) and be a lot more lightweight.

This brings us to your original question (1). It is true to you can use the Java classes directly. This has been possible for a while, but yet again most projects use the Scala wrapper. A couple of reasons off the top of my head:

Encapsulating conversions between Scala types and the Java types the Java classes expect (ef fromCollection, executeAndCollect).
Type Erasure can kill Flink's ability to infer the output serializers and you won't know until runtime. The trickiest part here is that if you use any type parameter in your code (eg def processMyDataStream[T](myStream: DataStream[T]) , and you do not implement ResultTypeQueryable you will get a runtime exception. The Scala wrapper enforces that with an implicit typeInfo: TypeInformation[T] and thus you don't risk runtime errors.

So to summarize:

It is possible to just rely on extension methods. It would work in most cases.
For convenience, it is more practical in practice to encapsulate the complexity of Scala/Java interoperability in a well tested interface and thus avoid nasty runtime surprises.

That said, very good food for thought! I will think about from your perspective and comment back!

Thank you!

salvalcantara commented 2 years ago

Hi @ariskk, thanks for your replies. I've got some more questions regarding the scope of this library:

As you say, I guess the idea here is to cover everything needed to make usage of Scala 3 pleasant in the upcoming Scala-free versions of Flink, namely, cover most of the functionality in flink-streaming-scala (DataStream API) and maybe also that in flink-scala (in particular it contains type info stuff).
I'm asking because I've come accross flink-type-info, a project that should cover everything as per type info, so maybe this project could focus on the functionality so far provided in flink-streaming-scala. Which is the overlap between these two projects?
Finally, have you considered supporting Scala 2.13 as well as in flink-type-info? It might make sense given the fact that the latest Scala version supported by Flink will be 2.12, so that might make the transition smoother (?).

ariskk commented 2 years ago

Hi @salvalcantara ! I haven't seen flink-type-info before. It seems to be focusing on deriving TypeInformation instances using shapeless (instead of the macro Flink provides). Shapeless doesn't support Scala 3.x currently. At a glance, I can't see any overlap between the two projects. flink-type-info seems to be focusing solely on generating serializers. In general, due to my experience here I am very very reluctant to use any "magic" in the TypeInformation generation process. For the last few years I have been using a TypeInfo trait in my projects.

Re Scala 2.13, there are two options:

By enabling the -Ytasty-reader flag a Scala 2.13.4+ project can depend on a Scala 3 artefact. For more info: https://docs.scala-lang.org/scala3/guides/migration/compatibility-classpath.html
I may as well cross compile and publish the project in Scala 2.13 too. I may look into that.

salvalcantara commented 2 years ago

BTW @ariskk flink-type-info seems to have disappeared (???), but there is the magnolia-based equivalent: https://github.com/findify/flink-adt, which I guess would play the same role.

ariskk commented 2 years ago

Flink4s requires an in implicit TypeInformaton instance to work, but it has no opinion as to how this is generated. Flink4s and flink-adt should work together fine (in 2.13.4+ at least).

salvalcantara commented 2 years ago

Hey @ariskk, have you seen this: https://github.com/findify/flink-scala-api? It clearly overlaps with this project and seems way more advanced. I came across this here: https://lists.apache.org/thread/3ghw0cvkftv349j9x4j3h1twzwt75fwj.

ariskk commented 2 years ago

Hey @salvalcantara I was the first to star flink-scala-api. It is a pure port of the existing code, so likely easier to migrate. My aim for flink4s was to create native Scala APIs that enhance what's already there. That said, I no longer have a use case for it as I have left the Flink project I was working on. If there is no interest from contributors, I don't see myself maintaining this for long. I will likely archive the project and point people to flink-scala-api at some point.

salvalcantara commented 2 years ago

@ariskk I think it would make more sense to unite forces in flink-scala-api (and flink-adt). Wouldn't it be possible to add whatever extensions on top of flink-scala-api, which already covers all the basics, allowing flink4s to narrow down its focus?

salvalcantara commented 2 years ago

@ariskk Another thing I noticed is that you use case classes for everything (e.g., DataStream) while the official flink-scala (as well as flink-scala-api) uses regular classes. Using case classes might be problematic if you want to use inheritance (e.g., if you want KeyedStream to extend DataStream you cannot and would need to handle it in a different way).

ariskk / flink4s

Overall design questions #17