apache / arrow-cookbook

Apache Arrow Cookbook
https://arrow.apache.org/
Apache License 2.0
95 stars 46 forks source link

[Java] Refactor Java cookbooks to switch from JShell to Maven #350

Closed amoeba closed 3 months ago

amoeba commented 3 months ago

In https://github.com/apache/arrow-cookbook/issues/347 we found the way we have been running cookbooks for Java (JShell) doesn't work well with JPMS which was introduced in Arrow 16. This refactors javadoctest.py to run examples directly with Maven using exec:java instead of with JShell. This PR also bumps the Java source/target version to 11 to fix some compiler errors and fixes a few compilation errors in cookbook code.

I ran into one snag will require a follow-up commit to this PR: The way the examples in substrait.rst are written doesn't work with my approach. My approach splits each code snippet into its import statements and non-import statements, puts the imports outside the main class definition and puts the non-imports inside the class's main method. This works fine for every example except substrait.rst which needs some of its code to be defined in the main class, e.g.,

We probably generally want to support examples that need this so I think we may need to rewrite all the Java cookbooks to have an explicit main class. @lidavidm suspected this might be the case in https://github.com/apache/arrow-cookbook/issues/347#issuecomment-2049725497 but I do wonder if there is still a way to avoid this. Any ideas welcome.

Fixes #347 Related #348

lidavidm commented 3 months ago

Honestly, why try to support writing Java code inline in reST in the first place? It's a bad experience for recipe authors. If the examples were all external Java files that we could compile there would perhaps be some more boilerplate but it would be nothing unusual for Java and it would avoid the cookbooks being such special snowflakes.

amoeba commented 3 months ago

The current CI failures are the expected ones (one is to bump Avro and the other is the explicit main class thing mentioned above).

amoeba commented 3 months ago

That would probably be better in many ways, though I don't think it solves the problem of how to mix the prose and code together. Might be worth some research into what's already been built that does something like what we're aiming to do here (literate programming) for Java.

lidavidm commented 3 months ago

https://github.com/apache/arrow-adbc/blob/main/docs/source/ext/adbc_cookbook.py is fully general, though I haven't actually tried it with Java (but it works with C++ and Python).

There's also https://github.com/apache/arrow-adbc/blob/main/docs/source/ext/javadoc_inventory.py which allows crosslinking from Sphinx into Javadoc without hardcoding URLs.

amoeba commented 3 months ago

Thanks @lidavidm, those look like solid options. For the purposes of this PR and getting the cookbooks working for 16.X, I think I'll just wrap the current examples in class defs and plan to refactor things in the near future.

amoeba commented 3 months ago

I updated the Substrait cookbook code and how javadoctest.py works so all the cookbooks pass now locally when I bump Arrow to 16.0.0. I left the version bump out of this PR since we bump versions separately so CI will still fail on the Avro cookbook.

I'll file an issue to rework the cookbooks in a sec here. @raulcd do you want to take a look before I merge this?

amoeba commented 3 months ago

Thanks @raulcd. Something along those lines does seem reasonable for #351. I'll merge this and follow-up with bumping versions which should make the cookbook CI pass.