IQSS / dataverse

Open source research data repository software
http://dataverse.org
Other
876 stars 484 forks source link

Update and reorganize the XOAI dependencies under local_lib #8372

Closed poikilotherm closed 1 year ago

poikilotherm commented 2 years ago

Currently, Dataverse codebase uses a custom patched XOAI 4.1.0, provided in /local_lib.

It would be a good idea to

  1. Find out what patches have been applied (needs anatomization of patched JAR sources with original sources)
  2. Fork https://github.com/DSpace/xoai (which is the moved repo from https://github.com/lyncode) to @gdcc
  3. Find out if our patches are still needed
  4. As development in https://github.com/DSpace/xoai has stalled (see e.g. https://github.com/DSpace/xoai/issues/72#issuecomment-557292929), make it our package
  5. Update LOTS of dependencies and especially get rid of log4j-1.2!
  6. Setup Github Workflow, Code Coverage etc.
  7. Make releases to Maven Central
  8. Use & test in Dataverse upstream code

It might be worth a try to 1) not rename the packages but still publish under our @gdcc Maven group id and 2) create fork and a pull request for upstream plus setting up wei/pull to auto-create pull requests when DSpace updates their branch.

landreev commented 2 years ago

I am removing the log4j dependency that's being brought in by XOAI as part of https://github.com/IQSS/dataverse-security/issues/48; (I only found this a couple of days ago myself; about to make a PR into the main Dataverse project.)

But yes, in the process of tracing this dependency, I was reminded of this XOAI debacle, that we still have these locally-built libraries in the project, etc. (I can comment more on this;) Once the immediate log4j issue is addressed, we can use this issue to address cleaning up the XOAI. I was going to open one myself, but thanks for beating me to it.

landreev commented 2 years ago

(Once we merge the pom.xml PR that removes the log4j jar from the war file, I'm going to update the title of this issue to reflect the fact that it is mainly about refactoring these XOAI dependencies, and not specifically about log4j. I will add some information on these XOAI libraries in the meantime).

landreev commented 2 years ago

As development in https://github.com/DSpace/xoai has stalled (see e.g. https://github.com/DSpace/xoai/issues/72#issuecomment-557292929), make it our package

Yes, the situation is back to where it was 6 years ago, when lyncode still nominally owned the package; but it wasn't being actively maintained and it was not possible to submit and merge a PR. We really liked the package, but it was impossible to use it without a few minor fixes. So that's what forced us to build those jars locally in the first place. We appear to have missed the window when DSpace was somewhat actively maintaining the package. And then these libraries got further stuck in our local_lib limbo. So we definitely need to assume a more active role there in order to clean up this dependency. But we will need to decide what exactly "mak[ing] it our package" should entail.

Bad news:

  1. I'm seeing no trace of the local fork of the old lyncode repo from which those jars were built, anywhere.

Better news:

  1. We still have all the sources (and they are included under local_lib, as source jars, I believe).
  2. I will recreate that fork somewhere on github; either under IQSS/, or as my own, as the first step.
  3. The changes were really minimal.
  4. It's a fairly simple package to start with.

It might be worth a try to ... not rename the packages but still publish under our https://github.com/gdcc Maven group id and

That's an interesting idea. I never even considered that - publishing somebody else's packages? But that appears to be exactly what they did with reload4j - forked from log4j-1.2.17, released with the package pathnames unchanged, but under their own group id ... (assuming that's what you meant/had in mind above - ?). (I guess it's a slightly different situation with reload4j - since log4j-1 was officially retired/EOL-ed; and the new incarnation was made by the original author of log4j).

OK, that part we can figure out as we go. But we can definitely proceed with the other parts. Create a fork of the latest DSpace xoai-4 under gdcc, patch/fix what's necessary to get it to work with Dataverse, make pull requests, see how that goes etc.

poikilotherm commented 2 years ago

@landreev please take a look at https://github.com/gdcc/sword2-server for an example of what I did to release our own version to Maven Central.

This was a fork of what DANS did (which also was a fork of the original source) to push necessary changes from us, plus some more stuff to modernize and adapt for dependency updates. And it still has the old Java package layout, but being released under our Maven group ID. 🤠

poikilotherm commented 2 years ago

@landreev I started moving stuff around in https://github.com/gdcc/xoai

landreev commented 2 years ago

Added "feature: Harvesting" label; to make sure the issue is added to the pile of Harvesting issues we are prioritizing now as part of the GREI effort.

mreekie commented 2 years ago

sprint

mreekie commented 2 years ago

The library has had some bugs addressed. This lead to DV creating it's own jar.

poikilotherm commented 2 years ago

I'd appreciate help on this in https://github.com/gdcc/xoai (currently crammed with lots of other work)

I started moving lots of things into the lib, as some of the deps aren't maintained either. Happy to answer questions and give permissions.

pdurbin commented 2 years ago

I checked in with @poikilotherm about this in Matrix today: https://matrix.to/#/!AmypvmJtUjBesRrnLM:matrix.org/$s548c8B-SZ-rdF5GN9LyZZ94YoZxDMkSW1QXLzvCc3A

One thing I didn't realize is that @poikilotherm is hoping that this issue could be worked in in parallel with the Payara 6/Jakarta EE 9 upgrade in #8305. (We're pretty sure we'd like to go ahead and upgrade to EE9 per discussion in Slack: https://iqss.slack.com/archives/C010LA04BCG/p1651155098181049 ). When I asked, "Wait. Is this all dependent on the upgrade to Payara 6 and Jakarta EE 9?" he said, "That's half way correct... The Jakarta EE 9 move was the motivation to create a branch with these breaking changes, so we avoid any inconsistencies... So this is mostly about getting rid of old Java EE and refactor ye old deps". The branch he's talking about is https://github.com/gdcc/xoai/tree/branch-5.0 .

While there are unit tests to work on, it sounds like @poikilotherm's plan is to build and test the refactored xoai jar with Payara 6. We aren't working on Payara 6 in the sprint we just started, but I said I'd at least poke around.

A little poking seems helpful. mvn package didn't "just work" and @poikilotherm is helping me with it at https://github.com/gdcc/xoai/issues/11

I'll keep poking and will coordinate more with @poikilotherm next week. There's still refactoring to do and other tasks such as pushing a beta (or snapshot or whatever) to Maven Central.

poikilotherm commented 2 years ago

Let me add some notes here:

landreev commented 2 years ago

Just reading the updates from the last 2 days now. Yes, I'm happy to help with this work in /gdcc/xoai. That's why I pushed for prioritizing the issue/adding it to the schedule.

I thought we were going to start with getting Dataverse to build with the new, gdcc-branded jars in its (Dataverse's) current state; just to make sure everything worked as it should. If there's a compelling reason to go straight to Payara 6/EE 9, we could do that too. But then we can't switch to gdcc/xoai until we upgrade.

poikilotherm commented 2 years ago

Well there is still room for going with Jakarta EE 8 if this is what you want.

We could create a v5.0 for DV 5 and then create a v6.0 for DV 6 and Jakarta EE 9 (10?)

The code refactors already done are still targeting Jakarta EE 8 (so far only JAXB 2.3 in use), see https://github.com/gdcc/xoai/blob/06dbf60994fee51246e4b6b0c436d00d16b3b652/pom.xml#L46.

Now would be a good time to decide on this. 😎 I'm fine with sticking to EE 8 for now, too.

landreev commented 2 years ago

I would personally vote for starting with EE 8. I am operating under the assumption that it would then be possible to make a PR switching Dataverse to gdcc/xoai fairly quickly/easily. Is that correct?

Thank you for leading this effort/all the work that's already been done, @poikilotherm.

landreev commented 2 years ago

I'm going to make a PR into branch-5.0 of gdcc/xoai to sync it with the changes from my original lyncode fork that Dataverse relies on. I'll be using the issue https://github.com/gdcc/xoai/issues/2 there for any further discussion of this task.

mreekie commented 2 years ago

sprint:

mreekie commented 2 years ago
mreekie commented 2 years ago

Sprint: