Increasingly slow feedback loop for developers, increasingly large WAR files

pdurbin commented 5 years ago

During this week tech hours I expanded on what I said in Slack the other day:

"For tech hours, I'm growing increasingly concerned that the feedback loop for developers is getting slower and slower. You change code. It deploys to Glassfish. Then you get to see your change. This is getting slower and slower and more and more painful. Dataverse should be fun to hack on. Not painful."

Specifically, I started a spreadsheet called "Development Feedback Loop Time in Seconds" to start attempting to quantify my claim above. As of commit 1d37e9911 it took 36 seconds get feedback on the most trivial change I could think of, making a small tweak to the code behind /api/info/version add a little more output. Other developers are welcome to add to this spreadsheet to talk about their own experiences (or leave a comment here): https://docs.google.com/spreadsheets/d/12Co_WHgouTPC6tQkL9XyTMJyr_-ZOHPBLW4Or57bqWA/edit?usp=sharing

I explained that when I've hacked on PHP apps in the past, feedback is fairly immediate. Feedback from Java will always be slower, I think, but part of my point is that a tiny Java web app like https://github.com/pdurbin/javaee-docker (that does nothing but print "It works!") will give developers a much faster feedback loop than Dataverse. Why is this?

One theory is that the feedback loop is a function of the size of the WAR file you are attempting to deploy. This makes sense to me because, anecdotally speaking, Dataverse was more lean and mean at our 4.0 release than it is now at 4.11. I would be happy to repeat the /api/info/version test above (or equivalent) to see how long the feedback loop is. My memory tells me that I used to get feedback faster and that Dataverse is getting slower and slower to deploy over time. @landreev seemed to agree with deployment time increasing with later releases of Dataverse based on his recent experience deploying every release of Dataverse from 4.0 to present in order to put the new "create" scripts in pull request #5317.

Last night I created a spreadsheet called "Size of Dataverse WAR file over time": https://docs.google.com/spreadsheets/d/1uL5CVGhMh6Vcr_UUgwbrcHterz7BM1qMPPPrk6ao4ZY/edit?usp=sharing

Here's the data on how much the size of the Dataverse WAR file has increased from 4.0 until 4.11 (from the spreadsheet above):

As you can see, Dataverse 4.0 was 45 MB and Dataverse 4.11 is 187 MB. In between, there were a few jumps that are probably worth mentioning:

4.3 added support for DataCite DOIs: https://github.com/IQSS/dataverse/compare/v4.2.4...v4.3
4.8 added support for S3: https://github.com/IQSS/dataverse/compare/v4.7.1...v4.8
4.10 added support for indexing the content of files: https://github.com/IQSS/dataverse/compare/v4.9.4...v4.10

Does size matter? @AdamBien says it does. He wrote "WAR sizes are directly related to deployment speed and so productivity" in his post at http://www.adam-bien.com/roller/abien/entry/ears_wars_and_size_matters and on his podcast he promotes the idea of "thin WARs". The argument is that Java EE (Jakarta EE now) has so many APIs that your WAR file should be mostly business logic. As of e707a22 cloc indicates Dataverse is 137,301 lines of Java, 221,090 lines overall (full report below). How big would the "thin WAR" be with just business logic code and zero dependencies on anything but Java EE? I don't know. My gut is telling me that dependencies (AWS SDK, etc.) make up the bulk of the 187 MB in the Dataverse 4.11 WAR file.

An approach I'm less familiar with is "hollow WARs" but I believe the idea is that you put as many dependencies as you can into your application server. Your code still relies on those dependencies but they are no longer in the WAR file itself.

I've focused on size above but are there other approaches? According to the JRebel website, "JRebel fast tracks Java application development by skipping the time-consuming build and redeploy steps in the development process. JRebel helps developers be more productive by viewing code changes in real time and maintaining state." That sounds like the problem I'm describing, but is JRebel a band-aid rather that a fix for the root cause?

Should we split Dataverse into microservices? I mentioned this at tech hours but the general feeling is that microservices will give us a new set of problems.

So, to summarize, the problem are:

Increasingly slow feedback loops for developers, causing losses in productivity (obligatory XKCD cartoon below).
Increasingly large images to upload to Docker Hub for community efforts like https://github.com/IQSS/dataverse-docker and https://github.com/IQSS/dataverse-kubernetes

compiling

Questions:

How long was the feedback loop for Dataverse 4.0?
What is a reasonable feedback loop in seconds to expect for a Java EE application with as much code as Dataverse?
Does size matter?
How much effort will it be to remove dependencies?
Are tools like JRebel worth investigating?
Should we consider breaking up the Dataverse monolith into microservices?

I'm opening this issue because I was asked to after our conversation during tech hours this week but what I wrote above is only to kick off the conversation. What do you think about all this? What are your experiences? What are your ideas? Please leave comments.

--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Java                            896          32254          30517         137301
XHTML                            74            574            531          13530
XSD                               3            596              8          13291
JavaScript                       30           3075           1916          12376
SQL                              67           2379           2497          11896
CSS                              19            215            167           8769
JSON                            144             27              0           7157
XML                              43           1014           3506           4370
Bourne Shell                    149            407            481           2332
RobotFramework                   24              0             41           1859
Perl                              7            569            251           1684
Python                           37            472            566           1524
Maven                            13             83            100           1345
HTML                             24            105             42            963
Markdown                         27            286              0            853
R                                 9            170            182            633
Bourne Again Shell               58             88             85            582
make                              2             43              6            220
DOS Batch                         1             29              1            212
Dockerfile                        5             31             39             82
Ruby                              1              5              8             42
YAML                              2              4              1             34
INI                               1             12             31             23
XSLT                              1              0              2             12
--------------------------------------------------------------------------------
SUM:                           1637          42438          40978         221090

By the way, the "size of WAR" chart above was made by downloading all the "pages" from the GitHub API with commands like curl https://api.github.com/repos/IQSS/dataverse/releases?page=1 and then manually combining the files into a single JSON file I could use with this (ugly) one liner cat all.json | jq '.[].assets[] | {name, size}' | grep war -A1 | sed ':a;N;$!ba;s/,\n/ /g' | grep -v '\-\-' | perl -lane 'print "@F[1,3]"' | tr -d '"' | tr ' ' '\t' | tac | sed 's/dataverse-//g' | sed 's/\.war//g' > data.tsv

pameyer commented 5 years ago

Summarizing a very non-systematic experiment I mentioned to @pdurbin:

on develop-e707a22cf
deploy war from normal mvn package (system where the installer had been run previously, with dataverse uninstalled from glassfish); ~1m2s to deploy. war file ~187M
unjar war file, put everything from WEB-INF/lib into a different archive, delete the jar files there; re-jar to a new war file, new war is ~4M
stop glassfish, dump the dependency archive into $GLASSFISH_HOME/glassfish/libs (same location as JDBC jar), restart glassfish
deploy the ~4M war; ~35s to deploy.

poikilotherm commented 5 years ago

Hi @pdurbin,

thank you for opening this issue and your hard work getting these numbers. During my testing and developing for IQSS/dataverse-kubernetes, I noticed the indeed long deploy times, too.

Adding my 2 cents to this (just ignore me if you disagree):

I second the opinion of "don't transform in micro services". But I think it would be a good idea to split up the monolith in logical parts and maybe look into the direction of SOA.
IMHO it's not only about size of the WAR. I tried in #5274 to strip down the WAR file, but it still needed quite some time to deploy. I remember we chatted about "why is now Spring in here" some time ago. Adding more and more features while loading heavyweight stuff into the WAR might affect performance. See also 1. :wink: Please also note #5360.
Looking at the startup logs, I see a big bunch of stuff getting loaded. It should make things faster when 1) starting to upgrade to more current versions of stuff, 2) trying to reduce used parts of Java EE (maybe switch to MicroProfile or WebProfile) and 3) trying to profile the startup process, catching time ghouls stealing it. And there might be improvement by trying to get rid of those exception throwing stuff - those seem time consuming...
Hollow, Thin, ... WARs are tricky. You should be using a recent base for this, which is definitly not Glassfish 4. 4a. @pameyer be carefull with stripping out everything. While working on #5292 I did just the same and learned the hard way that is a bad idea. See 43ccd30. Putting some things outside and some things not needs more thinking and tricky solutions than I had time for yet.
@pdurbin being a PHP dev for a few years and a Java dev for another few: don't worry too much. IMHO you really can't compare those two very different ecosystems.
I have no experience with JRebel, but I second you on "don't use a band aid to cure a broken leg".

pameyer commented 5 years ago

@poikilotherm That was definitely a crude experiment at stripping things out (the /api/info/version endpoint worked, didn't check anything else). I'd been looking to see if it was an order of magnitude improvement; and that doesn't look like the case.

djbrooke commented 5 years ago

@scolapasta will get some questions together for a spike related to this.

matthew-a-dunlap commented 5 years ago

Some actions items out of tech hours that we can act upon:

Hello-world spike-test to see speed of deploying the minimum possible glassfish application (to understand what is the fastest possible).
Automate and time our build/deploy process across multiple version (we could start by looking at the ones with major size spikes, or just run all the versions), include line count of server.log. Put these in the new columns at "Metrics for Dataverse by release" (formerly "Size of Dataverse WAR file over time") at https://docs.google.com/spreadsheets/d/1uL5CVGhMh6Vcr_UUgwbrcHterz7BM1qMPPPrk6ao4ZY/edit?usp=sharing

As a separate issue:

Better understand Oliver and Pete's approaches to detaching the libraries from the application for faster deployments

pdurbin commented 5 years ago

I guess this issue has become #5736. Closing.

pdurbin commented 4 years ago

This issue still drives me absolutely crazy. Re-opening.

This is the command I just ran:

time (mvn package && asadmin-payara deploy --force target/dataverse-4.20.war && curl http://localhost:8080/api/info/version)

It indicates that it takes a minute and 48 seconds to compile and deploy.

Here's the output:

real    1m47.751s
user    2m10.534s
sys 0m9.309s

It's a productivity killer.

pdurbin commented 4 years ago

I'm on fc37facc5 and just timed deployment from Netbeans. It's still excruciatingly slow.

After changing some back end code (DatasetPage.java) I hit F6 which means "run project" and it took 2 minutes and 43 seconds to compile and deploy.

It's a productivity killer.

djbrooke commented 3 years ago

Jrebel may help with this - has been working well locally
Investigate moving dependencies to the side instead of in the war file (this may or may not work - let's try)
What's the acceptable time for a feedback loop here (let's try for 25% better than benchmark)

landreev commented 3 years ago

Like everybody else, I support speeding it up, and I'm happy that it's been scheduled. But just want to point out again (@poikilotherm already mentioned it too, above) that we should not expect the size of the war file to directly translate into the deployment speed. When it comes to the dependencies specifically. I honestly don't think there is a significant difference between deploying with all the extra jars packaged in the same war, or outside of it. In either case the actual classes from these libraries are only loaded and instantiated as needed. So the savings would only come from the time spent unzipping these jars and saving them in the applications/ hierarchy every time... but that's a couple of extra seconds most likely?

(Also, since we are primarily talking about developers - as a developer you are likely using direct deployment, from the target/ directory in your project - bypassing the war file stage... there's still some extra copying of these dependent jars, but it's gotta be a fairly negligible overhead)

So yes, a practical solution will likely have to be some form of hotswapping. It's great to hear that there may be some good open source tools available (let's look into HotswapAgent, yes)

pdurbin commented 3 years ago

Here's another data point. I'm hacking away on something API-related. I change the code. I hit F6 in Netbeans to redeploy. It takes 64 seconds before the app is ready. To me, this is too slow and I feel like we can do better. I'm on 5.3 with a brand new MacBook.

kuhlaid commented 1 year ago

The Dataverse needs to be rewritten in a language other than compiled Java. Typescript sounds good. The current development environment is unsustainable.

pdurbin commented 10 months ago

Good progress in recent days!

cmbz commented 1 month ago

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.

pdurbin commented 1 month ago

As predicted, this issue was automatically closed by the sweep we're making through old issues: https://groups.google.com/g/dataverse-community/c/lDVJq-1CHLY/m/Vl824d8fAQAJ . That's fine.

In practice, I'm happier these days, mostly thanks to efforts by @beepsoft that led to write ups in https://guides.dataverse.org/en/6.3/container/dev-usage.html#ide-trigger-code-deploy that allow me to often redeploy code quite quickly. It's not perfect. Sometimes, especially for larger code changes, I have to shut down the whole works and get it running again. But when quick redeploys work they save me a ton of time.

Please note that our war file still a big fat pig. 🐷 And it's still slow to deploy. But maybe we can open a fresh issue about that some day! 😅

IQSS / dataverse

Increasingly slow feedback loop for developers, increasingly large WAR files #5593

10088

10102

9959