eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Investigate Options for Jenkins artifact Storage #2637

Open AdamBrousseau opened 6 years ago

AdamBrousseau commented 6 years ago

Related #1017

Our space is limited on our Jenkins master. Eclipse has asked us to target 100G and we are currently using almost double that [1]. Artifactory is an artifact repository manager and could allow us to (temporarily) stash our SDKs somewhere other than master. They offer a free version. This would allow us to keep more builds (logs) on master and more artifacts on a separate server. There is also an Artifactory plugin for Jenkins that would allow us to easily switch to archiving.

Last time I looked I believe there was an option to set up Artifactory yourself for "free"?

We have an existing machine (proxy_worker) attached to Jenkins that we already use for running infra-type builds and as a proxy into other machines that are behind a VPN. We could try setting up Artifactory on it. This machine is provisioned via an OpenStack service, which has the option to create volumes (up to 1T total) and attach them to running instance(s). This would give us the storage.

NOTE:

Joe already has a Pull Request open to add Artifactory support to the builds.

#2272

AdoptOpenJDK/openjdk-tests#512

AdamBrousseau commented 6 years ago

I went ahead and created a 1T volume in the OpenStack Dashboard and attached it to proxy_worker (/dev/vdb). I then needed to partition and format the drive and mount it on the machine. ref: https://askubuntu.com/questions/154180/how-to-mount-a-new-drive-on-startup

sudo fdisk /dev/vdb

Press o and press Enter (creates a new table)
Press n and press Enter (creates a new partition)
Press p and press Enter (makes a primary partition)
Then press 1 and press Enter (creates it as the 1st partition)
First sector (2048-2097151999, default 2048): Enter
Last sector, +sectors or +size{K,M,G,T,P} (2048-2097151999, default 2097151999): Enter
Finally, press w (this will write any changes to disk)

Create a filesystem

sudo mkfs.ext4 /dev/vdb1

Add it to /etc/fstab

#device        mountpoint             fstype    options  dump   fsck
/dev/vdb1    /home/jenkins/artifact_storage/    ext4    defaults    0    1

Create the dir

mkdir /home/jenkins/artifact_storage
sudo chown jenkins:jenkins /home/jenkins/artifact_storage/
sudo mount -a
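Before running `sudo mount -a`, it can help to confirm that the fstab entry landed as intended. The following is a minimal sketch in POSIX shell; `fstab_has_mount` is a hypothetical helper name introduced for illustration, not part of any tool mentioned in this thread.

```shell
# Sketch: check that an fstab-style file has an entry for a mount point.
# fstab_has_mount is a hypothetical helper, not an existing utility.
# Usage: fstab_has_mount FSTAB_FILE MOUNTPOINT
fstab_has_mount() {
    # Skip comment lines, then compare field 2 (the mount point),
    # ignoring a single trailing slash on either side.
    awk -v want="${2%/}" '
        /^[ \t]*#/ { next }
        NF >= 2 {
            mp = $2
            if (mp != "/") sub(/\/$/, "", mp)
            if (mp == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example (not executed here):
#   fstab_has_mount /etc/fstab /home/jenkins/artifact_storage && echo "entry present"
```

Device names such as /dev/vdb1 can change if volumes are attached in a different order, so referencing the filesystem by UUID (as reported by `blkid`) in /etc/fstab is a common alternative.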
AdamBrousseau commented 6 years ago

Sent an email to OSU to confirm this type of usage of their systems is permitted.

AdamBrousseau commented 6 years ago

Response from OSU: Lance Albertson

I don't have a problem with this but I wouldn't mind having some input from one
of our IBM advocates before moving forward on it. Thanks for reaching out
initially.
I don't think we'll have any problems with bandwidth or storage so it's all good
from my side at least.

Jeffrey J. Scheel (IBM advocate)

This feels like "normal" community activity which means as long as Lance feels 
it can fit, I'm good. 
AdamBrousseau commented 5 years ago

Another option is to use the Eclipse Nexus offering: https://wiki.eclipse.org/Services/Nexus#Deploying_artifacts_to_repo.eclipse.org. I am unsure how to use this, as it seems to be supported only for Maven or Gradle builds.

Edit: Also https://wiki.eclipse.org/Jenkins#How_do_I_deploy_artifacts_to_download.eclipse.org.3F

AdamBrousseau commented 5 years ago

Opened Bug 541852 to ask about download.eclipse.org

AdamBrousseau commented 5 years ago

Summarizing options

Artifactory

Github

Eclipse

Dropbox

HTTP Server

@samuel-rubin is going to investigate the HTTP Server option

samuel-rubin commented 5 years ago

I'm investigating the free Artifactory OSS on a ppc64le machine. I was able to install it on that system. I am now looking into serving it over HTTPS instead of plain HTTP.

samuel-rubin commented 5 years ago

I noticed that the properties and builds features for artifacts are not available in the OSS version.

Also, "Promotion, demotion and cleanup of build artifacts, Managing build artifacts for reproducible builds" is a paid feature.

AdamBrousseau commented 5 years ago

Running some tests to collect data on transfer times of SDKs to/from Artifactory for all sites. For comparison purposes I have added 2 columns for current times to/from Jenkins master.

Test case:

I will schedule this build to run every 3 hours so I can collect data throughout the day, and will spot check the times. Based on these initial numbers I think we should continue on the Artifactory path. I have WIP PRs to enable it but will wait to discuss.

SPEC | SITE | Upload to Artifactory | Download from Artifactory | Upload to Jenkins master | Download from Jenkins master
plinux | OSU | 3s, 3s | 5s, 4s | 2m | 4-35m
plinux | UNB | 1m51s, 44s, 1m23s, 2m29s, 39s | 59s, 58s, 1m23s, 1m15s, 1m26s | 2m | 4-35m
xlinux | UNB | 3s, 1m29s, 4s, 38s, 1m13s | 1m38s, 1m20s, 1m12s, 1m26s, 1m35s | 2m | 3-30m
osx | UNB | 2s, 2m51s, 1m21s, 2m50s, 1m56s | 1m49s, 1m11s, 1m14s, 1m30s, 1m34s | 2m | 3-30m
zlinux | Marist | 4m3s, 2m32s, 2m25s, 3m27s, 2m51s | 18s, 29s, 18s, 30s, 23s | 10m | 2-20m
win | Softlayer | 24s, 3m42s, 3s, 3s, 5m44s | 9s, 12s, 13s, 12s, 10s | 2m | 2-15m
aix | IBM PDP | 6s, 3m41s, 4s, 5m45s, 4s | 12m32s, 9m6s, 8m59s, 10m10s, 11m26s | 3m | 15-30m
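To compare sites it helps to reduce each set of samples to a single number. The following is a rough sketch in POSIX shell that converts durations written like `1m51s` to seconds and takes an integer mean. The function names `to_seconds` and `mean_seconds` are hypothetical and were not part of the test job described above.

```shell
# Sketch: convert durations like "1m51s", "44s" or "3m" to whole seconds.
# Plain digits without leading zeros are expected; ranges ("4-35m") and
# decimals ("1.5m") are rejected rather than guessed at.
to_seconds() (
    t=$1 m=0 s=0
    case $t in *m*) m=${t%%m*}; t=${t#*m} ;; esac   # minutes part, if any
    case $t in *s)  s=${t%s};   t= ;; esac          # seconds part, if any
    if [ -n "$t" ]; then
        echo "unsupported duration: $1" >&2; exit 1
    fi
    for part in "$m" "$s"; do
        case $part in
            ''|*[!0-9]*) echo "unsupported duration: $1" >&2; exit 1 ;;
        esac
    done
    echo $(( m * 60 + s ))
)

# Sketch: integer mean (rounded down) of several durations, in seconds.
mean_seconds() (
    total=0 n=0
    for d in "$@"; do
        secs=$(to_seconds "$d") || exit 1
        total=$(( total + secs ))
        n=$(( n + 1 ))
    done
    [ "$n" -gt 0 ] || { echo "no samples given" >&2; exit 1; }
    echo $(( total / n ))
)
```

For example, `mean_seconds 12m32s 9m6s 8m59s 10m10s 11m26s`, using the five slowest AIX samples (which appear to be the download times), prints 626, or roughly 10m26s.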
AdamBrousseau commented 5 years ago

If we are to go ahead with Artifactory, these are the remaining actions I can think of. Not sure if any of them need to be a blocker other than the approvals and PRs. Let's see how many we can get completed before we are ready to turn it on @samuel-rubin. FYI @jdekonin @smlambert

pshipton commented 5 years ago

Transfer times all seem reasonable except for AIX. Strange that download is considerably slower than upload. We should look into it and see if it can be improved. Is the disk really slow?

DanHeidinga commented 5 years ago

@AdamBrousseau How many builds are uploading / downloading from artifactory at once vs Jenkins?

The current transfer times look fine: they are the same as or lower than the Jenkins times, which is a plus, provided they stay at that level when there are as many concurrent accesses as Jenkins has.

AdamBrousseau commented 5 years ago

I think the AIX issue is the network pipe. Joe tested pulling onto an Adopt AIX machine which is hosted at OSU (same site as Artifactory) and the times were around 5s. I don't think we would be able to do anything about that other than get machines somewhere else.

AdamBrousseau commented 5 years ago

My test runs 6 at once (1 per SPEC). Each run is staggered by how long it takes to first get a machine and then get the SDK from Jenkins to test with. I can add more, but I didn't want to hog machine time from real builds. We could theoretically have as many upload/download requests at a single point in time as we have machines. Keep in mind that these transfer times contribute only ~5% of an overall build pipeline, so bombarding Artifactory with requests is unlikely to be a huge concern.

I will adjust it to 1 upload & 3 download per spec to see if we can affect the download times.

AdamBrousseau commented 5 years ago

First run with 3 pulls in parallel per spec:
zlinux: 32s, 36s, 31s
xlinux: 2m3s, 2m5s, 2m15s
plinux: 1m32s, 1m40s, 1m31s
osx: 1m11s, 1m42s, 1m33s
win: 9s, 11s, 10s
(aix machines busy)

Another point: x/p/osx are all at UNB, so that's 9 pulls "at the same time" from Artifactory to UNB.

AdamBrousseau commented 5 years ago

For the Artifactory server, these are our options for access

Unfortunately the free version does not allow configuring for LDAP or GH Teams and I don't want to be in the business of hand adding & maintaining accounts.

@DanHeidinga @pshipton

pshipton commented 5 years ago

I'm fine with leaving it open, which is what we have now with Jenkins.

DanHeidinga commented 5 years ago

+1 to leave it open

AdamBrousseau commented 5 years ago

Nightly builds seem to be slower than the original testing numbers listed above. Pulling a nightly and an OMR (daytime) build for comparison. Adding to the original table above:

SPEC/SITE | Original: Upload to Artifactory | Original: Download from Artifactory | Upload to Jenkins master | Download from Jenkins master | Updated: Upload to Artifactory | Updated: Download from Artifactory
plinux/OSU | 3s, 3s | 5s, 4s | 2m | 4-35m | Night: 3s, Day: 26s | Night: 3m, Day: 4s
plinux/UNB | 1m51s, 44s, 1m23s, 2m29s, 39s | 59s, 58s, 1m23s, 1m15s, 1m26s | 2m | 4-35m | Night: 3m, Day: 2m | Night: 1m, Day: 52s
xlinux/UNB | 3s, 1m29s, 4s, 38s, 1m13s | 1m38s, 1m20s, 1m12s, 1m26s, 1m35s | 2m | 3-30m | Night: 2m, Day: 2m | Night: 1m, Day: 1.5m
osx/UNB | 2s, 2m51s, 1m21s, 2m50s, 1m56s | 1m49s, 1m11s, 1m14s, 1m30s, 1m34s | 2m | 3-30m | Night: 3m, Day: 2m | Night: 1m, Day: 1.5m
zlinux/Marist | 4m3s, 2m32s, 2m25s, 3m27s, 2m51s | 18s, 29s, 18s, 30s, 23s | 10m | 2-20m | Night: 9m, Day: 8m | Night: 23s, Day: 17s
win/Softlayer | 24s, 3m42s, 3s, 3s, 5m44s | 9s, 12s, 13s, 12s, 10s | 2m | 2-15m | Night: 4m, Day: 4m | Night: 40s, Day: 11s
aix/PDP | 6s, 3m41s, 4s, 5m45s, 4s | 12m32s, 9m6s, 8m59s, 10m10s, 11m26s | 3m | 15-30m | Night: 16m, Day: 17m | Night: 16m, Day: 16m

zlinux upload seems to be 2-3x slower. aix is much worse for upload and 50% worse for download. @pshipton is this a blocker for #2836, or are we ok to proceed and just keep an eye on these numbers?

pshipton commented 5 years ago

What I don't like are

AdamBrousseau commented 5 years ago

how long it takes to download to my desktop

Agreed, it's slow (at the office): 7 minutes when I tried.

Looking at one of the failed curls https://ci.eclipse.org/openj9/job/Test_openjdk12_j9_sanity.functional_x86-64_linux_Nightly/18/ 27-Apr-2019 23:09:07

23:13:33  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK12_x86-64_linux_Nightly/16/OpenJ9-JDK12-x86-64_linux-20190428-021815.tar.gz
23:17:16  Failed to retrieve https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK12_x86-64_linux_Nightly/16/OpenJ9-JDK12-x86-64_linux-20190428-021815.tar.gz, exiting. This is what we received of the file and MD5 sum:
23:17:16  ls: cannot access 'https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK12_x86-64_linux_Nightly/16/OpenJ9-JDK12-x86-64_linux-20190428-021815.tar.gz': No such file or directory
23:17:16  md5sum: 'https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK12_x86-64_linux_Nightly/16/OpenJ9-JDK12-x86-64_linux-20190428-021815.tar.gz': No such file or directory

Catalina.out has nothing at that timestamp

2019-04-27 20:00:00,087 [art-exec-8081] [INFO ] (o.a.s.b.s.GarbageCollectorInfo:80) - Storage garbage collector report:
Number of binaries:      4,133
Total execution time:    72 millis
Candidates for deletion: 0
Checksums deleted:       0
Binaries deleted:        0
Total size freed:        0 bytes
Current total size:      345.21 GB
2019-04-28 00:00:00,007 [art-exec-8111] [INFO ] (o.a.r.s.t.p.TrashcanPruner:96) - Trashcan pruning total execution time: '3 millis'

Nothing in request.log

20190427222712|23|REQUEST|51.38.12.13|anonymous|GET|/webapp/|HTTP/1.0|200|0
20190428024333|35|REQUEST|148.100.33.173|jenkins|GET|/api/system/version|HTTP/1.0|200|0

Nothing in artifactory.log

2019-04-27 20:00:00,087 [art-exec-8081] [INFO ] (o.a.s.b.s.GarbageCollectorInfo:80) - Storage garbage collector report:
Number of binaries:      4,133
Total execution time:    72 millis
Candidates for deletion: 0
Checksums deleted:       0
Binaries deleted:        0
Total size freed:        0 bytes
Current total size:      345.21 GB
2019-04-28 00:00:00,007 [art-exec-8111] [INFO ] (o.a.r.s.t.p.TrashcanPruner:96) - Trashcan pruning total execution time: '3 millis'

It's like the request never reached the server. I see that there were 3 builds that failed to curl at that time: 23:13:31, 23:13:43, 23:13:33, all from UNB. Perhaps the UNB network had a blip? I doubt 3 requests overloaded it. That was when the curl used --retry 5 --retry-delay 30.
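One thing that makes failures like this hard to diagnose is that the curl command above runs with `-s`, so curl prints no error message, and as far as I know `--retry` on its own only retries errors curl considers transient. The following is a hedged sketch of a wrapper that retries any failure and reports each failed attempt. `fetch_artifact`, `DOWNLOADER`, `FETCH_ATTEMPTS` and `FETCH_DELAY` are hypothetical names for illustration, not what the pipeline actually uses, and the `--user` credentials from the real command are left out.

```shell
# Sketch only: a retrying download wrapper. fetch_artifact and the
# DOWNLOADER / FETCH_ATTEMPTS / FETCH_DELAY variables are hypothetical.
# Usage: fetch_artifact URL [OUTPUT_FILE]
fetch_artifact() (
    url=$1
    out=${2:-$(basename "$url")}
    attempts=${FETCH_ATTEMPTS:-3}
    delay=${FETCH_DELAY:-30}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # --fail turns HTTP error responses into a non-zero exit status
        # instead of saving the error page as the artifact;
        # --show-error keeps curl's error message even with --silent.
        if ${DOWNLOADER:-curl} --fail --location --silent --show-error \
               --output "$out" "$url" && [ -s "$out" ]; then
            exit 0
        fi
        echo "attempt $i/$attempts failed for $url" >&2
        i=$(( i + 1 ))
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    exit 1
)
```

Logging each failed attempt with curl's own error text would show whether the failure was a timeout, a refused connection, or an HTTP error, which the original log output does not reveal.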

Looking at passed builds from UNB that night.

23:13:40  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK12_x86-64_linux_Nightly/16/OpenJ9-JDK12-x86-64_linux-20190428-021815.tar.gz
23:19:30  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK12_x86-64_linux_Nightly/16/native-test-libs.tar.gz
23:19:43  unzip
23:13:31  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK12_x86-64_linux_Nightly/16/OpenJ9-JDK12-x86-64_linux-20190428-021815.tar.gz
23:18:46  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK12_x86-64_linux_Nightly/16/native-test-libs.tar.gz
23:19:03  unzip
23:13:28  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK12_x86-64_linux_Nightly/16/OpenJ9-JDK12-x86-64_linux-20190428-021815.tar.gz
23:15:20  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK12_x86-64_linux_Nightly/16/native-test-libs.tar.gz
23:15:58  unzip
23:12:26  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK8_x86-64_mac_Nightly/16/OpenJ9-JDK8-x86-64_mac-20190428-022114.tar.gz
23:16:56  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK8_x86-64_mac_Nightly/16/native-test-libs.tar.gz
23:16:56  unzip
23:12:39  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK8_x86-64_mac_Nightly/16/OpenJ9-JDK8-x86-64_mac-20190428-022114.tar.gz
23:16:53  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK8_x86-64_mac_Nightly/16/native-test-libs.tar.gz
23:16:53  unzip
23:12:25  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK8_x86-64_mac_Nightly/16/OpenJ9-JDK8-x86-64_mac-20190428-022114.tar.gz
23:14:32  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK8_x86-64_mac_Nightly/16/native-test-libs.tar.gz
23:14:32  unzip
23:12:51  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK11_x86-64_linux_xl_Nightly/16/OpenJ9-JDK11-x86-64_linux_xl-20190428-022049.tar.gz
23:18:56  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK11_x86-64_linux_xl_Nightly/16/native-test-libs.tar.gz
23:18:56  unzip
23:12:39  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK11_x86-64_linux_xl_Nightly/16/OpenJ9-JDK11-x86-64_linux_xl-20190428-022049.tar.gz
23:18:12  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK11_x86-64_linux_xl_Nightly/16/native-test-libs.tar.gz
23:18:18  unzip
23:18:50  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK11_x86-64_linux_xl_Nightly/16/OpenJ9-JDK11-x86-64_linux_xl-20190428-022049.tar.gz
23:20:43  curl -OLJks --retry 5 --retry-delay 30 --user ****:**** https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK11_x86-64_linux_xl_Nightly/16/native-test-libs.tar.gz
23:20:55  unzip

So there were at least 12 download requests, all from UNB, around that timeframe. I suppose, if all compiles across mac/plinux/xlinux/xlinux_xl and 8/11/12 finish around the same time, we could have 48 requests all from UNB at once. In practice the number wouldn't be this high, as we don't have that many machines there and we have 3 plinux machines at OSU, which would further reduce requests from UNB.

AdamBrousseau commented 5 years ago

Looking at the Artifactory and Tomcat settings

server.xml

...
<Service name="Catalina">
        <Connector port="8081" sendReasonPhrase="true" relaxedPathChars='[]' relaxedQueryChars='[]' maxThreads="200"/>

        <!-- Must be at least the value of artifactory.access.client.max.connections -->
        <Connector port="8040" sendReasonPhrase="true" maxThreads="50"/>
...

https://jfrog.com/blog/monitoring-and-optimizing-artifactory-performance/

If the number of requests exceeds the maxThreads value, requests will be queued (up to the maximum specified by the acceptCount attribute)
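To put the quoted rule next to the numbers in this thread, here is a deliberately simplified sketch of how a burst of simultaneous requests would split between worker threads, the accept queue, and refusals. `connector_load` is a hypothetical helper. The `acceptCount` of 100 used in the example is an assumption based on what I believe is Tomcat's documented default, since the server.xml above does not set it. The model ignores other connector limits such as `maxConnections`.

```shell
# Sketch: simplified model of Tomcat connector limits.
# connector_load is a hypothetical helper; it ignores maxConnections,
# keep-alive behaviour and OS-level backlog details.
# Usage: connector_load REQUESTS MAX_THREADS ACCEPT_COUNT
connector_load() (
    requests=$1 max_threads=$2 accept_count=$3

    # Requests handled immediately are capped by maxThreads.
    running=$requests
    if [ "$running" -gt "$max_threads" ]; then
        running=$max_threads
    fi

    # The remainder waits in the queue, capped by acceptCount.
    waiting=$(( requests - running ))
    queued=$waiting
    if [ "$queued" -gt "$accept_count" ]; then
        queued=$accept_count
    fi

    # Anything beyond both limits may be refused.
    refused=$(( waiting - queued ))
    echo "running=$running queued=$queued refused=$refused"
)
```

With the worst case of 48 simultaneous requests estimated above and `maxThreads="200"` on port 8081, `connector_load 48 200 100` prints `running=48 queued=0 refused=0`, which suggests the connector limits alone would not explain the failed downloads.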

It might be worthwhile setting up JavaMelody to monitor the Artifactory server: https://jfrog.com/knowledge-base/how-to-configure-javamelody-in-artifactory-for-monitoring/