Closed iloveeclipse closed 10 months ago
Is the SWT binaries repo really necessary? What's the harm of building the natives on demand? Github provides build machines for the three main platforms, right? I was hoping to radically simplify SWT build process after the Github migration.
Also, if we combine SWT repos, we could attach platform specific source to the platform specific fragments and leave the main bundle empty. Physically, the source might stay in o.e.swt, but logically every binary project like o.e.swt.win32.win32_x86_64 would contain both the binaries and the source. No more .classpath switcheroo and it would be possible to work on all platforms from the same workspace simultaneously.
If you bisect SWT bugs, you won't (and probably can't) build on every commit. Building natives locally is a non trivial task (requires extra dependencies to be installed with root rights etc).
You can't bisect across two repos anyway, can you? TBH I never tried to bisect SWT bugs.
We can make the natives build almost a trivial task, but dependencies are of course unavoidable. JDK is a dependency too, why single out native code?
You can't bisect across two repos anyway, can you?
It is not the question of "can" but it is a "must have" because if you bisect SWT issues your native library version is essential to be able to start anything. And yes, I've bisected two repos and that is NOT FUN, because you have to look which commit corresponds to which native commit. Having both the code & binaries in LFS and in one repo would simplify that A LOT.
why single out native code
Because a "simple" java compile with maven doesn't need to install extra gtk/win native packages etc, and in order to do that you must be able to run tasks as a root / have access to the required packages at all. Once you have setup that, it's fine, but it is not trivial to setup. Most people don't even need that at all if they fix something in simple Java code.
Adding build artifacts to a Git repo also sounds wrong to me. We can publish the artifacts somewhere else to help with searching for errors. WDYT @akurtakov
There has already been a discussion with @sravanlakkimsetti about that, with potentially just using something like downloads.eclipse.org/eclipse/swt-natives/4567a/win32/ and having maven/ant download them. If Git LFS could be used in an easier way - I'm all for that. The repository o.e.platform.swt.binaries is just an abomination no one dared to start the fight with. Thanks @iloveeclipse for starting this discussion. @nnemkin let's keep the issue dedicated to getting rid of the binaries repo and merging both, as what you speak of would be nice but has this one as a strong prereq.
Corporates rebuild older versions of eclipse including SWT to release eclipse based products. Building on demand is not exactly possible as many of the dependencies would not be available.
One solution is to store the built libraries in a separate repo(that is the current solution).
Regarding git LFS, The individual libraries are not that big to start with less than 1 MB. I am not sure we are qualified for the LFS here.
Adding build artifacts to a Git repo also sounds wrong to me. We can publish the artifacts somewhere else to help with searching for errors. WDYT @akurtakov
What we are doing is not wrong. These are intermediate files used for the build purposes. I feel most of the thought process is going towards building master. for master we will have machines and other dependencies available for the build purposes. But that cannot be said when we want to build older release.
For example, IBM builds 4.23, 4.19, 4.15 regularly. By removing these artifacts you are forcing the end users to maintain native build infrastructure for different versions. That is a lot of duplicated effort and it turns into cost for the end users.
We will definitely need to have the binaries stored. Where is going to be a question. Also we should make building aggregator simpler. Before triggering a build we should not fetch artifacts by some other means.
Same problem exists for team, equinox.framework, filesystem components as well.
So probably instead of check them into the git, simply deploy them as an additional attached artifact along with the maven deployment?
So probably instead of check them into the git, simply deploy them as an additional attached artifact along with the maven deployment?
This is what git LFS seamlessly does for you without any 3rd party tool chain.
The individual libraries are not that big to start with less than 1 MB. I am not sure we are qualified for the LFS here.
The number of files in history does it. Cloning SWT binaries will move over wire ~1 GB just because every binary file is in the history. The sources itself are minimal, and with git LFS only last native libraries version will be needed to copy from server, not all of them.
@sravanlakkimsetti Yeah, without a binaries repo, rebuilding historical releases will get more complicated, that's a real problem. But why are they rebuilding historical releases at all? Why not build them once and store in a maven/p2 repo forever?
We will definitely need to have the binaries stored. Where is going to be a question. Also we should make building aggregator simpler. Before triggering a build we should not fetch artifacts by some other means.
I believe we are all in agreement that for everyone but seasoned SWT developer having the native bits downloadable is beneficial. But this comes at a price - almost every release we have one broken build by bump of version of the host without doing so for the fragments. Not to mention other missed opportunities to simplify things. So the question is how do we organize a change in order to get more benefits than complications?
Same problem exists for team, equinox.framework, filesystem components as well.
One step at a time. Let's keep these discussions for other places. If we fix it in one place we will have better idea how to fix it in the other one. Btw, the team one is gone now after Linux moved to JNA 2 releases ago and Windows did so in master (https://bugs.eclipse.org/bugs/show_bug.cgi?id=578341). So we are getting there although the solution is totally different.
@iloveeclipse A big plus from migrating to github is that cloning is much faster. For me SWT binaries clones in just under 3 minutes...
Merging of SWT repos will break all previous SWT build setups and back-porting of patches which will become challenging if we merge repos:
Currently both of SWT-Sources & SWT-Binaries repo itself is combination of code from below 3 different platforms:
In general and specifically existing Eclipse/SWT wiki articles for setup/configuration are written with SWT(Source & Binaries) repo wise and it may also need to be fixed.
Sorry it's a NoGo for SWT(Sources & Binaries) repo merging.
Not only new comers but also all the existing corporate customers who just depend on Eclipse/SWT sources to build their products(on non-supported versions or non-supported platforms) are prone to break and may get out of picture for same set of reasons.
@sravanlakkimsetti Yeah, without a binaries repo, rebuilding historical releases will get more complicated, that's a real problem. But why are they rebuilding historical releases at all? Why not build them once and store in a maven/p2 repo forever?
One of the reasons could be a bug fix required in older releases. At least in IBM a product needs to be supported for 5 years. Any bugs reported need to be fixed, so they end up rebuilding historical releases.
Making it difficult to build will make customers back down. This would be a very bad idea.
I am all for optimizing. At the same time we should also make it simpler for customers to build their own products and make contributing easy. to me both are equally important one gets us funding and another gets us good fixes.
@merks : that depends on your provider / country.
@niraj-modi : I see your point.
Is moving SWT binaries to git LFS alone worth it then?
If we will go for LFS, we would need to: 1) Rewrite history for SWT binaries (which shouldn't be a problem)? 2) There should be no extra changes needed after that step (I think) Conversion info: https://github.com/git-lfs/git-lfs/blob/main/docs/man/git-lfs-migrate.1.ronn
Pros: 1) Smaller clone size / faster checkout 2) anything else?
Cons: 1) One technology layer more 2) anything else?
@niraj-modi is SWT doing a lot of backporting? Looks to me that 4.23, 4.22, 4.21 and 4.20 maintenance branch still points to commit tagged for the release.
So let's try to summarize discussion so far to see whether we can come to an agreement.
Does it sound more applicable now?
When we speak about merging swt.binaries into swt repo I think no one had the intention to put actual dll/so files in swt repo
Nope, exact that is the proposal. Have everything in one repo will simplify bisecting / changes a lot, we would not need to synchronize anything across two different repos / different locations, because everything will be in same git repository.
I haven't looked into Git LFS
Git LFS stores the text url to the file in a server in the commit, not the binary itself in the repository (it goes to the extra server storage). This means, we can push 10 MB binaries but the commit record itself will be few bytes.
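To make the "few bytes per commit" point concrete, here is a minimal Python sketch of what a Git LFS pointer file looks like. It follows the pointer layout from the Git LFS specification (a version line, a SHA-256 object id and a size), which is slightly different from a plain URL: the client resolves the object id against the LFS server configured for the repo. The 10 MB zero-filled buffer is only a stand-in for a real native library.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return the text of a Git LFS pointer file for the given binary content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Stand-in for a ~10 MB native library (hypothetical content, not a real SWT binary).
fake_library = b"\0" * (10 * 1024 * 1024)
pointer = lfs_pointer(fake_library)

print(pointer)
print(f"{len(fake_library)} bytes of binary -> {len(pointer)} bytes stored in Git")
```

Only the pointer text is committed; the binary itself is uploaded to the LFS storage and fetched when a commit that references it is checked out.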
Thanks for explaining. What does it mean for clone time? When is the binary for given commit downloaded ? At clone time? If it practically means no slowdown for clone while having the binary when needed sounds exactly what we need.
I think on clone you get only the last version of the binaries (with the checkout). So clone should only transfer ~10 MB for SWT (8MB for all latest binaries and few MB for the entire history) I guess.
git lfs clone
http://manpages.ubuntu.com/manpages/jammy/man1/git-lfs-clone.1.html
@iloveeclipse Answering your question posted on https://github.com/eclipse-platform/eclipse.platform/issues/7#issuecomment-1084601201: There is a huge difference in the sizes of the SWT Source and SWT Binaries repos.
Your comments above suggest that the old SWT Binaries repo would be read-only. A read-only SWT binaries repo might be a problem for the SWT build input (as we commit tags and updated binaries during the native build), so we will need the old SWT binaries repo in a writable state to work with. Thanks!
@niraj-modi : the whole point of git LFS is to avoid huge repo size.
So binaries are all on server and only fetched on checkout of a commit, in the "placeholder" there is just plain text link to the server url. I expect the SWT repo size after merge with SWT binaries be ~5 - 10 MB higher only (but I haven't tested, just looking how much space binaries taking on the actual binaries repo and how much the rest).
And for the read-only repo I meant the old one, not the new one (main SWT), which will be of course writable.
We will need writable repo not only for main master but also for maintenance branches as well(for SWT build input to work)
Note: At-least in IBM a product needs to be supported for 5 years and please check below log: https://git.eclipse.org/c/platform/eclipse.platform.swt.binaries.git/log/?h=R4_8_maintenance
Hope you understand my concerns w.r.t. to SWT binaries repo and reason/importance of it to be writable on master/old as well :)
@iloveeclipse There is no requirement for old repo to become readonly, right? We can e.g. clean master stating that it's no longer used and is there only for support or old releases. Do you think you can play a bit with some such repo of your own which we can play with to get better feeling of the possible implementation?
Regarding read only: sure. I assumed no one need write access to the main repo and all rebuilds happen on local clones.
Regarding playing with git lfs: I surely can try if that all works at all as planned.
Regarding read only: sure. I assumed no one need write access to the main repo and all rebuilds happen on local clones.
@niraj-modi Do you have more principal concerns?
Regarding playing with git lfs: I surely can try if that all works at all as planned. Thanks!
github provides 1 GB free bandwidth for git LFS repos, so it should be enough
Note that this is 1GB bandwidth. Here are my calculations, correct me if I'm wrong:
1) 7.28mb = sum of current *.so, *.dll, *.jnilib files in a single checkout.
2) 1.53mb = same files compressed in .zip. Gives a rough ballpark of how much LFS bandwidth is used per checkout. Here I'm assuming that GitHub charges bandwidth in terms of transferred (compressed) git objects and not in terms of uncompressed data.
3) This makes some 600 checkouts, for all people combined, per month.
4) Now consider build bots. How many checkouts per month do they make? My estimate: too many.
My conclusion: Free GitHub bandwidth is not sufficient for SWT binaries repo. We could probably use a separate LFS server, but this approach sounds quite weird in the grand picture of trying to move things to GitHub.
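The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it takes 1 GB as 1000 MB (an assumption, GitHub's exact accounting is not stated in this thread) and uses the two per-checkout sizes quoted above.

```python
def checkouts_per_month(quota_mb: float, mb_per_checkout: float) -> int:
    """Number of full checkouts that fit into a monthly bandwidth quota."""
    return int(quota_mb // mb_per_checkout)


FREE_QUOTA_MB = 1000             # 1 GB free LFS bandwidth, taken as 1000 MB (assumption)
COMPRESSED_CHECKOUT_MB = 1.53    # binaries of one checkout, zipped
UNCOMPRESSED_CHECKOUT_MB = 7.28  # binaries of one checkout, raw

# If bandwidth is counted on compressed transfers (the assumption made above)
print(checkouts_per_month(FREE_QUOTA_MB, COMPRESSED_CHECKOUT_MB))
# If bandwidth is counted on uncompressed data instead
print(checkouts_per_month(FREE_QUOTA_MB, UNCOMPRESSED_CHECKOUT_MB))
```

Either way the quota covers a few hundred checkouts per month for everyone combined, which is what makes build bots the deciding factor.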
Here's my attempt to summarize (and discuss) mentioned benefits:
1) Faster clone - While I agree that cloning the entire eclipse.platform.swt.binaries is slow, I need to mention that a shallow clone (basically cloning just the top commit) of the repo is super fast. Takes around 1 second for me. See also Bug 562937.
2) Easier bisecting - that's somewhat nice. On the other hand, I bisected a few times, and I can't say that manually checking out matching commit in binaries repo was a pain to me. For those who do bisecting more often, this can be further automated with a simple script (look up commit matching regex in SWT, look up same commit message in binaries, check it out there).
3) The general "let's merge stuff" thing, which I see as a motto, but not as a justification for any action.
I'm not sure if merging is worth the trouble.
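The "simple script" for bisecting mentioned in (2) could look roughly like the Python sketch below. It is not an existing SWT tool: the regex for native build commit messages and the sample commit ids are hypothetical, and the two logs are assumed to come from something like `git log --format=%H%x09%s` in each repo, newest commit first.

```python
import re
from typing import Optional

# Hypothetical pattern for the message of a native build commit, e.g. "v4952r6".
NATIVE_BUILD_RE = re.compile(r"\bv\d+r\d+\b")

Commit = tuple[str, str]  # (commit id, commit message)


def matching_binaries_commit(
    swt_log: list[Commit], binaries_log: list[Commit]
) -> Optional[str]:
    """Find the binaries commit that belongs to the SWT commit being tested.

    swt_log is the history of the currently checked out SWT commit, newest
    first. The nearest native build commit in it is looked up by its message,
    and the binaries commit with the same message is returned (None if absent).
    """
    for _sha, message in swt_log:
        if NATIVE_BUILD_RE.search(message):
            for bin_sha, bin_message in binaries_log:
                if bin_message == message:
                    return bin_sha
            return None
    return None


# Made-up sample data, only to show the lookup.
swt_log = [("aaa111", "Fix NPE in Tree"), ("bbb222", "v4952r6"), ("ccc333", "v4952r5")]
binaries_log = [("fff999", "v4952r7"), ("eee888", "v4952r6"), ("ddd777", "v4952r5")]
print(matching_binaries_commit(swt_log, binaries_log))
```

The returned id is what one would pass to `git checkout` in the binaries repo before building the bisected SWT commit.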
One other thing to mention is that git-lfs is not included in default git. Some git distributions have it and some others may not have it. That would be a bit of additional headache for users to handle.
Here's an alternative plan that doesn't involve merging build artifacts into the main repo:
1) git clone --filter=blob:none https://github.com/eclipse-platform/eclipse.platform.swt.binaries.git . This command omits all the heavy files when cloning and only clones the commit structure. Git will automatically download the binaries whenever they are needed for checkout. It also downloads binaries for the top commit. For me, it downloads ~8mb and takes ~7 sec (compared to minutes for a regular clone).
2) [...] v4952r6 to also write the binaries repo commit ID in some file in the SWT repo.
3) [...] eclipse.platform.swt.binaries on disk near the SWT repo (and creates it if necessary, using the fast clone from (1)) and updates it to match, using the commit ID from (2). With it, it will be a matter of one click to get/sync the binaries repo.
This script can be added as necessary into git's post-checkout hook and the build step in SWT. With these changes, the binaries repo will stay in sync automatically, both in usual development and bisect workflows.
Try it yourself:
git clone --filter=blob:none https://github.com/eclipse-platform/eclipse.platform.swt.binaries.git
Takes some 7 seconds, with ~8.5mb download size
git checkout 447c3c107328488687560ef3fe6f4830a0228c35
That's a recent small commit. It takes less than 1 sec.
git checkout 6c44c1026bf37c6cf3a3dd528dfdd7c748fbe307
An older commit. Takes some 3 seconds; with ~3.5mb download size
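A sync script along the lines of step (3) might look like the following Python sketch. Everything specific in it is an assumption for illustration: the file name `binaries.commit` that would hold the commit ID from step (2), and the layout with the binaries repo as a sibling directory of the SWT repo. The git commands themselves are the partial clone shown above plus a plain fetch and detached checkout.

```python
import subprocess
from pathlib import Path

BINARIES_URL = "https://github.com/eclipse-platform/eclipse.platform.swt.binaries.git"
BINARIES_DIR = "eclipse.platform.swt.binaries"
COMMIT_FILE = "binaries.commit"  # hypothetical file written by the native build


def sync_commands(swt_repo: Path, commit_id: str, binaries_exists: bool) -> list[list[str]]:
    """Build the git commands that bring the binaries repo to commit_id."""
    binaries_repo = swt_repo.parent / BINARIES_DIR
    if binaries_exists:
        # Existing clone: just make sure the wanted commit is known locally.
        first = ["git", "-C", str(binaries_repo), "fetch", "origin"]
    else:
        # No clone yet: blobless partial clone, binaries are fetched on checkout.
        first = ["git", "clone", "--filter=blob:none", BINARIES_URL, str(binaries_repo)]
    checkout = ["git", "-C", str(binaries_repo), "checkout", "--detach", commit_id]
    return [first, checkout]


def sync(swt_repo: Path) -> None:
    """Read the recorded commit ID and update the sibling binaries repo to it."""
    commit_id = (swt_repo / COMMIT_FILE).read_text().strip()
    binaries_repo = swt_repo.parent / BINARIES_DIR
    for command in sync_commands(swt_repo, commit_id, binaries_repo.is_dir()):
        subprocess.run(command, check=True)
```

Calling `sync(Path("path/to/eclipse.platform.swt"))` from a post-checkout hook or a build step would then keep both repos aligned, including during `git bisect`.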
4. Now consider build bots. How many checkouts per month do they make? My estimate: too many.
Builds could probably be faster if the checkout is cached somewhere ... maybe this is already the case.
Maybe a first iteration would be to simply link in swt the relevant commit from swt.binary (as we do in m2eclipse for the tests); that way we could change build scripts and the like, and if we are certain, merge the repositories afterwards (whatever technique we use then).
I still think deploying the binaries to some artifact server is much more useful than LFS as these are completely different things, and it seems the real reason is to allow building older versions against a (released) artifact; no one would check jars into git (with or without LFS) to do releases...
3. The general "let's merge stuff" thing, which I see as a motto, but not as a justification for any action.
Some justifications outlined probably in other places/mails/threads:
These are just immediate benefits not considering potential improvements like @nnemkin spoke about.
Did anybody check the support in JGit/EGit for the proposed solutions? I only see cgit commands so far. I really like not to leave the IDE to do the interaction with git repositories, same can be true for other (occasional) committers. I consider it a blocking issue if something is not supported by JGit and EGit.
Regarding read only: sure. I assumed no one need write access to the main repo and all rebuilds happen on local clones.
@niraj-modi Do you have more principal concerns? With writable binaries repo(master and old), we should be good now. Thanks!
So with git distributions not necessarily supporting LFS, the unclear state of JGit/EGit support and the 1GB bandwidth limitation, this does not look like a viable solution (for now!), as promising as it seems. So what do you think about renaming the issue to "Store swt dll/so files on downloads.eclipse.org and merge swt.binaries repo in the swt one"? My plan would be the following:
@akurtakov sounds like a great plan to me.
Will moving binaries to downloads.e.o lead to decrease of build complexity? In terms of ease of access and operation, Github repo is far superior to downloads.e.o.
@akurtakov sounds like a good alternative plan to me.
W.r.t the limitations:
So with git distributions not necessarily supporting LFS, the unclear state of JGit/EGit support and the 1GB bandwidth limitation, this does not look like a viable solution (for now!), as promising as it seems. So what do you think about renaming the issue to "Store swt dll/so files on downloads.eclipse.org and merge swt.binaries repo in the swt one"? My plan would be the following:
1. Modify native build scripts to put dll/so files to download.eclipse.org in ADDITION to storing in swt.binaries git repositories.
2. Come up with a solution which downloads these dll/so files from download.eclipse.org in both maven and workspace build.
3. Stop storing dll/so files in swt.binaries git - at this stage fragments are still in swt.binaries repo.
4. Move fragments to eclipse.platform.swt repository and stop using eclipse.platform.swt.binaries repo for builds from that point on.
5. eclipse.platform.swt.binaries repository stays as is - fully working for people backporting fixes.
Since https://github.com/eclipse-platform/eclipse.platform.swt/issues/514 is about to land I think it is a good time to start working on this and can offer to do it.
Uploading the artifacts to download.eclipse.org could be done similarly to how it is done in m2e:
https://github.com/eclipse-m2e/m2e-core/blob/3d0b6ac50ea609941de09f89ccfe722c0b081e00/Jenkinsfile#L78-L81
My main question is at which location the binaries should be stored and what the name of the user is that can access the storage.
For the latter I assume something like genie.releng@projects-storage.eclipse.org? For the former, is a suitable location already set up or do we have to ask infra to set it up?
Something like /home/data/httpd/download.eclipse.org/eclipse/swt.binaries, which would probably result in https://download.eclipse.org/eclipse/swt.binaries, sounds reasonable to me.
Another question is the schema to use when storing the binaries. I think something like
swt.binaries/<eclipse-release>/<swt-natives-version>/<platform>.zip
Which then would look like
swt.binaries
  - 4.27
  - 4.28
    - v4958r1
    - v4958r2
      - o.e.swt.linux.x86_64-binaries.zip
      - o.e.swt.windows.x86_64-binaries.zip
      - ...
The Zip files would then contain all binary artifacts for the corresponding platform and binaries version. Grouping would IMO make it simpler to keep an overview if manually going through the storage.
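As a small illustration of the proposed schema, the download URL for one platform archive could be composed like this. It is a sketch only: the base URL is the location suggested above (not an existing one), and the function and parameter names are made up for the example.

```python
# Suggested (not yet existing) storage location from the proposal above.
BASE_URL = "https://download.eclipse.org/eclipse/swt.binaries"


def binaries_url(eclipse_release: str, natives_version: str, archive: str) -> str:
    """Compose <base>/<eclipse-release>/<swt-natives-version>/<platform>.zip."""
    return f"{BASE_URL}/{eclipse_release}/{natives_version}/{archive}.zip"


print(binaries_url("4.28", "v4958r2", "o.e.swt.linux.x86_64-binaries"))
```

A build would only need the release and the natives version (both already tracked by the build) plus the fragment name to locate its archive.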
To download the native binaries the maven-download-plugin:wget could be used. If SWT developers have m2e installed the mojo can download and unzip the binaries as part of the workspace build without further ado. Since m2e is part of the Eclipse-Committers package I think that is no issue. It looks like that plugin supports everything required: downloading from an arbitrary URL, unzipping, skipping if already downloaded. And in case it is not sufficient I could still replace it with a manually crafted ant or groovy script.
The download URL can be adjusted automatically each time new binaries are deployed, just like it is currently done with the natives-binary version or the timestamp.
With that I think we can follow the suggested plan. What do you think?
Disclaimer : I still believe git LFS would be superior to what is currently planned, but I simply have no bandwidth to work on this. With git LFS all of the problems with downloading / syncing etc are solved out of the box.
Regarding current proposal.
Uploading the artifacts to download.eclipse.org
For the former, is a suitable location already set up or do we have to ask infra to set it up?
I would first ask infra team if download.eclipse.org is to stay for the next decade (or at least few years) or it is planned to be obsoleted similar to other basic eclipse.org services (bugzilla, gerrit, wiki).
If SWT developers have m2e installed the mojo can download and unzip the binaries as part of the workspace build
Nope, this is not installed for me, I don't use "Committers" or any other package, I use "plain" SDK with some plugins on top. I wouldn't also install it, have unfortunately bad experience.
For the workspace build I would simply setup external launch config that runs required command line - that would work out of the box even with plain SDK.
Disclaimer : I still believe git LFS would be superior to what is currently planned, but I simply have no bandwidth to work on this. With git LFS all of the problems with downloading / syncing etc are solved out of the box.
I have no strong opinion and cannot fully assess if one way or the other is better because I have no experience with Git LFS and therefore don't know how well it works in practice with EGit. But if you say it works well, I'm fine with it too. :) My main motivation is to have the repos merged before simplifying the build procedure itself (see https://github.com/eclipse-platform/eclipse.platform.swt/issues/513) so that the history is not only preserved partly.
Uploading the artifacts to download.eclipse.org
For the former, is a suitable location already set up or do we have to ask infra to set it up?
I would first ask infra team if download.eclipse.org is to stay for the next decade (or at least few years) or it is planned to be obsoleted similar to other basic eclipse.org services (bugzilla, gerrit, wiki).
@fredg02 can you tell about the infra-teams plans in this regard?
If SWT developers have m2e installed the mojo can download and unzip the binaries as part of the workspace build
Nope, this is not installed for me, I don't use "Committers" or any other package, I use "plain" SDK with some plugins on top. I wouldn't also install it, have unfortunately bad experience.
May I ask when you had the bad experience? We have worked a lot to improve M2E in the past. It is not yet perfect but I think it is getting better and better. I have it installed in all my Eclipse instances where I do my everyday job (plug-in development) and my development on Eclipse itself, and I have not faced issues in the recent past. So maybe you want to give it another try. :)
For the workspace build I would simply setup external launch config that runs required command line - that would work out of the box even with plain SDK.
If we can at least require a local maven installation, a corresponding minimal maven build could be launched as external tools launch. I would prefer to not have to set up download scripts for all three major platforms that have to be kept in sync.
I suggest to just merge build scripts into main repo and keep the rest as is.
Reading about ideas to have scripts that compose URLs to pull binaries from eclipse.org... Well, if it was already the case, and someone came in and suggested to convert that to a git repo, I would consider it an improvement! So the current plan sounds like a step back.
If you're already going to have scripts that pull binaries, it would be better if that script pulls from binaries git repo from github instead, based on git tag or whatever.
Why not use git submodules to automatically "mirror" the last state into the SWT repo?
We use submodules for the aggregator build and it seems to work for our build-infra without a problem.
Maybe one can even combine this with git LFS if it gives better performance.
Please please, check what git LFS does and offers!
With it there is no need for us to mirror anything manually and use modules - what we need here is a seamless large binary storage outside main git repo - and that is what git LFS all about.
It just seems that git lfs is something (currently) hard to achieve, so I see submodules as an intermediate solution we can use right now (no need to adjust the native parts at all) so we can achieve "only checkout one repo", adjust the build scripts for SWT, for aggregator and so on... then one can switch to LFS instead of submodules by adjusting "the other side".
Just an idea as this seems stuck for about a year now...
Please please, check what git LFS does and offers!
With it there is no need for us to mirror anything manually and use modules - what we need here is a seamless large binary storage outside main git repo - and that is what git LFS all about.
I have thought and read more about Git LFS and actually you are right, LFS is what we want and what we try to implement in a poor man's way when storing at downloads.eclipse.org.
According to the User Guide, E/JGit supports LFS: https://wiki.eclipse.org/EGit/User_Guide#GIT_LFS_Support Since it was asked before: @iloveeclipse, since you are a committer of E/JGit, you can probably tell how well LFS works there? But since you advertise its use, I assume it is working well.
github provides 1 GB free bandwidth for git LFS repos, so it should be enough
...
My conclusion: Free GitHub bandwidth is not sufficient for SWT binaries repo. We could probably use a separate LFS server, but this approach sounds quite weird in the grand picture of trying to move things to GitHub.
Yes, you are probably right that the free bandwidth is not sufficient, since GH counts even the clones of forks against the original repo. :/ Nevertheless, if plain renaming also doesn't lead to a new download, then at least the downloads by developers should not be that many, because one only has to download a binary if it has really changed. But of course the builds will probably dominate this.
Would it be possible to use an alternative storage, at the EF-infra, just for the Large-File storage? If not, the EF could also just buy more bandwidth from GH. 5$ per 50GB extra seem not to be too expensive to me: https://docs.github.com/en/billing/managing-billing-for-git-large-file-storage/about-billing-for-git-large-file-storage#purchasing-additional-storage-and-bandwidth
And maybe they have a heart for large FOSS projects. 😃
Would it be possible to use an alternative storage, at the EF-infra, just for the Large-File storage?
Looks like it is possible: https://docs.github.com/en/enterprise-server@3.4/admin/user-management/managing-repositories-in-your-enterprise/configuring-git-large-file-storage-for-your-enterprise#configuring-git-large-file-storage-to-use-a-third-party-server
Looks like it is possible:
From the documentation:
When Git LFS is enabled on your GitHub Enterprise Server instance
So this would require an enterprise subscription: https://github.com/pricing
Not sure what EF currently uses, but I think we can simply open a ticket. I did a git clone --depth=1 https://github.com/eclipse-platform/eclipse.platform.swt.binaries.git that results in ~8MB of data, so for the 5$ option we then get 6250 fresh clones ...
So this would require an enterprise subscription: https://github.com/pricing
I doubt that this is limited to an enterprise subscription, since this seems to be a general configuration option of Git LFS and I assume that the pushing is handled at the client side (CI or dev computer) and not at the GH server.
I tried out @iloveeclipse suggestion to use LFS and I find it promising and it seems to be quite easy to use and straight forward to set up. Just created https://github.com/eclipse-platform/eclipse.platform.swt.binaries/pull/33 with my results. Everybody interested in this, please have a look.
There was a good question asked on https://github.com/eclipse-platform/eclipse.platform/issues/7 - why don't we use git LFS for SWT binaries? As of today, SWT is about 250 MB and it seems that the github provides 1 GB free bandwidth for git LFS repos, so it should be enough.
The benefit is surely that we don't need to synchronize two repos and have platform code & binaries in one, plus much faster clone.