Current Rakudo (possibly MoarVM as well) development process hinders releasing

Altai-man commented 4 years ago

Here I will describe a couple of situations which has happened during last months to show a particular flaw of the current development process this issue brings up. I have zero intention to bash our volunteer developers, so take it as an offense towards development process / culture in a whole. I am happy people are willing to spend their time and efforts toward making Raku implementation better, but I am sure it can be done in a safer, less painful and thus more enjoyable process for everyone.

Feb 13, this commit to MoarVM bumps version of the dyncall library shipped with MoarVM. The new version contained a critical bug which lead to build failures when linking against musl, glibc alternative notably used by Alpine distro popular for CI containers. It makes into a release and leads to going off to developers, waiting for a patch and re-taking a release, because there were no checks for something we expect to support to detect it.
Mar 15, this PR to MoarVM is merged. This JIT contained a critical bug observable on Windows. Later, revision with this bug is brought in Rakudo and master branch CI checks start to fail. They fail until a fix is provided at Apr 26, that makes a 41 days straight the master branch was broken which (along with the dispatch situation which was solved by a revert) blocked the release completely, because the bump itself was not checked for green lights and then failures were ignored until it was very hard to say where the issue can be.
Apr 3, this commit introduces a regression in relocatable builds, which goes on unnoticed, released and May 8 a solution was given, resulting in a release re-take because we had no checks for something we expect to support to detect it.
Jun 5, this PR resolves a long standing Windows issue. All checks are failed due to, possibly, our setup not supporting changes which involve PRs in MoarVM/nqp/rakudo being needed simultaneously. After merge, Windows builds of master start to fail and again they fail for 14 days until someone provides a two-line patch after some asking for it for a couple of days before release.
Jun 6, this commit uses declarations not compatible with older gcc. This commit offends our Circle CI check after a bump since around here and nobody is bothered by the check failing. When I tried to go for it, the website redirects me to sign-in page. Seeing this unusual behavior (as usually you don't need to do anything to view CI public logs, maybe it wants one to sign-in and enable them?), assuming we migrated to Azure and the check will be dropped soon anyway, we go into release. Now, master fails for 15 days straight and counting. A single person's, release manager, misunderstanding and common ignoring failures on master has lead us to yet another point release.

As it was stated on irc, there are Expectations from our releases. As was said another time, Raku is not "is it a vaporwave?" anymore and it went a long path from a project where a bunch of folks were committing code to something used in production and it is just "expected" by people we support different platforms (even when there are no checks for them).

If we want to ensure our releases met expectations, current development process which is, as shown above, prone to creating problematic situations, must be addressed.

Altai-man commented 4 years ago

Possible solutions

There are not so many solutions I can suggest, but I have one.

In the described problems, there are two sources of evil: 1)no check for some case we "suppose" we support; 2)the check shows red and was ignored.

To address these, we need to fix 1 and fix 2, changing current development culture, including:

Development migrates into PR-first mode instead of committing to master. Master branch is protected from a PR merge if the CI checks are not green. Want to change something -> PR -> checks green ?? Review and merge !! Re-take.
A check failure on master branch is considered as an extreme situation and we don't move forward until it is resolved.
Do not rely on "We assume we support some rarer than usual platforms and try not to break them, but there are no real checks around" anymore. Establish a complete list of platforms and tools we Officialy Met Expectations for and add a clear CI check for every missing point of this list.

I know it may bring in some disapproval, saying that such restrictions are not fun for developers anymore, but I am sure that it is certainly not fun for developers to debug issues introduced 40 days ago and it is not fun for people to have troubles with packaging new releases and doing other wiring and it is not fun for us all to spend more time on fixing the consequences rather than spending less time on keeping our master branch healthy preventing the consequences in the first place.

lizmat commented 4 years ago

On 22 Jun 2020, at 10:25, Altai-man notifications@github.com wrote:

Possible solutions

There are not so many solutions I can suggest, but I have one.

Change current development culture, including:

• Development migrates into PR-first mode instead of committing to master. Master branch is protected from a PR merge if the CI checks are not green. Want to change something -> PR -> checks green ?? Review and merge !! Re-take. • A check failure on master branch is considered as an extreme situation and we don't move forward until it is resolved. • Do not rely on "We assume we support some rarer than usual platforms and try not to break them, but there are no real checks around" anymore. Establish a complete list of platforms and tools we Officialy Met Expectations for and add a clear CI check for every missing point of this list. I know it may bring in some disapproval, saying that such restrictions is not fun for developers anymore, but I am sure that it is certainly not fun for developers to debug issues introduced 40 days ago and it is not fun for people to have troubles with packaging new releases and doing other wiring and it is not fun for us all to spend more time on fixing the consequences rather than spending less time on keeping our master branch healthy preventing the consequences in the first place.

I could live with that, provided that the CI is actually reliable. So far, I have seen waay more false positives from CI than I have seen false negatives. It's the false positives (when CI says there's something wrong, and it's the CI that is wrong) that are impeding development.

AlexDaniel commented 4 years ago

See https://github.com/rakudo/rakudo/issues/3700#issuecomment-629869968.

Once finished, it should help with these issues: Mar 15, Jun 5, and maybe Jun 6 because CI status should become more helpful.

AlexDaniel commented 4 years ago

Do not rely on "We assume we support some rarer than usual platforms and try not to break them, but there are no real checks around" anymore. Establish a complete list of platforms and tools we Officialy Met Expectations for and add a clear CI check for every missing point of this list.

I agree with this, but technically it also means running Blin on all of these platforms. Did you know we support mipsel? We should definitely strive towards more platforms being tested, but it's probably not possible to achieve the perfection here.

melezhik commented 4 years ago

We should definitely strive towards more platforms being tested, but it's probably not possible to achieve the perfection here.

Establish a complete list of platforms and tools we Officialy Met Expectations for and add a clear CI check for every missing point of this list.

It's relatively easy with RakuDist. It has pluggable design in mind where spining up a new docker image and adding it to the system is not a big deal. So far it does tests for:

alpine
debian
centos
ubuntu

We could add more images to the list ( including old CentOS and any exotic that docker support, Sparrow is super flexible as well do deal with such a variety )

The question is what kind of test do we want to ensure another commit to MoarWM/rakudo does not break stuff. If one tells me what kind of test we need here I could start implementing this on RakuDist side.

Current test workflow is that:

download Rakudo from whateverable
run zef install for a certain module

But yes, we can support more testing scenarios, including building Rakudo/MoarWM from source, whatever ...

melezhik commented 4 years ago

Further thoughts. I am thinking about pottage.raku.org - a service for Rakudo/MoarVM smoke testing for various Linuxes, so that for every commit we:

build moarvm
build rakudo
run lightweight tests to ensure OS compatibly

Every test should run for every OS/distro in the list and should not take more say 5-10 minutes, so we could catch architecture/platform dependent bugs as soon as possible and report any problems as quick as possible.

I have all the components in place (UI/Job runner - Sparky/Backend and CM tool - docker + Sparrow ) - similar to RakuDist, so there is good base to start with ...

AlexDaniel commented 4 years ago

Further thoughts. I am thinking about pottage.raku.org - a service for Rakudo/MoarVM smoke testing for various Linuxes, so that for every commit we:
* build moarvm

* build rakudo

* run lightweight tests  to ensure OS compatibly

Isn't it exactly what we do with current CI setups?

melezhik commented 4 years ago

@AlexDaniel I don't know, maybe. The idea to do it for as many OS/distros/Envs as possible. Is that the case now?

rba commented 4 years ago

AFAIK the overhaul of the build pipelines using Azure CI by @patrickbkr die cover the build of moarvm and rakudo and some testing.

I would like to extend it to automate the star release in the future too.

niner commented 4 years ago

We don't need more CI tests. We don't need more target platforms.

What we need is reliable CI tests. They get ignored because most of the time we look at those results is wasted because it was yet another false positive. We've had Travis reporting to the IRC channel and we'd jump whenever it reported failure. But most of them were because of Travis not being able to clone a repository or other benign issues. So someone wrote an IRC bot to tell us whether a failure was a false positive or not but that vanished, too.

What we also need is more cooperation on existing CI infrastructure. We've had Travis for Linux and OS X and AppVeyor for Windows. Someone was dissatisfied because of some issues and we got CircleCI. So we've had Travis, AppVeyor and CirecleCI all with their issues. I've added https://build.opensuse.org/project/show/home:niner9:rakudo-git because I wanted coverage for issues that would appear only in packaged versions and also coverage of important modules. This regularily reports failures like t/nqp/111-spawnprocasync.t (Wstat: 6 Tests: 4 Failed: 0) and got ignored completely and instead we got this Azure Pipelines thing.

Really, please stop adding additional systems.

Instead, just make reports worth getting looked at then make it so we don't have to check 5 different websites with different user interfaces to get at the results and then start looking at the reports, point at broken commits and create reduced test cases.

Altai-man commented 4 years ago

@lizmat

So far, I have seen waay more false positives from CI than I have seen false negatives. It's the false positives (when CI says there's something wrong, and it's the CI that is wrong)

Then we have to eliminate false positives instead of ignoring the checks. I mean, false positives did happen, but somehow other communities do use CI, likely preventing problems with releases and broken master for weeks as we have. There are flappers in roast but nobody pushes the code without running it saying "There are flappers, so I won't" (or so I hope). :)

@AlexDaniel

I agree with this, but technically it also means running Blin on all of these platforms

Still better than suddenly breaking someone's code in the wild because we don't check. Working on a language as we do is full of Technical Difficulties anyway, this is one of them, I guess.

Did you know we support mipsel?

I had no idea and this is precisely the problem. Tomorrow someone's code on mipsel will break next release and we will be "Hmm, well, it is not stated anywhere, but I guess we kind of support that, let's patch and release again". I am not talking about perfection and I am sure suggested scheme won't eliminate point releases. However, I don't see how catching issues earlier can be seen as something wrong.

Even a wiki page stating explicitly "We support this, that, this and that" will help tremendously.

@rba

AFAIK the overhaul of the build pipelines using Azure CI by @patrickbkr die cover the build of moarvm and rakudo and some testing

Yes, this is an awesome piece of work, because we got testing for JVM and eventually relocability too. It's just that older gcc were not on the plate, which is hopefully fix-able. Then we can have one system to rule them all and given rakudo is reliably enough to not torture us with races, we would have a great tool in our toolbox.

@niner

Really, please stop adding additional systems.

We won't (I hope). More so, migration to Azure eliminated usages of Travis and AppVeyor, quite successfully, making the things easier.

Instead, just make reports worth getting looked at then make it so we don't have to check 5 different websites with different user interfaces to get at the results and then start looking at the reports, point at broken commits and create reduced test cases.

Yes. The intention here is to 1)make CI worth being respected; 2)make people see merits of it..

Saying "Current CI is bad, so one wouldn't want to use it" is odd compared to "Current CI is bad, so we should improve it and use it".

melezhik commented 4 years ago

I agree with this, but technically it also means running Blin on all of these platforms do we run Blin for ALL modules in eco system? So it takes hours for one run, does it?

My idea is to have lightweight smoke tests run ( with average run time no more 5-10 minutes ) for all supported platforms per every commit. Is it a case now for any of the mentioned CI (Azure Devops/Circle/Travis )?

Altai-man commented 4 years ago

TL;DR: 1)I don't want to cover everything in the world, add platforms, add tools. I want us to clarify what we support and what not currently and make our current tools to check if the release is worthy using this checklist. 2)I don't want to spend precious time of developers more than required, so our CI should be healthy and it should be the means for avoiding breakage of master (which is not so uncommon right now as was shown in the examples).

lizmat commented 4 years ago

Flappers are acceptable red flags for me. Stupid things like connectivity issues breaking builds, are not. :-) And I've looked at way too many of those.

melezhik commented 4 years ago

@Altai-man I understand all that, and I agree with all you've said. But I still need some clarification here (from you or from others), for example you said "a critical bug which lead to build failures when linking against musl, glibc alternative notably used by Alpine distro popular for CI containers"

So, do we have a test where check a source code compatibility with Alpine? So on so forth ( you can think of some other examples, say some CentOS distros that we claim to support ).

melezhik commented 4 years ago

I've found a set of OS supported in Azure Pipelienes CI for Rakudo build

https://github.com/rakudo/rakudo/blob/master/azure-pipelines.yml#L41-L97

Don't see an Alpine/CentOS/Debian here

The same for moarvm - https://github.com/MoarVM/MoarVM/blob/master/azure-pipelines.yml

cc @patrickbkr

I am not picking holes 😄 , it probably works fine for the purpose of testing moar backend / rakudo in general. But probably does not cover some OS dependent issues mentioned here ...

patrickbkr commented 4 years ago

To start the discussion of what platforms we want to have automated tests for, I have put together a list. This is open for discussion.

x86_64 Windows 10 (actually Microsoft Windows Server 2019 Datacenter, link)
- [x] Java 11
- [x] Visual Studio 2019
x86_64 Ubuntu 18.04 (link)
- [x] GCC 9
- [x] Clang 9
x86_64 MacOS 10.15 (link)
- [x] GCC 9
- [x] Java 11
x86_64 Docker CentOS 6
- [ ] GCC 4
x86_64 Docker Alpine latest
- [ ] GCC latest (currently 9)
[ ] Some big endian system
- The platform we actually want to support here is IBM System z. No chance we get our hands on one of those.
- SPARCs are really cheap on ebay (e.g. Sparc T3 CPU 16 Core 1,65Ghz 32GB RAM - 235 €)
- I'd hope we are able to get Debian on this. People are familiar with Debian in contrast to say Solaris. Debian has an unofficial SPARC64 port.
- Getting the AzureCI runner working on one of those systems might be challenging.
[ ] Some ARM system
- I propose a RasPi 4 with 4GB RAM, Debian armhf 32bits - such builds will run on Raspbian
- We could also add another RasPi with arm64.
- RasPi is an officially supported platform for the AzureCI runner.

Open questions with the above list:

Did I miss an environment we want to support?
Do we really want to start setting up our own hardware for testing? Who is going to host / pay the rack space and electricity? @rba ?
Who would pay for the hardware?
Which test would we want to run? Currently such platformy tests (apart from OS) are only present in the MoarVM CI setup. Thus only NQP tests are run. Will this suffice to make sure our stack works on these OSes? Or do we want a full rakudo test?

lizmat commented 4 years ago

MacOS 11 with ARM processor as soon as there is one available?

patrickbkr commented 4 years ago

Also I do agree with niner and lizmat that our biggest problem with CI currently is reliability. We need to have a stable CI that people are willing to not ignore.

In that regard I'd like to focus on Azure and get rid of the others. Currently Azure isn't fully reliable (see this comment). I hope we'll manage to iron these failures out. - Soonish.

niner commented 4 years ago

[ ] Some big endian system

The platform we actually want to support here is IBM System z. No chance we get our hands on one of those.

That is simply not true. As I've pointed out repeatedly, the Open Build Service supports a long list of platforms out of the box on its 12000 machine strong build farm. This includes openSUSE Factory zSystems. It was literally 3 mouse clicks to activate the build an System z and if you're interested in the results, they are right here:

https://build.opensuse.org/package/live_build_log/home:niner9:rakudo-git/ moarvm/openSUSE_Factory_zSystems/s390x

Why this gets ignored is completely beyond me.

To make it absolutely crystal clear, this is the full list of currently available buid targets of the Open Build Service:

openSUSE Tumbleweed

openSUSE Leap 15.2 openSUSE Leap 15.1 openSUSE Leap 15.1 ARM openSUSE Leap 15.1 PowerPC openSUSE Factory ARM openSUSE Factory PowerPC openSUSE Factory zSystems openSUSE Backports for SLE 15 SP1 openSUSE Backports for SLE 15 openSUSE Backports for SLE 12 SP5 openSUSE Backports for SLE 12 SP4 openSUSE Backports for SLE 12 SP3 openSUSE Backports for SLE 12 SP2 openSUSE Backports for SLE 12 SP1 openSUSE Backports for SLE 12 SP0

SUSE SLE-15-SP1 SUSE SLE-15 SUSE SLE-12-SP5 SUSE SLE-12-SP4 SUSE SLE-12-SP3 SUSE SLE-12-SP2 SUSE SLE-12-SP1 SUSE SLE-12 SUSE SLE-11 SP 4 SUSE SLE-10

Arch Extra Arch Community

Raspbian 10 Raspbian 9.0

Debian Unstable Debian Testing Debian 10 Debian 9.0 Debian 8.0 Debian 7.0

Fedora Rawhide (unstable) Fedora 32 Fedora 31 Fedora 30 Fedora 29

ScientificLinux 7 ScientificLinux 6

RedHat RHEL-7 RedHat RHEL-6 RedHat RHEL-5

CentOS CentOS-8-Stream CentOS CentOS-8 CentOS CentOS-7 CentOS CentOS-6

Ubuntu 20.04 Ubuntu 19.10 Ubuntu 19.04 Ubuntu 18.04 Ubuntu 16.04 Ubuntu 14.04

Univention UCS 4.4 Univention UCS 4.3 Univention UCS 4.2 Univention UCS 4.1 Univention UCS 4.0 Univention UCS 3.2

Mageia Cauldron (unstable) Mageia 7 Mageia 6 IBM PowerKVM 3.1 AppImage

KIWI image build (to be used for appliance and product builds with kiwi)

patrickbkr commented 4 years ago

@niner From my understanding OBS is a build service and not a CI service. Did I misunderstand? Is it viable to try to use OBS as a CI?

niner commented 4 years ago

On Donnerstag, 25. Juni 2020 13:47:24 CEST Patrick Böker wrote:

@niner From my understanding OBS is a build service and not a CI service. Did I misunderstand?

Yes

Is it viable to try to use OBS as a CI?

Yes. I've explicitly cleared this with the OBS folks at FOSDEM and have been using it as CI service since January.

patrickbkr commented 4 years ago

Judging by the above list the OBS could be used as a CI and build platform for about everything except MacOS and Windows.

@niner It seems OBS doesn't really market itself as a CI. The user documentation has near to nothing on the topic of using it as such. I suspect one has to bend the system into being a CI a bit. Am I right? Things I didn't find any information about:

Building PRs
Reporting results back to GitHub
Viewing testresults ordered by commit in OBS itself

There is a 2013 talk by Ralf Dannert mentioning a Jenkins integration, but information on that is just as sparse.

Edit: I am interested in looking into this more. I'd really appreciate some more information on the topic though.

niner commented 4 years ago

On Donnerstag, 25. Juni 2020 14:55:31 CEST Patrick Böker wrote:

@niner It seems OBS doesn't really market itself as a CI. The user documentation has near to nothing on the topic of using it as such. I suspect one has to bend the system into being a CI a bit. Am I right?

I'm using a cron job and a modified version of my packaging scripts to push every commit to MoarVM, nqp and rakudo to the OBS for testing. The OBS will then rebuild those 3 and the 21 modules (most importantly Inline::Perl5 and Cro) I'm most interested in.

There's also something where you can point it at a git repo, but since I already had working scripts, I didn't dig into this:

https://openbuildservice.org/help/manuals/obs-user-guide/cha.obs.best-practices.scm_integration.html

https://openbuildservice.org/help/manuals/obs-user-guide/ cha.obs.source_service.html

Advantages I see there are the flexibility of being able to build pretty much whatever I want, including PRs and branches and even patched versions, that the OBS takes care of dependencies, i.e. building stuff in order and that the build itself doesn't have to do any git operations and doesn't access the network at all. That means that builds simply cannot fail due to some git host not answering.

Building PRs

Reporting results back to GitHub

The OBS has both a pretty decent command line client and a REST API:

nine@ns1:~/home:niner9:rakudo-git/moarvm> osc results --vertical
openSUSE_Tumbleweed  i586       succeeded
openSUSE_Tumbleweed  x86_64     succeeded
openSUSE_Leap_15.2   x86_64     succeeded
openSUSE_Leap_15.1   x86_64     succeeded
openSUSE_Factory_zSystems s390x      failed
openSUSE_Factory_PowerPC ppc64      succeeded
openSUSE_Factory_PowerPC ppc64le    succeeded
openSUSE_Factory_ARM armv7l     succeeded
openSUSE_Factory_ARM aarch64    succeeded
nine@ns1:~/home:niner9:rakudo-git/moarvm> osc results --xml
<resultlist state="b7680636458e1e15dfa277cb5c133ee5">
  <result project="home:niner9:rakudo-git" repository="openSUSE_Tumbleweed" 
arch="i586" code="published" state="published">
    <status package="moarvm" code="succeeded"/>
  </result>
  <result project="home:niner9:rakudo-git" repository="openSUSE_Tumbleweed" 
arch="x86_64" code="published" state="published">
    <status package="moarvm" code="succeeded"/>
  </result>
  <result project="home:niner9:rakudo-git" repository="openSUSE_Leap_15.2" 
arch="x86_64" code="published" state="published">
    <status package="moarvm" code="succeeded"/>
  </result>
  <result project="home:niner9:rakudo-git" repository="openSUSE_Leap_15.1" 
arch="x86_64" code="published" state="published">
    <status package="moarvm" code="succeeded"/>
  </result>
  <result project="home:niner9:rakudo-git" 
repository="openSUSE_Factory_zSystems" arch="s390x" code="published" 
state="published">
    <status package="moarvm" code="failed"/>
  </result>
  <result project="home:niner9:rakudo-git" 
repository="openSUSE_Factory_PowerPC" arch="ppc64" code="published" 
state="published">
    <status package="moarvm" code="succeeded"/>
  </result>
  <result project="home:niner9:rakudo-git" 
repository="openSUSE_Factory_PowerPC" arch="ppc64le" code="published" 
state="published">
    <status package="moarvm" code="succeeded"/>
  </result>
  <result project="home:niner9:rakudo-git" repository="openSUSE_Factory_ARM" 
arch="armv7l" code="published" state="published">
    <status package="moarvm" code="succeeded"/>
  </result>
  <result project="home:niner9:rakudo-git" repository="openSUSE_Factory_ARM" 
arch="aarch64" code="published" state="published">
    <status package="moarvm" code="succeeded"/>
  </result>
</resultlist>

nine@sphinx:~> lwp-request https://api.opensuse.org/build/home:niner9:rakudo-git/openSUSE_Tumbleweed/x86_64/rakudo/_status
Enter username for Use your SUSE developer account at api.opensuse.org:443: 
niner9
Password: 
<status package="rakudo" code="succeeded">
  <details></details>
</status>

AlexDaniel commented 4 years ago

Please, everyone, take a serious look at OBS. It makes a lot of our crutches obsolete. In fact, even Blin will probably be rendered useless once we have all modules packaged in OBS.

melezhik commented 4 years ago

once we have all modules packaged in OBS.

It's not like I am against OBS or other tools, but packaging Raku modules into native packages is out of the scope for the issue being discussed here.

AlexDaniel commented 4 years ago

@melezhik it's not out of scope, that's just how OBS works. If we create a new rakudo package for every commit, it can trigger a rebuild of all module packages (on all architectures). That's essentially what Blin does, except that OBS can do it for all supported architectures without requiring us to create our own infrastructure for it. It actually sounds a bit too good to be true, but according to @niner we are allowed to do something like this, so let's try it.

nxadm commented 4 years ago

When I was looking on how to create rakudo packages, OBS was the first thing I looked at. Huge platform support, backed by a FOSS company, etc. However, I found it very complicated.

This does not mean I think OBS is a bad idea. In fact, I prefer it to Microsoft Azure. I am just stating the importance of documentation and transfer of knowledge, because I suspect that @niner++ is the only expert of the platform.

melezhik commented 4 years ago

@AlexDaniel I see what you're saying and with all respect what @niner has been doing with OSB, just my thoughts:

If we create a new rakudo package for every commit, it can trigger a rebuild of all module packages (on all architectures).

you don't need this to test Moar/Rakudo, it's only makes a sense if one is going to support Raku modules for certain platform

OBS can do it for all supported architectures without requiring us to create our own infrastructure for it.

Let's be real. There is no such a tool that automatically generates all platform specific packages from META specs. Even though there is AFIK progress in that way with rhel/centos presented by @niner , we should understand that it's way too harder the we could expect now, it's even hard to to it for a certain platform, there are too many bumps on road we might be not aware of now. Again do we still need it? If we are going to maintain native packages for different Linux, then it makes a sense. However I personally don't want to build a native CentOS package for Rakudo package just to test it ... But there is somewhat in the middle approach I am currently working on discussed here one might be interested ...

AlexDaniel commented 4 years ago

There is no such a tool that automatically generates all platform specific packages from META specs

I'll submit PRs to all modules that need native dependencies. No problem.

niner commented 4 years ago

On Donnerstag, 25. Juni 2020 18:46:34 CEST Alexey Melezhik wrote:

Let's be real. There is no such a tool that automatically generates all platform specific packages from META specs.

There's no such tool yet. But we're closer than ever.

Even though there is AFIK progress in that way with rhel/centos presented by @niner , we should understand that it's way too harder the we could expect now, it's even hard to to it for a certain platform, there are too many bumps on road we might be not aware of now.

I have thought about and worked on this for at least 5 years now. I think by now I know the way.

Again do we still need it? If we are going to maintain native packages for different Linux, then it makes a sense. However I personally don't want to build a native CentOS package for Rakudo package just to test it ...

That's the point. Getting native dependency information into META6.json files gives us the base for all of these:

fully automated checking of dependencies by zef
automated installation of native dependencies by zef
fully automated packaging of all Raku modules for Linux distributions
fully automated CI for all Raku modules

Really, the only part that we haven't actually specified and that we need for module CI is actually how to run tests. But there is a sort of quasi standard with t directories and scripts outputting TAP, which is what zef runs, so that's pretty much covered.

AlexDaniel commented 4 years ago

@niner++ I love your work.

melezhik commented 4 years ago

I guess almost if not all issues the discussion has started with have nothing to do with Raku modules native packages. How having those packages would help us?

AlexDaniel commented 4 years ago

@melezhik yes, you're actually right, but you have to consider the big picture. OBS can allow us massive testing of everything on all architectures. It does potentially fix some of the specific points raised in this ticket (alpine issue, stability of CI, old gcc stuff), but for others (relocatability, windows) we will have to do something extra.

niner commented 4 years ago

Aaaand we now successfully build on s390x, i.e. IBM System z :) https://build.opensuse.org/project/show/home:niner9:rakudo-git

patrickbkr commented 4 years ago

I'm having a hard time diving into OBS with respect to setting up a MoarVM, NQP and Rakudo CI.

@niner: As you have the deepest understanding of OBS: Can you create a write up of how a CI integration with OBS could look like for us? Some guide that gives the large picture of how such a CI should work and roughly what chunks of work need to be tackled. - An actionable plan. I imagine this would help us "OBS outsiders" a lot.

melezhik commented 4 years ago

just an idea. abit another dimension in this discussion. we can spin up amazon instances ( on my free tier account this is even free ) on demand using terraform, then run tests and tear instances down. it's cheap and efficient approach. just a thought.

Sparrowdo/Sparky has recently started to support such a dynamic configurations ....

niner commented 4 years ago

The Open Build Service was created as the name suggests to build software and that's first and foremost what it still does. As testing can be considered a part of the build process that's already the most important bit covered. The thing that's missing is telling the OBS that there's something to build and what exactly.

The OBS supports several target systems and distributions and uses the native tools to build packages. For distributions like openSUSE or CentOS, the native tool is rpmbuild. On Debian based systems it's dpkg or dpkg-buildpackage to be precise. These tools need the software's sources and descriptions of how to build them.

For rpmbuild, the sources usually come in the form of tar balls but actually can be any collection of files. The build description is given as SPEC file [1]. This contains meta data like name of the package, version, dependency information and license and the list of source files (usually tar balls and patches). In addition there are sections for the actual build steps, a list of files the result package contains and a changelog.

So to build a Raku module on openSUSE on the OBS one needs to create a spec file first with the described information. Luckily, our META6.json files already contain everything one would need for that. A spec file can easily created for that and there's already a meta2rpm [2] tool that does this in a fully automated way. I used this tool to create the spec files for the modules that get built for every rakudo commit in my rakudo-git OBS project [3]. There's also a rather rudimentary script for uploading the resulting tar ball and spec files to the OBS.

For a fully automated meta2rpm and equally automated build we still need information about dependencies on native libraries and programs. Me, ugexe and tony-o have been working on this for a couple of years and we're as good as there. The META6.json format supports native dependencies and meta2rpm can translate them to spec file "Requires" lines. The caveat is that currently the native library must already be installed on the machine that meta2rpm runs on. It needs to be extended to query the distro's online repository, which is really just a matter of programming. Note that handling native dependencies is yet an open issue for all possible build systems.

There is still a need for tools similar to meta2rpm (or a generalization of this tool) to get build descriptions for other target distributions like Debian or Arch. That's a relatively simple matter of someone finding out (or already knowing) what those descriptions look like and implementing a writer.

The final piece of the puzzle is triggering a new build. For the rakudo-git repository I have a simple systemd timer (what used to be cron jobs) that runs an ugly shell script [4] derived from the one I use for packaging. The script checks if there are new commits in the repository and if so, create a tar ball that follows the name-version.tar.xz convention, updates the version in the .spec file and even turns the git log into a changelog (which isn't necessary per se but I already had the code and it's nice to have). A simple osc commit -m "Update to version $version" pushes the result to the OBS which starts building everything.

The OBS has a thing called source services [5] which may replace the timer and shell script, but I haven't tried it yet.

[1] https://rpm-packaging-guide.github.io/ [2] https://github.com/niner/meta2rpm/ [3] https://build.opensuse.org/project/show/home:niner9:rakudo-git [4] https://gist.github.com/niner/98f26b41f9f7a5f79cb8b9b8c4b6048d [5] https://openbuildservice.org/help/manuals/obs-user-guide/ cha.obs.source_service.html

patrickbkr commented 4 years ago

@niner Thanks for the overview! Your writeup even covers testing Raku modules! You didn't mention GitHub integration. From a quick search I came up with a Github guide to build your own CI server integration but didn't find any information of a preexisting hook for OBS. My guess is there isn't any and we need to build our own. Do you know otherwise?

You currently use the pull paradigm (OBS regularly looks for new commits). The GitHub guide suggests using the push paradigm (GitHub pushes change notifications to a CI backend). But I suspect the GitHub API is flexible enough to get it to work in a pull fashion as well. Would you recommend one over the other?

niner commented 4 years ago

Do you know otherwise?

Sorry, I don't. No idea how those status thingies get onto GitHub pages.

You currently use the pull paradigm (OBS regularly looks for new commits). The GitHub guide suggests using the push paradigm (GitHub pushes change notifications to a CI backend). But I suspect the GitHub API is flexible enough to get it to work in a pull fashion as well. Would you recommend one over the other?

Maybe one can use GitHub push in combination with the source services. Long term, push is certainly preferrable as it reduces latency and won't cause issues with request limits or something like that. Whipping up a little service that takes GitHub notifications and triggers an OBS update sounds like a nice little exercise as we already have code for handling GitHub notifications in https://github.com/Raku/geth

JJ commented 4 years ago

El lun., 13 jul. 2020 a las 18:32, niner (notifications@github.com) escribió:

Do you know otherwise?

Sorry, I don't. No idea how those status thingies get onto GitHub pages.

Status thingies can be created as badges; they are also available from the API. Don't know if they could be integrated.

enough to get it to work in a pull fashion as well. Would you recommend one

over the other?

If you self-host it, you can do whatever you want. The current GitHub actions framework allows periodic actions, or push actions. You can make periodic actions pull stuff, of course.

patrickbkr commented 4 years ago

I want to work on creating a CI setup based on AzureCI to cover MacOS and Windows, and OBS for Linux. I have not yet created a detailed concept. From what I understood it's impossible to directly couple OBS and GitHub. Some intermediary that watches for changes in GitHub or is notified by GitHub on changes and pushes those changes to OBS will be necessary. I'm considering using that same intermedary to trigger Azure builds. But I'll decide that later on once I have some progress attaching OBS.

But this ticket is not about fixing up our CI, but about stability with our development process in general. @Altai-man has provided a possible solution in the very first comment of this ticket. I'd like to reignite the discussion with respect to his proposal.

What is the stance of the core committers towards the idea of a only pull requests development model? I've read some positive voices. Are there objections?
I think "Failure on master is an extreme condition that needs to be resolved prio 1." has a very large cultural aspect. Changing that culture won't be done by simply stating so in some ticket. What can we do to support the change?
- One point that has been brought up is stability of our CI. Especially false positives are discouraging respecting the CI.

patrickbkr commented 4 years ago

A PR-only-workflow could be made easier by a helper bot.

We introduce a tag merge-on-green or ready-for-merge or some such. The bot will find PRs with that tag, wait for CI to finish and merge if the tests are all green or remove the tag and ping the creator if they turn out red. This way one doesn't have to touch each PR twice, just create the PR and add the tag in one step.

patrickbkr commented 4 years ago

The bot could also be integrated into the CI and then be able to do things like retriggering the CI on command.

Prior art: Redhat seems to have a bot supported PR workflow: https://github.com/kubernetes/community/blob/master/contributors/guide/owners.md#the-code-review-process

MasterDuke17 commented 4 years ago

I believe Rust also does a lot with bots interacting with PRs.

Sent from my iPhone

On Jul 22, 2020, at 8:35 AM, Patrick Böker notifications@github.com wrote:

The bot could also be integrated into the CI and then be able to do things like retriggering the CI on command.

Prior art: Redhat seems to have a bot supported PR workflow: https://github.com/kubernetes/community/blob/master/contributors/guide/owners.md#the-code-review-process

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

AlexDaniel commented 4 years ago

If it matters, I like the idea of PR-based development. That's one of the reasons I added support for branches to whateverable. You can use committable to run code on any branch to show the upcoming behavior, you can use bisectable if needed, etc. Blin also supports branches so you can pretest your commits on the ecosystem to see if anything breaks.

Altai-man commented 4 years ago

Establish a complete list of platforms and tools we Officialy Met Expectations for and add a clear CI check for every missing point of this list.

So after a month and a half of discussions we still don't have one and people are asking about it.

patrickbkr commented 4 years ago

@Altai-man: I think the problem with writing such a list now (instead of later), is that we in principle want to support as many platforms as possible and the limitting factor is having a good CI and build toolchain for the platforms. So it's currently mostly not a question of what we want to support or what is sensible to support, but what we technically can manage to have a CI for.

So the best we can do at the moment is listing which platforms we currently support well. This is what we have:

Configurations we test via our CI (only looking at Azure, ignoring Travis and CircleCI):

moar, non-reloc|reloc, x86-64, Windows 10,   MSVC,                dyncall
moar, non-reloc|reloc, x86-64, MacOS 10.15,  clang 10.0,          dyncall
moar, non-reloc|reloc, x86-64, Ubuntu 18.04, gcc 7.3.0|clang 6.0, dyncall|libffi, glibc-2.27

Platforms without CI, but for which releases are built:

moar, reloc,           x86-64, CentOS 6,     gcc 4.4.7,           dyncall,        glibc-2.12

I think with the above setup we can claim to support:

MoarVM
on x86-64
Windows 10
- MSVC
MacOS
- clang
Linux
- glibc >= 2.12
- gcc >= 4 and clang

We can improve our coverage for the above systems (e.g. an older MacOS version, CentOS 6), but above is what we have now.

I am working on improving our CI infrastructure. I hope to be able to integrate OBS as a CI system. If I am successful we have the possibility of adding more platforms to the list.

Do we somehow have to give the above list our blessing? Where could the list live? Somewhere in the Rakudo doc/ folder?

melezhik commented 4 years ago

If it helps, Rakudist now runs community modules tests for a variety of Rakudo versions (whateverable) for the following platforms:

Ubuntu
Debian
Centos
Alpine

Again, it's just a matter of spinning up a new docker image into the RakuDist, to get a new OS tested ...

melezhik commented 3 years ago

I have recently started a tool aiming, at least partly mitigate the issues mentioned here. If someone is interested - https://github.com/melezhik/r3tool

Raku / problem-solving

Current Rakudo (possibly MoarVM as well) development process hinders releasing #206

Possible solutions