wohali closed this issue 3 years ago
Hi @afsanjar, I have moved this to a new issue, as it is not related to CI failing on main.
I'm hoping you can work with someone on the IBM Cloudant team such as @nickva to help coordinate this effort.
We lost our access to CI workers on or around 2020.09.01, when IBM Cloud discontinued the workers we had been using.
Previously we had 2 SoftLayer worker nodes in the "cp2-4x8" profile. Here are their old definitions in our infrastructure management repo (Ansible based): https://github.com/apache/couchdb-infra-cm/commit/ee7206c6124a725624187570193d6b52e6f5a9b6
They would have been running Ubuntu, probably 18.04, and had Docker installed. Note that our builds run inside Docker, but we require dedicated services at the moment, and cannot support pure Docker-based Jenkins workers.
Can you work backwards from this or do you need us to dig out what "cp2-4x8" used to mean for IBM Cloud/SoftLayer?
Hi @afsanjar, building on what Joan shared, we were using worker nodes from the "VPC on POWER" program, so I guess these were POWER9 instances with 4 vCPUs and 8 GB RAM.
Most of our other worker nodes still run in an IBM Cloud VPC, but the master is a CloudBees instance hosted by the ASF at https://ci-couchdb.apache.org, so yes, the worker nodes will need public network access.
Hi Amir. The repository which manages the workers is https://github.com/apache/couchdb-infra-cm. It uses Ansible and there is a discovery API call (https://github.com/apache/couchdb-infra-cm/blob/main/tools/gen-config#L71-L86) which looks through and gathers all the available instances. Ideally the new instances would be queryable in a similar manner. But, if needed, we can add another query there to get the ppc64le workers.
Ping me internally at IBM if you need help. Search for "Nick" (or "Nicolae") Vatamaniuc, and I should have access to create and update these worker instances.
Hi Adam and Joan, thanks for your replies. We are working toward providing dedicated build/test HW for ppc64le ASAP. Meanwhile, what are the other outstanding issues for ppc64le enablement?
There are no known regressions, but with the significant changes to our code base made over the last 6 months, we are in no position to re-add ppc64le to the supported platform list until CI has been running for at least a few iterations with no failures on the platform.
Yes, of course. Trust me, Joan, it is not in our interest to have a defective product either. My intent was to size the effort so I could allocate enough resources. Btw, I am an Apache Bigtop PMC member. What is the process to join as a contributor and committer?
Good to hear. Thanks for your help! Really, it makes a huge difference. I only wish this help had come before I had to take the step to remove ppc64le from our supported build list.
You're already a contributor (from our bylaws):
A contributor is someone who makes contributions to the community, project, documentation, or code.
There is no special requirement to become a contributor. If you have a great idea for the project, you can get to work immediately. There is no need to ask permission. Most things can be accomplished by contributors with no special privileges or status on the project. Assistance can be provided if you need access to project resources to get your work done.
A contributor who makes sustained contributions to the project may be invited to become a committer.
To become a committer is not an overnight thing. Of course, given sustained commitment to the project, the PMC generally invites people who stand out as meeting these criteria:
A committer is someone who is committed to the project. In return for their commitment, they are given a binding vote in certain project decisions. Committers are hence responsible for the ongoing health of the project and the community.
We recognise commitment in many different areas. These include, but are not limited to:
- Community (community management, ticket triage, helping new users, events etc.)
- Project (blogging, marketing, design, UX, branding, etc)
- Documentation (documentation, localisation/internationalisation, etc.)
- Code (new features, bug fixing, quality assurance, release management etc.)
... To make this clear, we have chosen to define a committer as someone who is committed. We mean this in the sense of being loyal to the project and its interests. It is a position of trust, not an expectation of activity level. Anyone who is supportive of the community and the project will be considered as a candidate for being a committer.
As a matter of convenience, committers are also given write access to all of the public project infrastructure, including source control repositories, website, issue tracker, wiki, and blog. Access to social media accounts, and other third-party services, will be granted upon request.
Thanks again for your help!
I would say that, 3.x aside, the 4.x release cycle will eventually come, and we'll need ppc-savvy resources on hand to help deal with any problems found at that time. Your team's support is most welcome when that happens.
Thanks a lot, Nick. We'll definitely reach out to you in the next few days for a chat. My internal IBM handle is "@Amir Sanjar" and email is @.***
We are trying to give the community direct OpenStack access to create more Power VMs as needed. Would that work?
I'll let @nickva answer, but as long as the machines don't disappear out from under us randomly, and we're not forced to set machines up again on a regular basis, it should be fine. Our current Jenkins system takes a bit of effort to add and maintain new machines, so dynamic (re-)allocation isn't really a great option for us.
Got that.
@afsanjar It would be ideal if the ppc nodes could be added again to the existing (functional) cloud.ibm.com account we're already using. The Ansible script would make it easy to manage all the nodes together.
@nickva We are ultimately moving that way as part of the IBM Cloud PowerVS project https://www.ibm.com/cloud/power-virtual-server. However, Linux support is currently still under development, i.e., there is no Ubuntu support yet.
Update: @nickva and I have finalized a plan to enable Power CI/CD without any change to the existing CI/CD structure. We will address the last hurdle tomorrow.
Working with @afsanjar and other members of the Power team, we have a new PowerVS machine connecting to our Apache CouchDB Jenkins instance.
It's also added to couchdb-infra-cm so it can be discovered and set up that way (get the external IPs, run playbooks on it, etc). It's still a bit of a special case, as the cloud.ibm.com API for PowerVS instances is quite a bit different from the one for the other VPC instances.
A few issues encountered currently are:
The qemu cross-platform architecture setup in couchdb-ci crashes, and the ppc64le build fails, when building on macOS Big Sur (x86_64):
CONTAINERARCH=ppc64le ERLANGVERSION=21.3.8.17 ./build.sh platform debian-buster
=> # qemu: uncaught target signal 4 (Illegal instruction) - core dumped
I tried building on the machine itself, and it looks like it fails to build foundationdb. We have a setting to ignore missing intrinsics, but the build later stumbles on a missing __rdtsc instruction:
apt-dependencies.sh in couchdb-ci repo:
if [ "${ARCH}" == "ppc64le" ]; then cmake -DCMAKE_CXX_FLAGS="-DNO_WARN_X86_INTRINSICS" -G Ninja ..
[108/751] Building CXX object flow/CMakeFiles/flow.dir/Net2.actor.g.cpp.o
FAILED: flow/CMakeFiles/flow.dir/Net2.actor.g.cpp.o
/usr/bin/c++ -DBOOST_ERROR_CODE_HEADER_ONLY -DBOOST_SYSTEM_NO_DEPRECATED -DHAS_ALIGNED_ALLOC -DNO_INTELLISENSE -DUSE_UCONTEXT -I../ -I. -isystem boostProject-prefix/src/boostProject -DNO_WARN_X86_INTRINSICS -O3 -DNDEBUG -DCMAKE_BUILD -ggdb -fno-omit-frame-pointer -Wno-pragmas -Wno-attributes -Wno-error=format -Wunused-variable -Wno-deprecated -fvisibility=hidden -Wreturn-type -fPIC -DHAVE_OPENSSL -std=gnu++17 -MD -MT flow/CMakeFiles/flow.dir/Net2.actor.g.cpp.o -MF flow/CMakeFiles/flow.dir/Net2.actor.g.cpp.o.d -o flow/CMakeFiles/flow.dir/Net2.actor.g.cpp.o -c flow/Net2.actor.g.cpp
In file included from /foundationdb/flow/Net2.actor.cpp:21:
/foundationdb/flow/Net2.actor.cpp: In member function 'virtual void N2::Net2::run()':
../flow/Platform.h:438:28: error: '__rdtsc' was not declared in this scope
^~~~~~~
/foundationdb/flow/Net2.actor.cpp:1186:15: note: in expansion of macro 'timestampCounter'
tscBegin = timestampCounter();
^~~~
../flow/Platform.h:438:28: note: suggested alternative: '__dt '
ninja: build stopped: subcommand failed.
To make it work, I had to skip building foundationdb, so the image will work on 3.x only. I also had to build it in Docker on the Power machine itself rather than via the multi-arch docker+qemu setup in https://github.com/apache/couchdb-ci
Merged 3.x ppc64le CI support https://github.com/apache/couchdb/pull/3659
FoundationDB doesn't build on ppc64le, but that's probably outside the scope of this ticket. If anyone is interested, it may be possible to follow along and do something similar to how the aarch64 Linux port was enabled: https://github.com/apple/foundationdb/pull/2961/. At first sight it doesn't seem too terribly bad: provide an equivalent of the rdtsc instruction and some SSE-like assembly equivalents.
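For anyone picking this up, here is a rough sketch of what an "equivalent rdtsc" could look like. This is not FoundationDB's actual code: the function name `timestamp_counter` is made up for illustration, and I'm assuming GCC or Clang, which provide `__rdtsc()` via `<x86intrin.h>` on x86 and the `__builtin_ppc_get_timebase()` builtin on POWER. Note that the POWER time base and the aarch64 virtual counter tick at a different rate than the x86 TSC, so any code that converts ticks to time would need its own calibration.

```cpp
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // declares __rdtsc() on GCC and Clang
#endif

// Hypothetical portable stand-in for a raw cycle/tick counter.
// Returns a 64-bit tick count; the tick unit differs per architecture.
inline std::uint64_t timestamp_counter() {
#if defined(__x86_64__) || defined(__i386__)
    // x86: read the time stamp counter directly.
    return __rdtsc();
#elif defined(__powerpc64__)
    // POWER: read the 64-bit time base register (GCC/Clang builtin).
    return __builtin_ppc_get_timebase();
#elif defined(__aarch64__)
    // aarch64: read the virtual counter register.
    std::uint64_t ticks;
    __asm__ volatile("mrs %0, cntvct_el0" : "=r"(ticks));
    return ticks;
#else
    // Fallback (assumption): a monotonic clock in nanoseconds.
    const auto now = std::chrono::steady_clock::now().time_since_epoch();
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(now).count());
#endif
}
```

A macro like the `timestampCounter()` seen in the error output above could then expand to a call like this instead of `__rdtsc()` directly, but where exactly that belongs in the FoundationDB tree is for the upstream port to decide.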
But I think we can now close this ticket as fixed.
Hi, my name is Amir Sanjar, and I lead the IBM power-enabled team. I apologize that you lost access to your ppc64le worker; let me help you here. What are your HW requirements (i.e. cores, storage, public net access, Linux distro...)?
Originally posted by @afsanjar in https://github.com/apache/couchdb/issues/3394#issuecomment-805275826