argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Add support for ppc64le #12449

Open valen-mascarenhas14 opened 8 months ago

valen-mascarenhas14 commented 8 months ago

Summary

Requesting that the Argo team create official argo-workflows images for ppc64le.

Use Cases

Our use case involves building Argo Workflows on a Kubernetes cluster with ppc64le architecture. The Argo image serves as a crucial dependency for facilitating seamless integration and testing of Kubeflow pipelines on ppc64le.

Proposal

Change build infrastructure to build ppc64le variants of argo-workflows.


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

valen-mascarenhas14 commented 8 months ago

Our goal is to make the workflow-controller, argocli, and argoexec images multi-arch.

We propose adding linux/ppc64le to the platform list on the following line in the YAML file at link, i.e. [ linux/amd64, linux/arm64, linux/ppc64le ], to extend the image release pipeline with ppc64le architecture support.
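Each entry in that platform list means one more cross-compiled build. Argo Workflows is written in Go, and Go selects its cross-compilation target through the GOOS and GOARCH environment variables. The sketch below is a minimal illustration assuming each image wraps Go binaries cross-compiled per platform. It is not Argo's actual Makefile or release workflow. The names PROPOSED_PLATFORMS, go_build_env and build_commands are hypothetical.

```python
"""Hypothetical sketch: map OCI-style 'os/arch' platform strings to Go cross-compilation env vars."""

# Platform list proposed above, with linux/ppc64le as the new entry (illustrative only).
PROPOSED_PLATFORMS = ["linux/amd64", "linux/arm64", "linux/ppc64le"]


def go_build_env(platform: str) -> dict[str, str]:
    """Split an 'os/arch' string into the GOOS/GOARCH variables Go uses to cross-compile.

    Variant suffixes such as 'linux/arm/v7' are rejected to keep the sketch small.
    """
    parts = platform.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'os/arch', got {platform!r}")
    goos, goarch = parts
    return {"GOOS": goos, "GOARCH": goarch}


def build_commands(platforms: list[str]) -> list[str]:
    """Render one illustrative `go build` command line per platform (not Argo's real build)."""
    commands = []
    for platform in platforms:
        env = go_build_env(platform)
        prefix = " ".join(f"{key}={value}" for key, value in env.items())
        commands.append(f"{prefix} go build ./...")
    return commands


if __name__ == "__main__":
    # Prints one command per platform, e.g. "GOOS=linux GOARCH=ppc64le go build ./..."
    for line in build_commands(PROPOSED_PLATFORMS):
        print(line)
```

Each platform is an independent compilation, so every added entry adds another full build to every run.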

terrytangyuan commented 8 months ago

> The Argo image serves as a crucial dependency for facilitating seamless integration and testing of Kubeflow pipelines on ppc64le

Can you point me to where KFP relies on it?

valen-mascarenhas14 commented 8 months ago

Sure @terrytangyuan, this link shows how we set up our cluster for testing.

This link shows the steps to install Argo.

This is the YAML file that has both dependencies (https://github.com/argoproj/argo-workflows/releases/download/v3.5.2/install.yaml).

terrytangyuan commented 8 months ago

I meant documentation around the requirement of ppc64le.

ghatwala commented 8 months ago

Hey @terrytangyuan, we are trying to enable a ppc64le pipeline in our local Prow cluster; details are in this issue: https://github.com/GoogleCloudPlatform/oss-test-infra/issues/1972#issuecomment-1842798928

terrytangyuan commented 8 months ago

I see. If it's just as simple as adding one more platform in the workflow, then I don't see why we wouldn't support it.

terrytangyuan commented 8 months ago

Would you like to make the change and test if it works in your fork before submitting the PR?

ghatwala commented 8 months ago

Yes, we could. Requesting @valen-mascarenhas14 to try it once via a fork and then submit a PR here.

agilgur5 commented 8 months ago

> I see. If it's just as simple as adding one more platform in the workflow, then I don't see why we wouldn't support it.

IMO, I would probably reject this for the same reasons as RISC-V in #12067 (i.e. lack of usage and required maintenance and build time, same reason why Argo CD is reticent to add more) and recommend an unofficial build, as I did in https://github.com/argoproj/argo-workflows/pull/12067#issuecomment-1783603520

> hey @terrytangyuan we are trying to enable ppc64le pipeline in our local prow cluster , details are here in this issue- GoogleCloudPlatform/oss-test-infra#1972 (comment)

I'm a little confused here, that's not in k8s test-infra, that's for GCP. Are Google and Kubeflow planning to support ppc64le officially? (in particular, GCP supporting IBM PowerPC seems odd) If so, that could shift my opinion, but that wasn't really clear to me from the issue

ghatwala commented 8 months ago

Hi @agilgur5, regarding the below: there are multiple Kubeflow components already supported on ppc64le; more details are in this umbrella issue: https://github.com/kubeflow/kubeflow/issues/6684

> I'm a little confused here, that's not in k8s test-infra, that's for GCP. Are Google and Kubeflow planning to support ppc64le officially? (in particular, GCP supporting IBM PowerPC seems odd) If so, that could shift my opinion, but that wasn't really clear to me from the issue

valen-mascarenhas14 commented 8 months ago

@terrytangyuan I've created a fork and tried building the ppc64le-specific argo-workflows images. It successfully builds the images for ppc64le. Here's the workflow link.

I'll go ahead and raise a PR if all looks good.

terrytangyuan commented 8 months ago

Let's get @agilgur5's agreement before submitting the PR.

sarabala1979 commented 8 months ago

Agree with @agilgur5

> IMO, I would probably reject this for the same reasons as RISC-V in https://github.com/argoproj/argo-workflows/pull/12067 (i.e. lack of usage and required maintenance and build time, same reason why Argo CD is reticent to add more) and recommend an unofficial build as I did https://github.com/argoproj/argo-workflows/pull/12067#issuecomment-1783603520

IMO, I align with @agilgur5. If there is a significant number of Argo Workflows users requesting a certain architecture, we can add it to the Argo Workflows build process. If not, users can fork and build the image, then update the image path in an issue so that other users can use it.

agilgur5 commented 8 months ago

> If not, users can fork and build the image, then update the image path in an issue so that other users can use it.

Yea, in https://github.com/argoproj/argo-workflows/pull/12067#issuecomment-1783603520 I even suggested that we could host such unofficial builds for less common architectures in argoproj-labs as well.

Those can be automated to run on every release of Workflows. That doesn't necessarily even require a fork (per se), just a CI/GitHub Actions process.

> there are multiple Kubeflow components already supported on ppc64le; more details are in this umbrella issue: https://github.com/kubeflow/kubeflow/issues/6684

If I'm reading that issue and some of the affiliations here and there correctly, this seems to be driven exclusively & entirely by IBM? RISC-V is more open than PowerPC (and both are RISC ISAs), so if we're going to support one, it would make sense to support both as experimental. There are still quite a few more phases therein too, and no real user surveys were provided as supporting evidence. Some of the other upstream deps seem to have expressed the same concerns.

lehrig commented 6 months ago

@agilgur5 thanks for looking into this.

  1. You are right that we at IBM are probably driving this to the greatest extent, as we have seen significant market demand for it. IBM Power customers love using open source these days, which has created this demand, and Red Hat as part of IBM is further increasing it. IMO this is also good for widening the argo community to a wider market, in particular by not pushing those demands to Tekton as an alternative. We'd like to have all options on the table so customers can select the best option in their context.
  2. Given 1., we can also commit to helping maintain this line of work.
  3. As for user surveys, the umbrella issue https://github.com/kubeflow/kubeflow/issues/6684 is at least one of the highest-rated issues, and it was also upvoted by a significant number of non-IBM Kubeflow community members. Is this what you are looking for?

agilgur5 commented 6 months ago

> IMO this is also good for widening the argo community to a wider market

I don't speak for or represent Argo (my words are my own), but to be fair, it is one of the largest CNCF projects already.

> by not pushing those demands to Tekton as an alternative

Other projects in the ecosystem like Tekton supporting PowerPC and having user adoption of that are good arguments. For the latter, no data has been presented. The former took me a bit to find (there are no binaries in the GH releases), but Tekton does seem to make builds for PowerPC (but not RISC-V?). Although idk if all its components support PowerPC.

Related is that Kubeflow Pipelines is still on an old, unsupported version of Argo (https://github.com/kubeflow/pipelines/pull/9301, https://github.com/kubeflow/pipelines/issues/8935, https://github.com/kubeflow/pipelines/issues/8942, etc). Even if Argo were to start building for PowerPC, KFP still wouldn't be able to use those images as they'd only be for supported Argo versions. I'm unsure if the KFP Tekton fork is up-to-date on a version of Tekton that supports PowerPC.

Kubeflow also doesn't yet fully support arm64 (https://github.com/kubeflow/kubeflow/issues/2337) (which has been increasingly popular due to Apple silicon).

> Is this what you are looking for?

No. I mentioned that issue myself in my previous comment. It does not have any user surveys. Upvotes are not particularly nuanced as a signal (also, there are issues in Argo with many more upvotes if that measure were to be exclusively used, and this one only has 4 upvotes; the Kubeflow issue is also for all of Kubeflow, not KFP specifically, as KFP users are a subset of Kubeflow users, and I have seen many that don't use KFP).

As an example, Argo does have a roughly yearly survey that covers the size & scale of its users.

> of non-IBM Kubeflow community members

As there is no survey data or similar, that statement is difficult to quantify. How many organizations are using or interested in PowerPC and KFP (or Argo in a different fashion) on PowerPC? There is no data on that and nearly all comments are from IBM.

Also, there still hasn't been a counter-argument presented for why PowerPC and other less common architectures could not be hosted in an unofficial builds repo. As I wrote above, that could be hosted within argoproj-labs. IBM could also host one for IBM-led archs (including s390x).

gerrith3 commented 5 months ago

@agilgur5, one comment you made was about Argo already having good adoption, which reads as a kind of counter to the need for adding Power support. But the whole point of a graduated CNCF project is to make it accessible to all CNCF community members for use and for contribution. And, as with any open source project, different end users or middlemen in the overall process contribute based on their own needs. In IBM's case, the feature that we contribute and support is typically support for additional architectures like ppc64le and s390x. Is the argument here that supported architectures are not a feature of the project? Communities typically evolve to serve all members, and there are a lot of features that IBM might not need or endorse, but generally we wouldn't block them simply because we didn't think they were necessary or have proof that they were "widely enough" used, whatever that litmus test might be.

And, you do point to one of the challenges that IBM does have with open source communities - our end users (typically customers) are not very interactive in open source communities and thus we wind up as their proxies, which is more painful than you might expect. ;) But part of the challenge here is that we have built an ecosystem of $xxxM of product into broader ecosystems that involve billions of dollars in things like banking and finance or health care, etc. etc. Odds are the users of Power systems include every open source developer with a bank account or credit card, because that's the type of workloads that IBM Power and IBM Z can often be found in, running the world's largest and often most secure installations.

So IBM has voted on behalf of its customers: via CNCF support for Argo CD, either directly or with Red Hat as a partner; with Red Hat, when we jointly created and released GitOps for Power; and whenever we have multiple developers engaging with open source communities and spend our dollars because our customers tell us that they want these capabilities.

I'm honestly not sure how we can get you the information you requested on votes - ideally someone like IDC or Gartner would have those connections into customers, but we wouldn't be out here advocating for changes like this if they weren't requested by what will be your own end users. ;)

Finally, with your argument of hosting things for Power and Z in cloned repositories, consider that we've worked with more than 40,000 open source communities on Power in the past year alone. Cloning all of those projects and maintaining them with IBMers would be totally ridiculous in terms of cost and effort, and the complete and total opposite of the point of open source. As you are probably aware, IBM has contributed to open source since the '90s at least, and probably even since the '60s, and in a volume proportionally larger than most other large companies throughout the years. We believe in open source, and we believe in collaborations like the Linux Foundation, Apache Foundation, CNCF, etc. etc. Trying to clone all of GitHub for ppc64le and for s390x would be a massive waste of world brainpower, disk space and compute power. ;) What seems hard at the beginning but is ultimately easier is finding the right way to collaborate and embrace the promise of open source.

Sorry for the soap-boxy response, but I wanted to provide a little bit of a view from "the other side." :)

thanks for listening!

agilgur5 commented 5 months ago

> Is the argument here that supported architectures are not a feature of the project?

No, the question is and has been (this issue was opened as a feature request and still is one): why should those architectures be officially supported as part of the core? Especially so when we don't have any tests on those architectures nor contributors actively running on those architectures (as another form of testing). We have either or both for the currently supported architectures. Again, as was mentioned above, other projects have said very similar things (e.g. https://github.com/pyca/cryptography/issues/7723).

As was already written, every feature incurs a maintenance burden, and Workflows already has some efforts ongoing to reduce that burden and move things into user-land (e.g. #6943, #12694) as well as get more active contributors (c.f. the Sustainability Effort). Every project has to make a decision about the trade-offs and priority of a feature, and if it can be implemented easily in user-land, that substantially decreases any priority or rationale for it to be supported in the core. As I wrote there, Argo CD is also reticent to add more architectures for a similar trade-off of lack of usage vs. required maintenance and build time.

> Finally, with your argument of hosting things for Power and Z in cloned repositories

That is similarly not what was said here. A separate build repo was suggested, similar to Node.js's https://github.com/nodejs/unofficial-builds/, which is a significantly larger project than Argo.

> would be a massive waste of world brainpower, disk space and compute power.

That suggestion would in fact waste less resources, compared to builds for every commit on main for infrequently used architectures. Making all builds more complex and longer for infrequently used architectures is an argument against this feature. Similar can be said about core maintenance and support for an untested build, as opposed to an explicit unofficial builds repo, which can also have separate maintainers etc.

> I'm honestly not sure how we can get you the information you requested

As was already written, given that OSS projects, including CNCF projects like Argo and many others, are able to survey their users, I would think that IBM would definitely have the resources to do the same thing.

> Sorry for the soap-boxy response

For context, I am a volunteer who has put thousands of hours of unpaid time into OSS communities, including Argo & CNCF (and all of that is readily available, publicly accessible information). A significant portion of OSS is run by passionate volunteers & hobbyists. And that's about all I'll say on that point, it is preferred to keep things not personal, on topic, and focused on problem-solving and concrete data. Marketing statements by corporations are also generally discouraged and CNCF requires vendor neutrality (parts of your comment are definitely pretty close to the line).

As I have done a few times now, I would ask that all questions asked and concerns raised be addressed. Multiple IBM employees have yet to do so.

lehrig commented 4 months ago

@agilgur5 - thanks again; let's summarize where we are:

  1. Is the gist, and a potential next action, to run a survey looking for potential users of Argo on ppc64le? ("If Argo was available on ppc64le, would you use it?")
  2. If so, what would be a ball park number of users we need to drive this forward?
  3. Should we (IBM) run such a survey with our customers? Or would it make sense to incorporate a question for supported architectures in the yearly Argo survey?

agilgur5 commented 4 months ago

  1. That's one next action.
    1. I imagine IBM might want to also ask about s390x, and Argo would also want to ask about RISC-V and others.
    2. No IBM folks have answered the questions around a separate build repo similar to https://github.com/nodejs/unofficial-builds/. That can be done today, no survey or extra info needed. Some of y'all could potentially even be maintainers of such a repo. I can also personally sponsor the inclusion of that repo into argoproj-labs (as well as approve docs PRs to link to it).
      1. We'd also probably add other experimental builds there, such as RISC-V arch builds and FIPS 140-2 validated crypto builds. Any experimental builds can be hosted there without too significant of a need (including s390x, for example).
      2. At this point I'm considering making that repo myself, but tbh given that contributors don't seem to be willing to do that themselves, it honestly begs even more the question of its real utility, usage, and necessity.
      3. Another prior question, how would this arch be tested?
  2. I can't make that determination alone, but a very rough back of the napkin estimate off the top of my head would be several large orgs (5-7+) or many smaller orgs (15-20+)
  3. We could do both. I'm not sure how much overlap there is with regard to survey answerers, especially potential users. Like I imagine an IBM survey would get more responses regarding ppc64le support than an Argo survey. I also have not been involved in prior year surveys (I've been a contributor for about a ~year rn).
    1. cc @caelan-io I think Pipekit was helping run / analyze / summarize the survey? (as I see you were author of last year's summary)
agilgur5 commented 1 month ago

> would be a massive waste of world brainpower, disk space and compute power.

> That suggestion would in fact waste less resources, compared to builds for every commit on main for infrequently used architectures. Making all builds more complex and longer for infrequently used architectures is an argument against this feature.

To put some concrete numbers to this, from a recent build on main:

So cross-compilation for arm64 already takes around an order of magnitude longer. I imagine that less used architectures may take even longer (less cross-compilation optimizations).

Those are very real numbers that non-trivially affect the length of our existing release process (and I've waited on the arm64 builds more than once; I notice this at least every time I release a patch version).

terrytangyuan commented 1 month ago

@lehrig and others from RH/IBM - Could you reach out to me via RH/IBM Slack?

terrytangyuan commented 1 month ago

> Related is that Kubeflow Pipelines is still on an old, unsupported version of Argo (https://github.com/kubeflow/pipelines/pull/9301, https://github.com/kubeflow/pipelines/issues/8935, https://github.com/kubeflow/pipelines/issues/8942, etc). Even if Argo were to start building for PowerPC, KFP still wouldn't be able to use those images as they'd only be for supported Argo versions.

Quick note - this has been fixed and KFP upgraded to 3.4+ https://github.com/kubeflow/pipelines/pull/10568.