Open sandeepgupta12 opened 3 months ago
Thanks for reviving this. Since we moved away from Travis we stopped testing with little endian. I remember @kiszk discussing using osuosl for s390x here: https://github.com/apache/arrow/pull/35374#issuecomment-1541882889 I am concerned about the security implications of managing those boxes. Is this done by OSL? Are the VMs ephemeral or are they long-living? Do we have to ask ASF infra (@assignUser?)
Yes, I talked about OSL. But I recently changed my mind toward using a GHA self-hosted runner after I saw this article: https://community.ibm.com/community/user/powerdeveloper/blogs/gerrit-huizenga/2024/03/06/github-actions-runner-for-ibm-power-and-linuxone
I agree with @raulcd, we can not support any non-ephemeral VM runners due to security reasons; they are much too big a risk in a public repo. This has been used to compromise major open-source repos before: https://www.legitsecurity.com/blog/github-pytorch-and-more-organizations-found-vulnerable-to-self-hosted-runner-attacks
I'd be happy to add Power runners if they are ephemeral (-> the VM gets destroyed after each job), which is what we currently have for ARM runners using k8s: https://github.com/voltrondata-labs/gha-controller-infra
@raulcd @assignUser Thank you for sharing useful information.
As far as I know, this self-hosted runner framework for ppc64le and s390x uses ephemeral VMs.
@kiszk No, I don't think it is. The "ephemeral" there refers to the image and how it needs to be built with the runner token to work, at least that's how I read it.
The line where it starts the runner doesn't have any mechanism to kill the container and start a new one for the next job (as would be required for ephemeral runners). That's what the controller is for: it starts a new container/runner for each job and removes the old one.
@assignUser Hi! If ephemerality is the concern, then we can set the config parameters to launch ephemeral LXD containers; that wouldn't be an issue. You would still need to follow the instructions in https://github.com/anup-kodlekere/gaplib — the only thing that changes is how the containers are deployed and managed. However, we haven't tested this use case before and would need to run some tests to ensure functional correctness. A simple systemd service running a Python/bash script would act as the controller in this case, launching a clean LXD build environment (within the same VM host) for each new job.
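For illustration, a minimal sketch of what such a controller loop could look like. The image name, container naming, and runner path below are assumptions (the real ones would come from the gaplib build instructions); the sketch relies on LXD's `--ephemeral` flag, which deletes a container automatically when it stops, and on the GitHub Actions runner's ephemeral registration, which deregisters the runner after a single job:

```python
import itertools
import subprocess

# Hypothetical pre-built LXD image containing the GHA runner; the actual
# image name and runner location depend on the gaplib setup.
IMAGE = "gha-runner-ppc64le"

def launch_cmd(job_slot: int) -> list[str]:
    # --ephemeral tells LXD to delete the container when it stops,
    # so every job starts from a clean environment.
    return ["lxc", "launch", IMAGE, f"runner-{job_slot}", "--ephemeral"]

def run_one_job(job_slot: int) -> None:
    name = f"runner-{job_slot}"
    subprocess.run(launch_cmd(job_slot), check=True)
    # run.sh blocks until the runner (registered with config.sh --ephemeral)
    # finishes exactly one job and deregisters itself.
    subprocess.run(["lxc", "exec", name, "--", "/home/runner/run.sh"],
                   check=True)
    # Stopping the ephemeral container deletes it automatically.
    subprocess.run(["lxc", "stop", name], check=True)

def controller() -> None:
    # Naive serial loop; a real controller would scale with queued jobs.
    for slot in itertools.count():
        run_one_job(slot)
```

This is only a sketch of the per-job lifecycle, not the actual gaplib implementation; error handling, job-queue awareness, and token management are omitted.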
@anup-kodlekere Thanks, great to hear that. If updated instructions are prepared, I could try this for Arrow on s390x as a test.
By the way, please add the Continuous Integration label to CI-related tasks, so that we can find them using a search :-)
A couple of issues have been identified lately around big-endian architectures; those probably would have been found if we tested on s390x:
Describe the enhancement requested
Description: We need to extend support for apache/arrow to the POWER/PPC64LE architecture.
Background:
• We have forked the apache/arrow repository and have successfully generated and tested wheels for both C++ and Python using a self-hosted CI runner on an OSU PPC64LE machine.
• The forked repository includes the following changes:
• We would like to upstream these changes to enable CI for the ppc64le arch using a GHA self-hosted runner.
Fork Information: • Forked Repository: https://github.com/sandeepgupta12/arrow
Request:
• Support for PPC64LE: We are seeking support for the PPC64LE architecture in the apache/arrow project.
• Creation of an OSU VM: To facilitate further testing and CI integration, we request the creation of an OSU VM configured for PPC64LE. Details for requesting the VM:
URL: https://osuosl.org/services/powerdev/request_hosting/
IBM Advocate: gerrit@us.ibm.com
Details: The Open Source Lab (OSL) at Oregon State University (OSU), in partnership with IBM, provides access to IBM Power processor-based servers for developing and testing open source projects. The OSL offers the following clusters:
OpenStack (non-GPU) Cluster:
• Architecture: Power little endian (LE) instances
• Virtualization: Kernel-based virtual machine (KVM)
• Access: Via Secure Shell (SSH) and/or through OpenStack's API and GUI interface
• Capabilities: Ideal for functional development and continuous integration (CI) work. It supports a managed Jenkins service hosted on the cluster or as a node incorporated into an external CI/CD pipeline.
Additional Information: • We are prepared to provide any further details or assistance needed to support the PPC64LE architecture. Please let us know if there are any specific requirements or steps needed to move forward with this request.
Component(s)
C++, Python