WorksOnArm / equinix-metal-arm64-cluster

Arm and Equinix Metal have partnered to make powerful Neoverse-based Armv8 bare metal infrastructure, including latest-generation Ampere systems, available for open source software developers to build, test and optimize for the Arm64 architecture.
http://www.worksonarm.com

OCaml language - Works On Arm Sponsored #279

Closed avsm closed 2 years ago

avsm commented 3 years ago

If you are interested in filing a request for access to the Works on Arm test and CI infrastructure, please fill out the details below.

Proposals will be evaluated on a biweekly cycle or on a best effort basis by Arm and Equinix Metal.

This is a followup to #5 for the purpose of the Mt Jade machines.

Name, email, company, job title

Name: Anil Madhavapeddy
Email: anil@recoil.org
Company: University of Cambridge
Job: Faculty

Project Title and description

OCaml Build Infrastructure: aarch32/64 support

OCaml is a general purpose programming language with an emphasis on expressiveness and safety. Examples of large scale systems implemented in OCaml include Facebook's Hack language, the Infer static analysis tool, the Flow JavaScript inference, the Coq proof assistant, the Compcert certified C compiler, the ReasonML toolchain, and the MirageOS operating system.

OCaml has full support for fast, native code compilation to ARM 32/64. We would like to add regular ARM CI for both the compiler and the opam package manager. This involves around 8000 packages being built on a matrix of 8 different compiler versions and variants, all of which are sandboxed in Docker containers.

We would also like to add testing for FreeBSD and OpenBSD on aarch64, but this depends on the respective operating systems booting on Packet.net machines.

It would also be helpful to be able to have access to machines that are capable of 32-bit armhf builds, so that we can generate binary packages for the Raspberry Pi and similar devices.

Describe your use case for these machines

End users would gain immediate feedback about their submitted OCaml packages working on aarch64 alongside the x86_64 tests. We would also get prompter feedback about aarch64 specific issues (e.g. broken backtraces) due to increased coverage and fuzz testing of the compiler itself.

Which members of the community would benefit from your work?

End users would gain immediate feedback about their submitted OCaml packages working on aarch64 alongside the x86_64 tests. We would also get prompter feedback about aarch64 specific issues (e.g. broken backtraces) due to increased coverage and fuzz testing of the compiler itself.

Is the code that you’re going to run 100% open source?

It is all 100% open source:

We are using this infrastructure for testing the OCaml aarch64 multicore backend too, so we can use ARMv8.2 features.

What infrastructure (computing resources and network access) do you need?

2x Mt Jade machines (discussed with @vielmetti)

Describe / Name the continuous integration (CI) system for this project.

https://github.com/ocluster

Will these machines be exclusively used for CI purposes?

For benchmarking, fuzzing and CI.

Please share a public URL of the CI dashboard (if applicable).

https://ci.ocamllabs.io:8080

Please state your contributions to the open source community and any other relevant initiatives.

My personal GitHub at @avsm lists most of the projects I'm involved with.

Important reminders and logistics

Approved projects will be expected to provide credit back to Works on Arm in the form of a logo display, blog post, Twitter post, news release, or some other suitable acknowledgement.

Approved projects are subject to a 90 day review process for termination.

When resources are not required anymore or when the project ends, please add comments on this issue so that we can reuse the hardware for someone else! In case a project goes through ownership change or key people leaving, please promptly inform the team by adding comments on this issue. Our team will maintain dialogue with new members.


pgmwoa commented 3 years ago

We are in the process of getting the servers ready for you. You will get an email as soon as the hardware is reserved and ready for use.

pgmwoa commented 3 years ago

The necessary infrastructure is reserved. Please refer to the welcome email for instructions on how to use the reserved server and the supporting resources. We look forward to your feedback on, and experience with, the new servers.

avsm commented 3 years ago

Thanks @pgmwoa! I'm just working through getting these deployed and running now in our pool. Should have the Amperes decommissioned soon as well.

pgmwoa commented 2 years ago

Hi @avsm, we wanted to check how you are doing with the migration. We understand that you had issues with one server and asked Equinix for a swap. It would be good if you could provide some more details behind the swap request to Equinix. We look forward to hearing from you.

avsm commented 2 years ago

We've migrated off the Amperes and older ThunderXs now! One issue we're seeing is that we've had to keep the capacity of the Mt Jades down to 20 concurrent jobs (far lower than expected) due to seeing lots of IO timeouts. Not sure what the root cause is here; the Amperes had no problem at a higher capacity for build jobs. Any idea if anyone else has seen something like this?

We'll look at a few filesystem options to try and speed things up (noatime, discard) and/or a RAMdisk mount.
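Before changing anything, it can help to see which mounts already carry those options. The following is a minimal sketch, assuming a Linux host whose mount table uses the `/proc/mounts` layout; the device names, mount points and the helper names `parse_mounts` and `missing_options` are hypothetical and not taken from the actual build machines.

```python
from typing import Dict, List, Sequence


def parse_mounts(text: str) -> Dict[str, List[str]]:
    """Map each mount point to its list of mount options.

    Expects the /proc/mounts layout, whitespace separated:
    device, mount point, fstype, options, dump, pass.
    """
    mounts: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mounts[fields[1]] = fields[3].split(",")
    return mounts


def missing_options(
    mounts: Dict[str, List[str]],
    mount_point: str,
    wanted: Sequence[str] = ("noatime", "discard"),
) -> List[str]:
    """Return the wanted options that the given mount point lacks."""
    opts = mounts.get(mount_point, [])
    return [opt for opt in wanted if opt not in opts]


# Hypothetical sample; on a real host, read the text from /proc/mounts.
SAMPLE = """\
/dev/nvme0n1p2 / ext4 rw,relatime 0 0
/dev/nvme1n1 /var/cache/builds ext4 rw,noatime,discard 0 0
tmpfs /mnt/ramdisk tmpfs rw,nosuid,nodev,size=16777216k 0 0
"""

if __name__ == "__main__":
    table = parse_mounts(SAMPLE)
    for point in table:
        print(point, missing_options(table, point))
```

On a real machine the sample string would be replaced by the contents of `/proc/mounts`. Note that `noatime` and `discard` do not apply to a tmpfs RAM disk in the same way as to a disk-backed filesystem, so the report for such a mount is only informational.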

pgmwoa commented 2 years ago

@avsm, I have a few queries and would appreciate your confirmation.

The old Ampere eMags (Qty 2) and ThunderX (Qty 2): as you have migrated off them, can we reclaim these 4 machines?

Regarding the Altra machines: we allocated two machines (Mt Jades), and I understand that one of them had issues during bring-up. Is that still an issue, or has it been resolved? As for your observation of IO timeouts on Altra, no other project that I am aware of has reported it so far.

pgmwoa commented 2 years ago

@avsm Please delete the older systems (2 eMags and 2 ThunderX) by selecting the delete / destroy option so that the Equinix team can reclaim those machines.

pgmwoa commented 2 years ago

Closing the ticket as the old machines have been reclaimed.