Open Nick-0314 opened 1 year ago
@aktech Hello, we have fully switched to cirun.io and retired our original runner, and we ran into some problems today. The current one-and-a-half-minute response time of GitHub CI seems a bit slow to us. Is there a way to speed it up, for example by preinstalling some software in the AMI? In addition, we have a problem where the runner requested by one CI job is taken by another CI job. Is there a good solution?
> We think the current one and a half minute response of github CI is a bit slow, is there a way to speed it up? For example, do some preset software in ami and so on?
Hey @mytting, yes, you can create a custom AMI with some of the software already installed, such as Docker. That would speed up your overall CI time. The provision time wouldn't be affected much, as it mainly consists of calling AWS's API to spin up a VM and installing the GitHub Actions runner. The installation doesn't take much time, less than 15-20 seconds. Most of the time is spent getting a VM from AWS. I can take a look to see if there are any bottlenecks that can be improved.
> In addition, we have a problem that the runner requested by CI is taken away by other CI. Is there a good solution?
What do you mean by other CI? Do you mean other jobs? Runners are picked up by GitHub Actions workflows by runner label, which is controlled by:

    runs-on: cirun-aws-amd64-32c

If you want them to be unique, I can look into spinning up runners by run_id. Then you could do something like:

    # Not implemented yet
    runs-on: "cirun-aws-amd64-32c--${{ github.run_id }}"

Will that help?
> cirun-aws-amd64-32c--${{ github.run_id }}
Yes, I want this effect. Is there anything you need to do?
@aktech
When will that be possible?
> Yes, I want this effect. Is there anything you need to do?
Yes, I need to implement it. You should have it within a few days (a week at most). I'll implement it and share the documentation link here.
OK, I'll wait for the good news. Does GitHub Actions support this syntax?
Another restriction is that the runner label must begin with cirun, which does not seem to be mentioned in the documentation. @aktech
> ok, wait for the good news. Does github action support this syntax?
Oh, I just tried it. GitHub supports this syntax.
> Another restriction is that the runner label must begin with cirun, as if it is not mentioned in the documentation
Thanks for pointing that out. I'll update the documentation; apologies for the inconvenience. Yes, that's important because it makes my life easier when filtering webhook events to decide where a runner needs to be created. Otherwise it would have been tricky.
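To illustrate why a fixed label prefix makes webhook filtering simple, here is a minimal Python sketch. It is not Cirun's actual code: the function name `needs_runner` is hypothetical, and the only assumptions about the payload are the `action` field and the `workflow_job.labels` list that GitHub sends with `workflow_job` webhook events.

```python
from typing import Any, Dict

# Prefix named in this thread; everything else here is illustrative.
LABEL_PREFIX = "cirun"


def needs_runner(event: Dict[str, Any]) -> bool:
    """Return True for a queued workflow_job that asks for a cirun-prefixed label."""
    # Only newly queued jobs need a fresh runner.
    if event.get("action") != "queued":
        return False
    labels = event.get("workflow_job", {}).get("labels", [])
    # Jobs targeting GitHub-hosted or other self-hosted runners are skipped.
    return any(label.startswith(LABEL_PREFIX) for label in labels)
```

With a prefix check like this, events for jobs that use `ubuntu-latest` or unrelated self-hosted labels can be dropped without looking up any per-repository configuration.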
> ok, wait for the good news. Does github action support this syntax? Oh, I just tried it. github supports this syntax
Yep, I tried it as well. You will hear from me soon. :)
It seems that defining multiple regions and multiple instance specifications for spot instances in the .cirun file does not work. The spot request often stays open with 'no Spot capacity available', and at that point Cirun still considers the creation successful. @aktech
Yes, that's an outstanding bug. It will be fixed in the next release.
Does Cirun support Google Cloud? AWS spot instances are billed by the hour with a one-hour minimum, and our CI usually runs for about 10 minutes. As I understand it, GCP bills by the minute. @aktech
> Does cirun support google cloud?
Yes, it does.
> aws spot instances are billed by the hour, one hour minimum,
Are you sure? To me it seems like you're charged for the seconds used: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/billing-for-interrupted-spot-instances.html
I'll check with the sales staff. I think the bill is by the hour
Quite strange, let me know if you hear from them.
Just confirmed that billing is by the second. I was misled by some AWS pages.
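To make the difference concrete, here is a small Python sketch comparing the two billing models for a 10-minute job. The hourly rate is a hypothetical placeholder, not a real AWS price, and the function names are made up for illustration. Any per-launch minimum charge is ignored.

```python
import math


def per_second_cost(runtime_seconds: int, hourly_rate: float) -> float:
    """Cost when every second of runtime is billed at the hourly rate / 3600."""
    return runtime_seconds * hourly_rate / 3600


def hourly_minimum_cost(runtime_seconds: int, hourly_rate: float) -> float:
    """Cost when runtime is rounded up to whole hours (the model first assumed above)."""
    return math.ceil(runtime_seconds / 3600) * hourly_rate


if __name__ == "__main__":
    rate = 0.50          # hypothetical USD per hour, for illustration only
    job = 10 * 60        # a 10-minute CI job, as mentioned in the thread
    print(f"per-second billing: ${per_second_cost(job, rate):.4f}")
    print(f"hourly minimum:     ${hourly_minimum_cost(job, rate):.4f}")
```

Under per-second billing a 10-minute job costs one sixth of the hourly rate, so short CI jobs are not penalized the way they would be with a one-hour minimum.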
@aktech Hi, has there been any progress recently?
Hey @mytting not yet, I’m travelling at the moment. Expect it by the end of this week.
ok
Unique runner labels are available now: https://docs.cirun.io/reference/unique-runner-labels
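For readers curious how a label such as `cirun-aws-amd64-32c--<run_id>` could be taken apart on the receiving side, here is a minimal Python sketch. It only assumes the `--` separator and numeric run id shown earlier in this thread; the helper name `split_unique_label` is hypothetical, and Cirun's real parsing may differ.

```python
from typing import Optional, Tuple


def split_unique_label(label: str) -> Tuple[str, Optional[str]]:
    """Split 'base--run_id' into (base, run_id); return (label, None) if no run id."""
    base, separator, suffix = label.rpartition("--")
    # Treat the suffix as a run id only when the separator is present
    # and the suffix is purely numeric, as github.run_id is.
    if separator and base and suffix.isdigit():
        return base, suffix
    return label, None
```

For example, `split_unique_label("cirun-aws-amd64-32c--3456789")` yields the base label plus the run id, while a plain `cirun-aws-amd64-32c` comes back unchanged with `None`.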
Recently, some open source projects came to us about AWS spot instances and asked what to do. We recommended Cirun, and we will keep promoting it. Cirun has solved many of our pain points. It's great.
@aktech Hi, I have raised an issue against the GitHub documentation, hoping that excellent projects like Cirun can be added to it so that users can avoid detours: https://github.com/github/docs/issues/21697
In addition, job startup now takes a fairly constant 90 seconds or so, which may be a bit long. Is there a template for the user-data? We can ask AWS people to see if there is any way to optimize it.
Hey @mytting, thanks a lot for that, I really appreciate it. There isn't a specific template for it, but it's fairly simple: it pulls the GitHub Actions runner software, installs it, and creates a user. It doesn't take much time. For example, here are the logs of one of the random runners on this repo; it took about 15 seconds for the user data script to run.
My suspicion is on AWS: the time they take to hand over a VM is quite slow. Let me know if you have more questions. I am happy to jump on a call with you and AWS to see where the bottlenecks are.
OK, I have a general understanding. Is there a general script, and how is it passed to EC2? I can have AWS people debug it.
It seems that after the runner registers and becomes Idle, it switches to Offline and then becomes Active.
I tried it. It took about 25 seconds from creating the EC2 instance to being able to SSH in. Could the rest be network time spent downloading the runner software?
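A measurement like the 25 seconds above can be reproduced with a small script. The following Python sketch polls a TCP port until it accepts connections and reports the elapsed time. The function name and the default timeout and interval are arbitrary choices for illustration; it measures only when port 22 opens, not when the runner registers with GitHub.

```python
import socket
import time
from typing import Optional


def seconds_until_port_open(
    host: str, port: int = 22, timeout: float = 120.0, interval: float = 1.0
) -> Optional[float]:
    """Poll host:port and return the seconds elapsed until it accepts a connection.

    Returns None if the port is still closed after `timeout` seconds.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        try:
            # A successful TCP connect means sshd (or whatever listens) is up.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            # Refused or timed out: wait a little and try again.
            time.sleep(interval)
    return None
```

Starting the timer right after the instance launch call returns, and running it once with and once without user-data, would show how much of the 90 seconds is spent before the instance is even reachable.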
> ok ,I have a general understanding, is there a general script? How to pass it to EC2, I can let AWS people debug it
I can try to create one for you.
> It seems that after the runner is registered to become the idle state, it will switch to the offline state, and then it will become the Active state
Ah, interesting.
> I tried it. It took about 25 seconds from creating EC2 to being able to ssh. Is there a network reason for downloading the product?
Did you create it via API? Can you share the script? If that's the case then it might be something on our end. I am happy to take a look, later this week.
Manually created...
I mean I created it manually and didn't pass in any user-data.
Ah, ok. I'll debug ours and will let you know.
ok
title: DeepFlow Accelerates GitHub Action Exploration Using Spot Instances
date: 2022/11/01
author: Song Jianchang
avatar:
cover: https://yunshan-guangzhou.oss-cn-beijing.aliyuncs.com/pub/pic/20221027635a6171c75b3.png
excerpt:
GitHub Actions makes CI very convenient for projects hosted on GitHub, but the default 2C7G runner that GitHub provides is too small for compiling some large projects. This article describes how DeepFlow explored using cheap, high-end public cloud Spot instances to accelerate Actions. After hitting a series of pitfalls, we finally found a solution that meets all our needs for performance, cost, ARM support, and so on. I hope it will be useful to you.
Ever since the DeepFlow open source code was pushed to GitHub, we have faced the problem that GitHub Actions compilation jobs take too long because of the low configuration of the hosted runners. Before that, our internal GitLab CI used Alibaba Cloud 32C ECI Spot instances and could run all jobs in a few minutes (the specific method will be introduced in a separate article). Having seen what Spot instances did for our GitLab CI, we have been looking, since the first day DeepFlow's GitHub Actions went live, for a solution that assigns each job an independent runner and supports both x86 and ARM64. The process was not smooth, and only after five iterations did we finally find an ideal solution.
Some problems DeepFlow encountered in the early stage of using GitHub Actions:
- The self-hosted Runner cannot be scaled dynamically: jobs are often queued, the machine sits idle for long periods, and paying for it on a yearly or monthly subscription is not cost-effective.

Our needs:
Based on our needs and the GitHub Action community documentation, we also found some solutions:
| | K8s Controller | Terraform | GitHub Larger Runners | Cirun |
|---|---|---|---|---|
| Runner | Container | Linux, Windows | Linux, Windows, Mac | Linux, Windows, Mac |
| Supported Cloud Platforms | Kubernetes | AWS | - | AWS, GCP, Azure, OpenStack |
| ARM64 Support | Supported | Supported | Not supported | Supported |
| Spot Support | Not supported | Supported | - | Supported |
| Deployment and Maintenance Cost | Medium | High | None | None |
The first solution we tried was K8s Controller. After trying it out, we found the following defects:
generic-ephemeral-volumes
If Fargate cannot be used, dedicated nodes must be prepared, with no way to scale nodes dynamically or to use pay-as-you-go and Spot instances.
Next we tried the Terraform solution, but also encountered some setbacks:
The GitHub Larger Runners solution does not support ARM64 instances, so we ruled it out directly.
In the end we chose Cirun:
Cirun: Supports customization of arbitrary machine specifications, architectures and images. It is free for open source projects, does not require deployment and maintenance, does not require additional resources, and is very simple to operate:
Step 1: Install the App
Install the Cirun app from GitHub Marketplace.
Step 2: Add Repo
Add the required repo in the Cirun console.
Step 3: Configure AK/SK
Configure the AWS Access Key and Secret Key in the Cirun console.
Step 4: Configure Machine Specifications
The machine specification and runner label are defined in the GitHub repo:
    runners:
      - name: "aws-amd64-32c"
        cloud: "aws"
        instance_type: "c6id.8xlarge"
        machine_image: "ami-097a2df4ac947655f"
        preemptible: true
        labels:
          # Runner labels must begin with "cirun"
          - "cirun-aws-amd64-32c"
      - name: "aws-arm64-32c"
        cloud: "aws"
        instance_type: "c6g.8xlarge"
        machine_image: "ami-0a9790c5a531163ee"
        preemptible: true
        labels:
          # Runner labels must begin with "cirun"
          - "cirun-aws-arm64-32c"
Step 5: Getting Started
Switch the GitHub job's runs-on field:
    jobs:
      build_agent:
        name: build agent
        runs-on: "cirun-aws-amd64-32c--${{ github.run_id }}"
        steps:
          - name: Checkout
            uses: actions/checkout@v3
            with:
              submodules: recursive
              fetch-depth: 0
Final effect:
Currently DeepFlow uses AWS 32C64G Spot instances to run CI in parallel, at an average cost of $300 per month. With a monthly subscription, the same spend would only cover two 16C32G x86/ARM64 instances, and parallel jobs would have to wait in a long queue.
DeepFlow's main CIs have been switched to Cirun's Runners, meeting all previous expectations:
We also encountered some problems in the middle of using Cirun, all of which have been well supported by the author, see Issues for details:
There is also some work in progress:
Quoting the introduction of AWS official website:
- The only difference between On-Demand Instances and Spot Instances is that when EC2 needs more capacity, it interrupts Spot Instances with a two-minute notice. You can use EC2 Spot for a variety of fault-tolerant and flexible applications, such as test and development environments, stateless web servers, image rendering, video transcoding, to run analytics, machine learning, and high-performance computing (HPC) workloads. EC2 Spot also tightly integrates with other AWS products, including EMR, Auto Scaling, Elastic Container Service (ECS), CloudFormation, and more, giving you flexibility in how to launch and maintain applications running on Spot Instances.
- Spot Instances are a new way to buy and use Amazon EC2 instances. The spot price changes periodically based on supply and demand. You launch Spot Instances directly, much as you would purchase On-Demand Instances, and the price is determined by supply and demand (never exceeding the On-Demand price). Users can also set a maximum price, in which case the instances run while the maximum price is higher than the current spot price. Spot Instances complement On-Demand and Reserved Instances and provide another option for obtaining compute capacity.
DeepFlow is an open source, highly automated observability platform and a full-link, high-performance data engine. DeepFlow uses new technologies such as eBPF, WASM, and OpenTelemetry, and innovatively implements core mechanisms such as AutoTracing, AutoMetrics, AutoTagging, and SmartEncoding, helping developers automate code instrumentation and reduce the O&M complexity of the observability platform. Using DeepFlow's programmability and open interfaces, developers can quickly integrate it into their observability stack.
GitHub address: https://github.com/deepflowys/deepflow
Visit DeepFlow Demo to experience a new era of highly automated observability.
@aktech
This is a recent article I wrote to promote Cirun. Can you give me some advice? Can you also give a brief introduction to what Cirun is, for example a "What is Cirun" paragraph?
Hey @mytting
> This is a recent article I wrote to promote cirun. Can you give me some advice?
That looks pretty good, thanks for writing this. It would be really useful for folks who want to try different strategies. I think it would be interesting to add some cost numbers across the different strategies. Also, there is a formatting issue with your table:
Syntax should be something like this:
    | Syntax | Description |
    | ----------- | ----------- |
    | Header | Title |
    | Paragraph | Text |
Preview:
| Syntax | Description |
|---|---|
| Header | Title |
| Paragraph | Text |
> Can you also give a brief introduction to what cirun is? example Watt is Cirun
A brief introduction would be something like this:
Cirun is a way for developers and teams to run their CI/CD pipelines on their own secure cloud infrastructure via GitHub Actions. The project aims to provide the freedom to choose cloud machines with any configuration, save money by using low-cost instances, and save time by enabling unlimited concurrency and performant machines, all with a simple, developer-friendly YAML file. It currently supports all major clouds including GCP, AWS, Oracle, DigitalOcean, and Azure, as well as on-premise clouds via OpenStack. Cirun is completely free for open source projects without any restrictions.
thanks
> Yes, that's an outstanding bug. It will be fixed in the next release.
@aktech Hello, is there any progress on this issue? In recent days, spot requests for x86 machines have also been getting stuck, which often affects CI.
Hey @mytting, I haven't had the chance yet. I'm hoping to work on it this weekend. Apologies for the delay.
By the way is that blog post already published, the one you mentioned above? If yes, can you share the link please?
Yes, but only in Chinese. Is it convenient for you to check and share? https://mp.weixin.qq.com/s/26qbfq7bBmmgOk_NFWVUow https://deepflow.yunshan.net/blog/015-deepflow-uses-spot-Instances-to-speed-up-github-action-exploration/
Awesome, thanks a lot! I just used Google Translate to view the page. Is it possible for you to post it in English as well (just the Google translation) somewhere like dev.to? I am happy to review the translation.
I'll try it next week.
Sure, no hurries.
I wrote a twitter thread, quoting from your blog: https://twitter.com/iaktech/status/1593574852241154049
@mytting I have changed the backend to use the latest CreateFleet API of AWS for creating spot instances: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-best-practices.html#which-spot-request-method-to-use
I did not see much difference in the runner creation time for smaller instances (like t2.medium). Let me know if you see any difference for your larger instances. I have also done some profiling to see if there are any bottlenecks anywhere else. So far I haven't found any apart from runner creation on the AWS side.
@aktech Is the user-data passed as a launch parameter? Could you provide a general user-data script? I'd like to test the difference between launching with and without user-data.
https://dev.to/dundun/deepflow-uses-spot-instances-to-speed-up-github-action-exploration-2a90
> @aktech Is it a launch parameter passed through user-data? Is it convenient to provide a general user-data script? Let me test the difference between adding user-data and not.
So, the launch template is created first (with user-data) via create_launch_template, and then it is passed to create_fleet.
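For anyone who wants to reproduce this flow outside Cirun, here is a rough Python sketch of the two calls using a boto3 EC2 client. It is not Cirun's backend code: the function names, the single-instance target, the `instant` fleet type, and the placeholder user-data script are all assumptions made for illustration. The client is passed in, so the request builders can be inspected without AWS credentials.

```python
import base64
from typing import Any, Dict, List


def launch_template_data(image_id: str, instance_type: str, user_data: str) -> Dict[str, Any]:
    """Build LaunchTemplateData; EC2 expects user-data to be base64-encoded here."""
    encoded = base64.b64encode(user_data.encode("utf-8")).decode("ascii")
    return {"ImageId": image_id, "InstanceType": instance_type, "UserData": encoded}


def fleet_request(launch_template_id: str, instance_types: List[str]) -> Dict[str, Any]:
    """Build a CreateFleet request for one spot instance from any of the given types."""
    return {
        "Type": "instant",  # assumption: synchronous, one-shot fleet
        "LaunchTemplateConfigs": [{
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": launch_template_id,
                "Version": "$Latest",
            },
            # Each override lets the fleet fall back to another instance type.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        }],
        "TargetCapacitySpecification": {
            "TotalTargetCapacity": 1,
            "DefaultTargetCapacityType": "spot",
        },
    }


def launch_spot_runner(ec2: Any, name: str, image_id: str,
                       instance_types: List[str], user_data: str) -> List[str]:
    """Create the launch template, pass it to create_fleet, return instance ids."""
    template = ec2.create_launch_template(
        LaunchTemplateName=name,
        LaunchTemplateData=launch_template_data(image_id, instance_types[0], user_data),
    )
    template_id = template["LaunchTemplate"]["LaunchTemplateId"]
    response = ec2.create_fleet(**fleet_request(template_id, instance_types))
    ids = [i for item in response.get("Instances", []) for i in item.get("InstanceIds", [])]
    if not ids:
        # No capacity (or another launch error): do not treat this as success.
        raise RuntimeError(f"fleet launched no instances: {response.get('Errors')}")
    return ids
```

A call would look like `launch_spot_runner(boto3.client("ec2"), "demo-runner", "ami-...", ["c6id.8xlarge"], "#!/bin/bash\necho placeholder\n")`, where the script body stands in for the real runner bootstrap, which is not shown in this thread. Timing this with an empty script versus a real one is one way to test the difference between launching with and without user-data.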
> https://dev.to/dundun/deepflow-uses-spot-instances-to-speed-up-github-action-exploration-2a90
Excellent, thanks a lot!
OK, I'm interested in the general content of the user-data. The VM I created in the console started up quickly.
Sure, I can share that. Can you join Slack here? I'll DM you after I create a minimal example.
ok
Feature request
Shorten the waiting time of GitHub CI
Use case