shannonwells closed this issue 2 years ago.
@aramikm My understanding is that we don't have a production profile set up at the moment. I only see profile.release configured, so this will not work:
cargo build --profile=production --features runtime-benchmarks
Instead, I am planning to compile the binary this way on the reference hardware prior to running benchmarks:
cargo build --release --features runtime-benchmarks
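For what it's worth, if we ever do want a dedicated profile, a minimal sketch of adding one to the workspace Cargo.toml could look like the following (the lto/codegen-units values are assumptions on my part, not settled choices):
# Hedged sketch: append a production profile that inherits from release,
# then build against it (requires a Cargo version that supports custom profiles).
cat >> Cargo.toml <<'EOF'

[profile.production]
inherits = "release"
lto = true
codegen-units = 1
EOF
cargo build --profile=production --features runtime-benchmarks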
/cc @saraswatpuneet
@wilwade Per this acceptance criterion:
As the repo, I should block merging until the weights have been updated in the last commit if anything changes that might affect the weights (Rust files?)
What if we kick off the benchmarking CI job when files matching a certain glob pattern get updated? A developer would still be able to trigger it manually with some predefined action (maybe by entering a slash command /update-weights in a comment?). Basically, there would be these two action triggers for benchmarking: the file-pattern match and the manual slash command.
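A rough sketch of what the file-pattern check could look like (the patterns and base branch below are illustrative only):
# Hedged sketch: flag a benchmark run only when weight-affecting files changed.
CHANGED=$(git diff --name-only origin/main...HEAD)
if echo "$CHANGED" | grep -qE '\.rs$|Cargo\.(toml|lock)$'; then
  echo "Weight-affecting files changed; benchmarks should be re-run"
fi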
@demisx I think we can just go with manual kickoff for now. We could likely create a glob that could do it, but then we would end up triggering it a lot of extra times.
@wilwade Understood. Will stick with the manual trigger for now.
Per @jacobmfoley:
AWS Credentials: You can configure the CLI to use SSO as described here: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sso.html, or click on the link as shown in the attached screenshot.
Region: us-east-2
EC2: if starting an instance, launch it into a private subnet (no public IP addresses), add an owner tag, specify the purpose of the machine in the name and/or tags, and stop the instance/clean up after you're done.
Here is the proposed approach (high-level). Please let me know your feedback, so I can move forward.
@demisx I think in general that looks good. I think the only thing I wonder is whether it is worth blocking the dev branch from additional commits instead of just watching for additional commits and stopping the runner if it finds any. I think watching is better, but it might just be easiest to get up and running by not blocking or watching, but instead just fail out when commit time comes if the head has changed.
PR blocking can happen via the Commit status API: https://docs.github.com/en/rest/commits/statuses
@demisx One other thing. We should likely limit the commit to just LibertyDSNP org members. That way we can run them on forked PRs, but others cannot trigger it.
@wilwade Thank you for your feedback. Here are my early morning coffeeless brain thoughts:
I think in general that looks good. I think the only thing I wonder is if it is worth blocking the dev branch from additional commits instead of just watching for additional commits and stopping the runner if it finds any.
Great point. If I understand correctly, you mean we can utilize the GitHub concurrency group here and have GH cancel running actions if newer commits are pushed to the same ref.
I think watching is better, but it might just be easiest to get up and running by not blocking or watching, but instead just fail out when commit time comes if the head has changed.
I was originally thinking of this approach, but then my concern was: what if a developer pushes a new commit while the action is still running (which may take a while)? Do we simply let the old commit's CI run fail? What if it was a simple README file update? The previous weights would still be valid. But if benchmark-dependent code was updated instead and the developer didn't include the /run-benchmarks slash command, then the previously calculated weights may no longer be valid. Too many edge cases could arise with this approach. However, if we use a concurrency group and let the running action be abandoned when a new commit is pushed, then I believe we don't have this problem, as each new commit will kick off a new run. I would definitely start with the simplest approach.
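For completeness, the fail-at-commit-time variant could be sketched roughly like this (PR_BRANCH is a placeholder for whatever ref the job checks out):
# Hedged sketch: remember the head at the start of the run and abort the
# weights commit if the PR branch has moved in the meantime.
START_SHA=$(git rev-parse HEAD)
# ... long benchmark run happens here ...
git fetch origin "$PR_BRANCH"
if [ "$(git rev-parse "origin/$PR_BRANCH")" != "$START_SHA" ]; then
  echo "Head changed during the run; skipping the weights commit" >&2
  exit 1
fi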
PR blocking can happen via the Commit status API: https://docs.github.com/en/rest/commits/statuses
Thank you. I'll take a note of that.
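Noting a minimal sketch for later (OWNER, REPO, and SHA are placeholders):
# Hedged sketch: mark the head commit as pending via the Commit statuses API.
gh api "repos/${OWNER}/${REPO}/statuses/${SHA}" \
  -f state=pending \
  -f context=benchmarks \
  -f description="Waiting for updated weights"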
One other thing. We should likely limit the commit to just LibertyDSNP org members. That way we can run them on forked PRs, but others cannot trigger it.
Do you mean limit running benchmarks to LibertyDSNP org members only? In other words, parse the commit and ignore the /run-benchmarks slash command in it unless the commit comes from one of the members?
Do you mean limit running benchmarks to the LibertyDSNP org members only? In other words, parse the commit and ignore /run-benchmarks slash command in it unless this commit comes from one of the members.
Correct.
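A possible way to enforce that in the workflow (sketch only; COMMENT_AUTHOR is a placeholder for the commenter's login):
# Hedged sketch: honor /run-benchmarks only for LibertyDSNP org members.
# The membership endpoint returns 204 for members and 404 otherwise.
if gh api "orgs/LibertyDSNP/members/${COMMENT_AUTHOR}" --silent 2>/dev/null; then
  echo "Author is an org member; triggering benchmarks"
else
  echo "Ignoring /run-benchmarks from a non-member"
fi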
Great point. If I understand correctly, you mean we can utilize the GitHub concurrency group here and have GH cancel running actions if newer commits are pushed to the same ref.
I wasn't sure if this was going to get kicked off as a GitHub action, but if it does, then that would work.
Updated diagram showing the excluded steps based on the conversation above.
If I understood correctly, the benchmarks need to be run against a different template depending on whether the pallet is external or not. Here is how I am planning to run benchmarks on the reference hardware in GitHub Actions:
For internal pallets (e.g. pallet_messages, pallet_msa, pallet_schemas):
./target/release/frequency benchmark pallet \
--chain mainnet \
--execution=wasm \
--wasm-execution=compiled \
--pallet <pallet_name> \
--extrinsic "*" \
--steps 20 \
--repeat 5 \
--output ./pallets/msa/src/weights.rs \
--template=./.maintain/frame-weight-template.hbs # <- frame template
For external pallets (e.g. orml_vesting, pallet_scheduler, pallet_utility):
./target/release/frequency benchmark pallet \
--chain mainnet \
--execution=wasm \
--wasm-execution=compiled \
--pallet <pallet_name> \
--extrinsic "*" \
--steps 20 \
--repeat 5 \
--output ./runtime/frequency/src/weights \
--template=./.maintain/runtime-weight-template.hbs # <- runtime template
Please let me know if I missed anything.
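If it helps, the internal-pallet runs could be wrapped in a loop along these lines (the pallet-name-to-directory mapping is my assumption):
# Hedged sketch: run the internal-pallet command for each pallet.
for PALLET in pallet_messages pallet_msa pallet_schemas; do
  DIR=${PALLET#pallet_}   # assumes e.g. pallet_msa lives in ./pallets/msa
  ./target/release/frequency benchmark pallet \
    --chain mainnet \
    --execution=wasm \
    --wasm-execution=compiled \
    --pallet "$PALLET" \
    --extrinsic "*" \
    --steps 20 \
    --repeat 5 \
    --output "./pallets/${DIR}/src/weights.rs" \
    --template=./.maintain/frame-weight-template.hbs
done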
@wilwade Just FYI, @shannonwells and I realized that we may be overlapping here and in #279, because Shannon has already created EC2 launch templates (x86 and ARM) plus the script that runs benchmarks for all pallets in #279, while I was asking questions here about the proper commands to run the benchmarks on reference hardware. We decided that I will focus on the GitHub Actions process/AWS (devOps stuff) here, and then we'll plug in Shannon's work from #279 to run benchmarks as part of the GH Actions job. This way we should not duplicate any work.
I also added a note to the description re: templates and script coming from #279
@jacobmfoley What's the process for adding the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY keys to GitHub secrets? Do I need to request it through Pivotal? And if GitHub Actions needs to temporarily spin up an EC2 instance, should it do it in the current Dev/Stg account, or do we have a dedicated CI AWS account? Thank you.
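For reference, once the values are available they could be added as repository secrets with the GitHub CLI; a minimal sketch:
# Hedged sketch: store the credentials as GitHub Actions secrets.
# Each command prompts for the secret value (or reads it from stdin).
gh secret set AWS_ACCESS_KEY_ID
gh secret set AWS_SECRET_ACCESS_KEY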
@demisx I've avoided adding AWS credentials to GitHub Actions and have the goal of eliminating all static IAM users altogether.
If using actions is a requirement, we can run self hosted actions runners on the required instance sizes.
If not, there is a spot fleet of Jenkins slaves with rust installed with the label "benchmark" ready to run jobs.
@jacobmfoley Thank you for your prompt reply. We already use GitHub Actions in the Frequency project, and I was planning to keep it simple without introducing Jenkins into the stack. The idea was to launch an EC2 instance on demand when a developer wants to run benchmarks and then destroy it at the end of the Actions run. This would let us:
Please let me know if it's still OK to proceed with GitHub Actions, and whether it's OK to start the instance on the fly or we should use a self-hosted runner per your suggestion. I'll update the diagram accordingly.
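Roughly, the on-demand instance lifecycle I have in mind (the launch template ID and tags below are placeholders):
# Hedged sketch: start a benchmark instance on demand, then clean it up.
INSTANCE_ID=$(aws ec2 run-instances \
  --launch-template LaunchTemplateId=lt-xxxxxxxxxxxxxxxxx \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Owner,Value=ci},{Key=Purpose,Value=benchmarks}]' \
  --query 'Instances[0].InstanceId' \
  --output text)
# ... run the benchmark job on the instance ...
aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"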
UPDATE: Something to consider if going with self-hosted runners:
@demisx Reference hardware is confirmed: #279
@wilwade Got it. Waiting on @jacobmfoley to finalize the approach described in previous comment before moving forward with this. Working on #256 in the meantime.
Per a Slack conversation with @wilwade, @sbendar, and @jacobmfoley, there are security concerns with using AWS credentials in GitHub Actions, as a bad actor could potentially gain access to them. Instead, the devOps recommendation is to leverage Jenkins CI, as it has much better protections against such incidents.
Next steps for @demisx:
lt-06cd50a2dd2a023de (EC2 launch template ID)
After these steps are complete, the story is expected to be taken over by the devOps team.
The updated flow diagram with Jenkins CI is attached. Please let me know if I missed anything.
@jacobmfoley I have provided the outstanding items above. Please let me know if you need anything else. Thank you.
Per @jacobmfoley's Slack message, devOps has all the info they need at this point to move forward, and this will be their next story to work on after the collator automation is finished.
Followed up in devops Slack channel on ETA of the story completion (see quoted below). Awaiting response.
Re: CI Run Benchmarks on Reference Hardware (ASAP). Is there a rough ETA for when it will be completed? It's kind of urgent. My understanding is that the only part left is committing the generated benchmark files back to the repo. Thanks.
Per devOps, the updated benchmarks are supposed to be getting committed now. However, it didn't work with @wilwade's test. Also asked devOps to update the comment trigger to be the original slash command /run-benchmarks.
Benchmarks are working via manual trigger with Jenkins.
Looking into self-hosted runners for it eventually (a rough registration sketch follows below): https://github.com/LibertyDSNP/frequency/issues/609
Requires @jacobmfoley assistance for infrastructure setup
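If we do go the self-hosted route, registering a runner on the benchmark machine would look roughly like this (the token and label are placeholders, and it assumes the runner package is already downloaded and unpacked):
# Hedged sketch: register a GitHub Actions self-hosted runner on the benchmark host.
./config.sh --url https://github.com/LibertyDSNP/frequency --token <RUNNER_TOKEN> --labels benchmark
./run.sh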
Once reference hardware is decided, we need to run our benchmarks on that hardware and update all the weights files.
Partially Blocked by:
#279
#352
Expected --chain mainnet spec per this comment
Acceptance Criteria
make benchmarks in a PR comment
Tasks
Obtain AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from @jacobmfoley to access AWS from GitHub
Blocked by:
#377
Helpful Links