NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

CI build check is slow #3012

Open naoyam opened 2 weeks ago

naoyam commented 2 weeks ago

It takes about 1 hour to complete the clang-build CI check (example), which is not the end of the world but pretty inconvenient. I believe it used to be much faster; I guess the slowdown is in part due to heavy template usage, such as dynamic_type.

I think our options are either running the build check on larger GitHub runners or using our own CI resources. The former seems to require additional payment, so it's probably easier to take the second option.

CC: @jjsjann123, @xwang233

zasdfgbnm commented 2 weeks ago

Another option is to stop building the benchmarks and tests in the clang-build pipeline and build only the core library.

zasdfgbnm commented 2 weeks ago

https://github.com/NVIDIA/Fuser/pull/3013

zasdfgbnm commented 2 weeks ago

Also, we can use gtest, gbenchmark, and flatbuffers installed from apt, instead of building them as submodules.

naoyam commented 2 weeks ago

I'm not sure if we want to skip building the tests and benchmarks. I suppose using the internal CI would make it fast enough.

zasdfgbnm commented 2 weeks ago

> I'm not sure if we want to skip building the tests and benchmarks. I suppose using the internal CI would make it fast enough.

Then are you suggesting that we should completely kill the clang-build pipeline? We are already using our internal CI with !build. Is there a way to automatically build every commit? Isn't there a security concern?

jjsjann123 commented 2 weeks ago

> We are already using our internal CI with !build. Is there a way to automatically build every commit? Isn't there a security concern?

The last time we chatted about this, the conclusion was that we cannot automate the CI because of security issues.

So we'll still need to trigger the CI manually with !build, but by default it would run only whatever is in BUILD/clang-build right now, and we would make that test a protection rule for merging. For the existing CI tests, we'll just add an option to opt in.

xwang233 commented 2 weeks ago

Clang-build uses Meta's nightly wheel, which mostly overlaps (not 100%) with some of our internal CI checks. We may disable the public clang-build if that is annoying.

Regarding disabling the builds of gtest, benchmark, test, etc., I think we should first find out which files or modules take the most time in clang-build.
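One way to get that per-file breakdown is sketched below. It is only an illustration and assumes the clang-build job builds with Ninja, so that a `.ninja_log` file is left in the build directory; that assumption is not confirmed in this thread. Each non-comment line of `.ninja_log` is tab-separated: start time (ms), end time (ms), mtime, output path, command hash. The function names are hypothetical helpers, not part of nvFuser.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def parse_ninja_log(text):
    """Return {output_path: duration_seconds} from the contents of a .ninja_log."""
    durations = {}
    for line in text.splitlines():
        # Skip blank lines and the "# ninja log vN" header.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue
        start_ms, end_ms, output = int(fields[0]), int(fields[1]), fields[3]
        # A later entry for the same output (a rebuild) overwrites the earlier one.
        durations[output] = (end_ms - start_ms) / 1000.0
    return durations


def slowest_outputs(durations, top=10):
    """Return the `top` (output, seconds) pairs, slowest first."""
    return sorted(durations.items(), key=lambda kv: kv[1], reverse=True)[:top]


def time_by_directory(durations, depth=1):
    """Sum build-step seconds by the first `depth` path components of each output."""
    totals = defaultdict(float)
    for output, seconds in durations.items():
        parts = PurePosixPath(output).parts
        # Outputs that sit directly at the top level are grouped under ".".
        key = "/".join(parts[:depth]) if len(parts) > depth else "."
        totals[key] += seconds
    return dict(totals)
```

To use it, read the log with something like `parse_ninja_log(open("build/.ninja_log").read())` (the path is an assumption) and print `slowest_outputs(...)`. Note that these are per-step wall-clock durations measured while other steps ran in parallel, so they rank steps relative to each other rather than giving exact CPU cost.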

wujingyue commented 2 weeks ago

Also, did clang-build get parallelized sufficiently? How many cores does the machine have?

jacobhinkle commented 2 weeks ago

Could we enable the clang-build job on github CI only on main? That way merging PRs is fast and we will see a test failure if we merge a failure so we can at least fix it or revert the merge quickly.

naoyam commented 2 weeks ago

> Also, did clang-build get parallelized sufficiently? How many cores does the machine have?

I assume it's parallelized automatically, but it's just using a free service, so the number of cores is likely pretty small.

zasdfgbnm commented 2 weeks ago

> Also, did clang-build get parallelized sufficiently? How many cores does the machine have?

I see:

Using 4 jobs for compilation

in the log
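For a rough sense of how much a larger runner could help, a simple lower bound on wall-clock time can be computed from per-step durations and the job count. This is a sketch of the standard scheduling bound (total work divided by parallel jobs, but never less than the longest single step), not a measurement of this pipeline; the function name and any inputs are hypothetical.

```python
def wall_time_lower_bound(step_seconds, jobs):
    """Lower bound on build wall time for the given step durations and job count.

    The build cannot finish faster than total work spread evenly over `jobs`,
    nor faster than its single longest step. Dependencies between steps are
    ignored, so the real build time can only be higher than this bound.
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    if not step_seconds:
        return 0.0
    return max(sum(step_seconds) / jobs, max(step_seconds))
```

Comparing the bound at 4 jobs with the bound at, say, 16 jobs shows whether more cores would help or whether a few very slow translation units dominate regardless of the core count.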

csarofeen commented 1 week ago

Having a clang-build is useful. There are additional checks in it that we don't otherwise have. Is it possible to run it on our infra instead of using the public infra?

naoyam commented 1 week ago

> Having a clang-build is useful. There are additional checks in it that we don't otherwise have. Is it possible to run it on our infra instead of using the public infra?

@xwang233 said that should be possible. He will give us a plan.

xwang233 commented 1 week ago

My rough idea is to remove clang-build as a required CI check for now to unblock people. This can be done right away, and I have just removed the required check.

We'll then run clang-build on main only with the public GitHub infra, and move the PR part to our internal infra. Once that's ready, we can enforce clang-build-internal as a required check. I don't have a detailed timeline for that, but my estimate is 1-2 weeks.