Closed leofang closed 2 years ago
Hi! This is the friendly automated conda-forge-linting service.
I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.
@conda-forge-admin, please rerender
It's not clear to me how many arches you attempted to build here.
The patch is too large to skim on GitHub. Can you summarize? Did you still try to make one feedstock for all CUDA arches?
Hi @hmaarrfk, I generated the patch from https://github.com/cupy/cupy/pull/6941, so it may be easier to refer to the summary there. Note that I haven't even split the CUDA archs yet (and CuPy still supports archs from cc35 onward), as that would need additional work: each template function needs to take the CC as a template parameter so that you can get the function pointer to the correct template specialization. It's very tedious work.
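To illustrate the kind of refactoring described above, here is a minimal sketch in plain host C++ (not CuPy's actual code, and not real CUDA). The names `scaled_sum`, `SumFn` and `select_scaled_sum`, the list of CC values, and the "tuning" rule are all hypothetical; the point is only that every function needs the CC threaded through as a template parameter, plus a hand-written dispatch to pick the right specialization.

```cpp
#include <cassert>
#include <vector>

// Hypothetical routine: the compute capability (CC) is a non-type template
// parameter, so each CC value produces its own specialization.
template <int CC>
int scaled_sum(const std::vector<int>& v) {
    int total = 0;
    for (int x : v) total += x;
    // Stand-in for arch-specific tuning (an assumption for illustration).
    return total * (CC >= 70 ? 2 : 1);
}

using SumFn = int (*)(const std::vector<int>&);

// Map a runtime CC to the matching specialization. Every supported arch
// has to be listed by hand, for every such function.
inline SumFn select_scaled_sum(int cc) {
    switch (cc) {
        case 35: return &scaled_sum<35>;
        case 60: return &scaled_sum<60>;
        case 70: return &scaled_sum<70>;
        case 80: return &scaled_sum<80>;
        default: return nullptr;  // unsupported arch in this sketch
    }
}

// Convenience wrapper: look up the specialization and call it.
inline int call_scaled_sum(int cc, const std::vector<int>& v) {
    SumFn fn = select_scaled_sum(cc);
    assert(fn != nullptr && "unsupported compute capability");
    return fn(v);
}
```

Repeating this pattern for each templated function in a large code base is where the tedium comes from.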
But regardless of whether the CUDA archs are split-compiled or not, the lesson is the same: by splitting, we don't give the compiler the chance to reuse optimizations it has already done, and we basically redo everything from scratch for each TU.
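A toy cost model makes the point concrete. This is only a sketch under a simplifying assumption (each TU repeats a fixed amount of shared front-end work such as parsing headers and instantiating common templates); the function name and any numbers fed into it are hypothetical, not measurements.

```cpp
#include <cassert>

// Toy model: every TU pays the shared cost again before doing its own
// unique part, so total CPU work grows linearly with the number of TUs.
inline double total_compile_cost(int n_tus,
                                 double shared_cost_per_tu,
                                 double unique_cost_total) {
    assert(n_tus >= 1);
    return n_tus * shared_cost_per_tu + unique_cost_total;
}
```

Under this model, going from one TU to hundreds multiplies the shared part by the TU count while the unique part stays the same.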
NVCC in fact has started working on compile-time reduction, see, e.g.
so this is another reason why such manual splitting is better avoided.
ok thank you for the pointer. I'll read the references you provided.
My idea is more (and NVIDIA has likely already thought about this and written it off)
Even if it increases the total build time.
total 55 hours >> 8 hours (estimated)
it is something that is "possible" given our infrastructure.
I think the fundamental problem with conda-forge's infrastructure is that we have 2 threads. So even if you try to do things "concurrently", you are limited to 2. I thought CMake and Ninja already try to run things in 2 parallel processes when they detect they can.
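As a rough way to reason about that limit, the sketch below computes a lower bound on wall-clock time for independent jobs on a fixed number of workers. It assumes the jobs are fully independent and ignores scheduling overhead; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// No schedule can finish faster than total_work / workers, nor faster than
// the longest single job, so the larger of the two is a lower bound.
inline double wall_time_lower_bound(const std::vector<double>& job_hours,
                                    int workers) {
    assert(workers > 0);
    if (job_hours.empty()) return 0.0;
    double total = std::accumulate(job_hours.begin(), job_hours.end(), 0.0);
    double longest = *std::max_element(job_hours.begin(), job_hours.end());
    return std::max(total / workers, longest);
}
```

With 2 workers, the bound can never drop below half of the total work, no matter how finely the build is split.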
That's right. My feeling is the same. Whatever I did was eventually limited by the CI env.
Right, so I think my question in the other thread (and I'm happy to move the conversation there again) is:
Not without significant code refactoring & manual stitching (see how I split it in https://github.com/cupy/cupy/pull/6941 to generate hundreds of tiny TUs; the diff is really a mess), and in the end I really don't know if it'd work without someone working out a prototype solution first. The project (not package) maintainers must be on board for such messy changes, and in CuPy's case I couldn't even convince myself it's worth it, not to mention the team 🙂
In your case it's worse I'd say, because
2. Then combine them all in a meta package?
I have no idea how this can be done. Static linking, maybe?
Alright. I thought maybe I was missing an obvious solution.
Thanks for explaining.
Checklist
0
(if the version changed)conda-smithy
(Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)