conda-forge / pytorch-cpu-feedstock

A conda-smithy repository for pytorch-cpu.
BSD 3-Clause "New" or "Revised" License

Enable linux ppc64le #255

Open jeongseok-meta opened 1 month ago

jeongseok-meta commented 1 month ago

Checklist

conda-forge-webservices[bot] commented 1 month ago

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

conda-forge-webservices[bot] commented 1 month ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

jakirkham commented 1 month ago

Thanks Jeongseok! 🙏

This looks like a good set of changes

For the remaining CI jobs, I think you will need to accept the ToS, which is pretty similar to other CIs' ToS. This amounts to making a PR adding your username to the list, like this one: https://github.com/Quansight/open-gpu-server/pull/37

Once that is done we can see how CI progresses

jeongseok-meta commented 1 month ago

Done! https://github.com/Quansight/open-gpu-server/pull/39

jakirkham commented 1 month ago

Great, thank you! 🙏

I asked Jaime to take a look, as he typically reviews those 🙂

I am wondering whether it is worth grabbing this workaround for NumPy builds in PyTorch: https://github.com/conda-forge/pytorch-cpu-feedstock/issues/254#issuecomment-2319195557

The reason is that we discovered the ARM builds likely need that workaround. So I think that if it is needed there, it is probably also needed for Power.

What do you think?

I should add that it is ok if you prefer to wait and see how CI goes first. We now cover that case in the test suite after having seen it before, so we will know either way if it is an issue.

https://github.com/conda-forge/pytorch-cpu-feedstock/blob/6dd85b3f85a72371b2d3ccf6a386de67e61d667e/recipe/meta.yaml#L332

hmaarrfk commented 1 month ago

I suggest adding back the early-failure check I proposed in: https://github.com/conda-forge/pytorch-cpu-feedstock/pull/256/commits/292cd8ee3f16bb337ec74b73a213d3ea39a22efe

You might be faster at debugging what is going on, but that early test might be more than adequate.

jakirkham commented 1 month ago

Agree that sounds like a good plan. Thanks for putting it together Mark! 🙏

Commented over there 🙂

jeongseok-meta commented 2 weeks ago

@conda-forge-admin, please rerender

jeongseok-meta commented 23 hours ago

@conda-forge-admin, please restart ci

hmaarrfk commented 20 hours ago

Please add:

skip: true  # [py!=310]

to the top-level build section.

If you are not ready yet to enable the GPU builds, please also add another line:

skip: true   # [cuda_compiler_version!=None]

and rerender.
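Putting those two selectors together, the top-level build section would look roughly like this (a sketch only; the build number and surrounding keys are illustrative, not copied from this feedstock's actual meta.yaml):

```yaml
build:
  number: 0                                  # illustrative; keep the feedstock's real build number
  # Build only the Python 3.10 variant for now
  skip: true  # [py!=310]
  # Skip the CUDA variants until GPU runner access is sorted out
  skip: true  # [cuda_compiler_version!=None]
```

The bracketed comments are conda-build preprocessing selectors: each `skip: true` line only takes effect for variants where its selector expression is true, so the two lines prune the build matrix independently.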

hmaarrfk commented 16 hours ago

@conda-forge-admin please rerender

hmaarrfk commented 16 hours ago

Sorry about that; let's just skip everything except ppc64le for now and then re-optimize the builds.

jeongseok-meta commented 16 hours ago

Sure, feel free to do anything that makes sense to you or take over this PR. Thank you for helping!

hmaarrfk commented 16 hours ago

No, I'm struggling with aarch64 already, so I won't be able to take over. But this feedstock just hogs the CIs: you end up building linux-64 for 6 hours instead of running the ppc64le builds you actually want!

hmaarrfk commented 16 hours ago

Maybe try rerendering locally; rerendering with CUDA is always slow...

jeongseok-meta commented 16 hours ago

@conda-forge-admin please rerender

hmaarrfk commented 15 hours ago

@conda-forge-admin please rerender

One more rerender. I am allowed to use the GPU runners, but you can simply use Azure until the 6-hour timeout is hit.

hmaarrfk commented 15 hours ago

@conda-forge-admin please rerender

Sigh... always one more thing to rerender.

hmaarrfk commented 15 hours ago

Alright, you have the CIs now. Happy tuning!

hmaarrfk commented 15 hours ago

I'm also assuming that you have access to a powerful Linux machine with Docker.

To build locally, run:

python build-locally.py

and it should walk you through things.

hmaarrfk commented 15 hours ago

While the builds have started, I have a feeling you are hitting the same problems I am with the aarch64 image.

See: https://github.com/conda-forge/pytorch-cpu-feedstock/pull/256

Depending on what you are trying to prove at this stage (NumPy detection vs. other issues), you might want to pull in my patch that causes this to fail loudly.