explosion / cython-blis

💥 Fast matrix-multiplication as a self-contained Python library – no system dependencies!
Other
219 stars 37 forks source link

python core dump on Power8 #96

Open renatopancheri opened 1 year ago

renatopancheri commented 1 year ago

Hi,

on my ppc64le Power8 machine (POWER8NVL) when python library thinc == 8.1.2 calls the

cpu_gather_add(saxpy(cblas), <float *>output.data, &table[0, 0], &indices[0, 0]

function python crashes right away, i've found the problem is actually in the compile flags on Blis library, running:

sed -i -e 's/mcpu=power9/mcpu=native/g' blis/_src/make/linux-power9.jsonl
sed -i -e 's/mtune=power9/mtune=native/g' blis/_src/make/linux-power9.jsonl

fixes the problem for me. For some reason import blis worked without any problems

shadeMe commented 1 year ago

Thanks for the bug report! We'll look into this. Just to be certain, you're using blis==0.7.9/0.7.10, right?

renatopancheri commented 1 year ago

it seams to me this problem exist with both 0.7.9/0.7.10 and 0.9.X, with 0.9.X i also noticed there is also both '-O2' and '-O3' which does not seem totally intentional to me. Personally i would integrate default configure script from blis

adrianeboyd commented 1 year ago

Our ability to support/debug this is going be pretty limited, but let's see...

What are the exact commands you used to build the blis wheels?

From what I understand, the BLIS power9 target is not expected to support POWER8.

If you follow these directions:

https://github.com/explosion/cython-blis/#d-build-specific-support

you can build blis for any supported target from upstream BLIS, which for blis v0.7.x would be the ones listed here:

https://github.com/flame/blis/blob/0.7.0/config_registry

As a last resort, you can try using generic. generic will be very slow, but it's at least useful for initial testing.

adrianeboyd commented 1 year ago

Ah, I checked the setup again and I see how you ended up with power9 even though it's not the right choice, since we have an overly broad auto-detection step:

https://github.com/explosion/cython-blis/blob/be9a5a1ac49f7ef2dc2f088cb39e2fe75789dc01/setup.py#L153-L154

If you know the right way to detect power9+ with platform, a PR would be welcome!


As far as I can tell, generic may be your only choice, although again a note that this will be too slow for production use. This is how to build it for generic:

https://github.com/explosion/cython-blis#c-install-with-generic-arch-support

It looks like a future version of BLIS should include a power target: https://github.com/flame/blis/pull/718 . (But note that we've had multiple reports of segfaults with BLIS 0.9.0 that we haven't been able to track down, this PR has not been published in a BLIS release yet, and at this point we're not sure what the future of cython-blis is going to look like.)

If the OpenBLAS support in numpy is better for power8 and your goal is to use thinc or spacy, you could also potentially go back to earlier versions that lets you switch entirely from blis to numpy gemm using a flag in thinc. I think you could use spacy v3.3 or earlier using thinc v8.0 and manually set the flag use_blis=False for NumpyOps using the thinc utils to set/configure ops: https://thinc.ai/docs/api-backends#util