coiled / benchmarks

BSD 3-Clause "New" or "Revised" License
28 stars 17 forks source link

Pandas `take` blocking the GIL #1404

Open fjetter opened 7 months ago

fjetter commented 7 months ago

I noticed that tpch query 1 is spending only about half it's time in parquet IO when using the dataset that's been produced by pyarrow

image

However, the run is heavily GIL congested and another GIL+native profile reveals that actually very few things (in python) are holding on to the GIL (native-only threads, e.g. of pyarrow are not tracked by py-spy so we don't see how arrow holds the gil to produce the dataframe)

image

which points to the take pandas function. There's already been a recent fix to this code area (see https://github.com/pandas-dev/pandas/pull/54483) for axis0

fjetter commented 7 months ago

https://github.com/pandas-dev/pandas/pull/57454

now we just need a release

fjetter commented 7 months ago

This has been backported to 2.2.1

FWIW I'm seeing to a much lesser extend take_2d_axis1_object_object calls being invoked and blocking the GIL despite the dataframe not having any object columns. From what I can tell the columns this is referring to are uint64.

It looks like higher bit uints are not handled in https://github.com/phofl/pandas/blob/a1ce9a64384fbce80ecac443c97bf33982c5b7c7/pandas/_libs/algos_take_helper.pxi.in#L15-L35 and the 8bit uint seems to have a duplicate entry here https://github.com/phofl/pandas/blob/a1ce9a64384fbce80ecac443c97bf33982c5b7c7/pandas/_libs/algos_take_helper.pxi.in#L16-L17

I suspect uint is falling back to obj?

phofl commented 7 months ago

Do you have a query where this happens?

I'll take a look later today

fjetter commented 7 months ago

This is query 2. This is the partition_index column, see also https://github.com/dask/distributed/issues/8530

the partition index column is currently a unit64 so casting this to a smaller value is appropriate. If pandas is not implementing a fastpath for uint we can also use a signed integer for now

phofl commented 7 months ago

I don't actually know the history behind not adding uint here (maybe wheel size, but not sure), so using a singed integer should be the fastest workaround for now

phofl commented 7 months ago

The uint thing is on purpose btw, we treat bool as uint8, so that's probably the reason that this was added