Excessive memory consumption on sorting

samuelcolvin commented 6 months ago

Describe the bug

I'm running the following query:

select span_name from records order by bit_length(attributes) desc limit 20

And it's running out of memory with 20GB memory limit (RuntimeConfig::new().with_memory_limit(20 * 1024 * 1024 * 1024, 0.8)), and passing with 30GB allowed.

Error message is:

Failed to allocate additional 25887088 bytes for ExternalSorterMerge[1] with 585120448 bytes already allocated - maximum available is 23605759

The point is that in theory this query only needs to hold the span_names of the 20 records with the longest attributes in memory.

But even if it chose to hold all span_name in memory, it shouldn't need this much memory:

there's "only" 12_980_628 rows
with sum(bit_length(span_name)) = 1_038_805_400 aka ~1GB, for all rows

To Reproduce

The dataset and code aren't public, but It shouldn't be too hard to reproduce with a table containing 2 text columns

Expected behavior

Ideally a query like this would have a far more modest memory foot print.

Additional context

Using datafusion v38.0.0, same error with mimalloc and without.

For comparison, duckdb runs this query fine with a 1GB memory limit, but fails with 500MB.

alamb commented 6 months ago

@samuelcolvin can you share the query plan for this query? Specifically what is the output of this query?

explain select span_name from records order by bit_length(attributes) desc limit 20

I would also not expect it to consume 20GB of memory

samuelcolvin commented 6 months ago

Sorry for the delay, here we go:

`logical_plan`

 Projection: records_store.span_name
  Limit: skip=0, fetch=20
    Sort: bit_length(records_store.attributes) DESC NULLS FIRST, fetch=20
      TableScan: records_store projection=[span_name, attributes]

`physical_plan`

 ProjectionExec: expr=[span_name@0 as span_name]
  GlobalLimitExec: skip=0, fetch=20
    SortPreservingMergeExec: [bit_length(attributes@1) DESC], fetch=20
      SortExec: TopK(fetch=20), expr=[bit_length(attributes@1) DESC], preserve_partitioning=[true]
        ParquetExec: file_groups={14 groups: [[Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_400_0001_2139d404-1e6c-4903-90f1-775727315226.parquet:0..4619991, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_400_0001_41744ee3-e741-43cf-ab17-6b986d785a58.parquet:0..4631041, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_401_0001_91993b2d-9555-4d85-8dd9-71ad352a7066.parquet:0..16945773, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_401_0001_f611b930-13d2-4311-acf2-489585ca7e2a.parquet:0..16920185, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_402_0001_3f208d74-4b67-44ec-aa4a-1ed30d881b7f.parquet:0..18130069, ...], [Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_403_0001_780473ad-7867-4bcf-b63d-a52460f466cf.parquet:16513148..17675405, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_404_0001_d11bf279-599e-4e2c-9a60-14d13045c8dd.parquet:0..16597007, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_405_0001_456e9ee4-35d7-4483-813e-73c29d8023d2.parquet:0..15207780, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_406_0001_5becfa29-66d8-44ff-b61c-b82987f3d060.parquet:0..16885514, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_407_0001_1a81e83c-a226-4f29-b00b-54f849eb46cd.parquet:0..15583994, ...], [Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_408_0001_ec8d95f7-1464-4adf-a975-6857937592e1.parquet:12323655..15398417, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_409_0001_f27ecf09-e64b-4fc3-9cb9-cdd53b2200a6.parquet:0..15601539, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_410_0001_bd93a8e8-dd6f-48ae-8622-fd439f9d27f9.parquet:0..15515518, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_411_0001_5ec01d68-8a3a-4d21-b0df-3d507bea2d1c.parquet:0..15347993, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_412_0001_3611c025-d788-4d26-b77d-424e421fcd98.parquet:0..16151170, ...], [Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_413_0001_17decbd1-13d0-4a01-9213-8fee2ba46b7b.parquet:12069225..16928575, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_414_0001_d7924295-f735-4b34-af7f-3975ad3f91b6.parquet:0..16952022, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_415_0001_7a7e0d3e-de3e-462b-9fb5-264277a2c74b.parquet:0..16826232, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_416_0001_da93e867-a912-454a-a758-c9e27dcf74df.parquet:0..17230023, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_417_0001_577d3fa8-ea7d-471f-a33a-0d65faa54902.parquet:0..17533201, ...], [Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_418_0001_b5f013f2-d3ed-4f98-bb9c-cb58b059b5fc.parquet:4359379..17560075, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_419_0001_22e4dd59-1639-4c86-980b-12745c86aef6.parquet:0..17509164, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_420_0001_24661f8e-bef4-4efd-84fb-55f531e8a495.parquet:0..18239445, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_421_0001_75401751-5740-4086-a2ec-0e9c23bdc9a7.parquet:0..18045516, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_422_0001_2b55263e-f63b-402c-8024-9e85fc8c3d03.parquet:0..10765386], ...]}, projection=[span_name, attributes]

alamb commented 6 months ago

🤔 that certainly seems like it is doing a Top(K) with 14 cores -- so I would expect this would hold at most 20 * 14 batches

20 is the k
14 is the number of file groups (aka partitions)

🤔

blaginin commented 2 weeks ago

Interestingly, the same data in CSV works fine, but Parquet causes a crash

blaginin commented 2 weeks ago

take

blaginin commented 1 week ago

After #13377 was merged, this is now much more memory-efficient and, in fact, works on just 1 GB for me 🎉 (Although I had to change bit_length to another function because we switched to string view when reading, but this type is not yet supported by bit_length)

I think we can close this issue

jayzhan211 commented 1 week ago

Although I had to change bit_length to another function because we switched to string view when reading, but this type is not yet supported by bit_length

Is it possible this memory issue is related to bit_length only? Maybe we can support string view for bit_length then test it again

alamb commented 1 week ago

Is it possible this memory issue is related to bit_length only? Maybe we can support string view for bit_length then test it again

I think this was recently merged

https://github.com/apache/datafusion/pull/13332

blaginin commented 14 hours ago

Just tested, everything works well with bit_length too

alamb commented 6 hours ago

Nice!

So shall we close this ticket? Or is there still more work to do?

blaginin commented 6 hours ago

IMO we can close it if @samuelcolvin doesn't mind 🙂

samuelcolvin commented 4 hours ago

Closing, can always create a new issue if someone reproduces.

samuelcolvin commented 4 hours ago

Thank you so much for working on this.

apache / datafusion