MaterializeInc / materialize

The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data.
https://materialize.com

Upsert envelope takes ~250 bytes to store 2 8-byte strings #9369

Open philip-stoev opened 2 years ago

philip-stoev commented 2 years ago

What version of Materialize are you using?

From the shell:

v0.10.1-dev (470aa9430)

What was the issue?

Ingesting 10M records with the format:

{"key": "NNNNNNNNNN"} {"f1": "NNNNNNNNNN"}

where NNNNNNNNNN is an 8-character string, causes ~2.5 GB of memory to be consumed in the upsert operator alone. This works out to ~250 bytes/record, a roughly 10x amplification over the raw data in just this one operator.

The flamegraph looks like this: [screenshot: flamegraph, 2021-12-01 12-07-23]

This ticket, combined with #7428, essentially means that an 8 GB machine cannot execute a DISTINCT aggregate against 10M unique upsert records.
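
As a quick back-of-envelope check of the numbers above (a sketch; the ~25 bytes of raw data per record is an assumption covering the two short strings plus per-record framing, not a measured figure):

fn main() {
    let records: f64 = 10_000_000.0;
    let observed_bytes: f64 = 2.5e9; // memory attributed to the upsert operator in the report
    let raw_bytes_per_record: f64 = 25.0; // assumed: two ~8-byte strings plus framing

    let per_record = observed_bytes / records;
    println!("observed: ~{:.0} bytes/record", per_record); // ~250
    println!(
        "amplification over raw data: ~{:.0}x",
        per_record / raw_bytes_per_record
    ); // ~10x
}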

Is the issue reproducible? If so, please provide reproduction instructions.

$ set kafka-records-envelope-upsert-distinct-key={"type": "record", "name": "Key", "fields": [ {"name": "key", "type": "string"} ] }
$ set kafka-records-envelope-upsert-distinct-value={"type" : "record", "name" : "test", "fields" : [ {"name":"f1", "type":"string"} ] }
$ kafka-create-topic topic=kafka-records-envelope-upsert-distinct
$ kafka-ingest format=avro topic=kafka-records-envelope-upsert-distinct key-format=avro key-schema=${kafka-records-envelope-upsert-distinct-key} schema=${kafka-records-envelope-upsert-distinct-value} publish=true repeat=10000000
{"key": "${kafka-ingest.iteration}"} {"f1": "${kafka-ingest.iteration}"}
> CREATE MATERIALIZED SOURCE kafka_records_envelope_upsert_distinct
              FROM KAFKA BROKER '${testdrive.kafka-addr}' TOPIC 'testdrive-kafka-records-envelope-upsert-distinct-${testdrive.seed}'
              FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY '${testdrive.schema-registry-url}'
              ENVELOPE UPSERT;
philip-stoev commented 2 years ago

From @ruchirK :

Ruchir Khaitan 14 hours ago Alright — given all that, I'm pretty sure this is all expected. IIRC, the minimum size of a row is 32 bytes, so even though we can represent the key in ~9 bytes, it will cost us 32. Same with the value (which is really key + value and only takes up ~18 bytes): it costs us 32. Then we store two copies of all of this, once in the hashtable in the upsert operator and once in the arrangement (because this is a materialized source). So, at a baseline, we expect 1.28 GB of memory usage at an absolute minimum for this test (10 million records × 64 bytes for key and value per record × 2 places we duplicate this storage). (edited)
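
A minimal sketch of that baseline arithmetic; the 32-byte minimum row size and the two copies (upsert state plus arrangement) are taken from the comment above, nothing else is assumed:

fn main() {
    let records: u64 = 10_000_000;
    let min_row_bytes: u64 = 32;        // stated minimum size of a row
    let per_record = 2 * min_row_bytes; // one row for the key, one for the value
    let copies: u64 = 2;                // upsert operator state + arrangement

    let total = records * per_record * copies;
    println!("baseline: {} bytes (~{:.2} GB)", total, total as f64 / 1e9); // ~1.28 GB
}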

Ruchir Khaitan 14 hours ago On my laptop I was seeing memory go up to ~2.5 GB. I think the hashtable is sized up by roughly a factor of 1.5 (based on the capacity being 3.6 million when there were 2.5 million elements in the hashtable), so it seems the hashtable alone is taking up close to a gigabyte of memory.

From @frankmcsherry

Frank McSherry 13:55 Just a reality check on the HashMap: each entry is going to have at least 10 + 10 + 24 + 24 + 8 = 76 bytes (string payloads, string objects, hash). The (24 + 24 + 8) may get rounded up to 24 + 48 (traditionally, the key and hash are stored together, and so: alignment), and the element count is then multiplied by 1.1x and rounded up to a power of two when sizing the table.
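
A minimal sketch of that per-entry estimate, assuming 64-bit String headers (24 bytes each) and an 8-byte cached hash per entry; the 1.1x-then-power-of-two capacity rule is taken from the comments above:

fn main() {
    let key_payload: u64 = 10;   // heap bytes for the key string
    let value_payload: u64 = 10; // heap bytes for the value string
    let string_header = std::mem::size_of::<String>() as u64; // 24 on 64-bit targets
    let hash: u64 = 8;

    let per_entry = key_payload + value_payload + 2 * string_header + hash;
    println!("per-entry lower bound: {} bytes", per_entry); // 76

    // Capacity sizing as described above: ~1.1x the element count,
    // rounded up to a power of two.
    let elements: u64 = 10_000_000;
    let target = (elements as f64 * 1.1).ceil() as u64;
    let capacity = target.next_power_of_two();
    println!("table capacity for {} elements: {} slots", elements, capacity); // 16,777,216
}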

While the situation is by now well understood, I am leaving this ticket open because it effectively puts an upper bound on the number of unique upsert records one can ingest into Mz on a machine with a reasonable amount of memory. Maybe we should document this somewhere so that users are informed.