delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Spark] use map lookup in createPhysicalSchema #3236

Status: Closed (andrewxue-db closed this 2 days ago)

andrewxue-db commented 2 weeks ago

Which Delta project/connector is this regarding?

Spark

Description

Instead of calling SchemaUtils.findNestedFieldIgnoreCase for each column, createPhysicalSchema now builds a map with SchemaUtils.explode up front and performs map lookups during iteration.

This speeds up createPhysicalSchema on wide tables. It may still be slow for tables with deeply nested schemas, because the field path has to be built for every lookup, but there should be no regression.
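To illustrate the idea, here is a minimal Spark-free sketch of the two lookup strategies. The types (Field, Struct, Leaf) and the helpers findIgnoreCase, explodeToMap, and lookup are hypothetical stand-ins written for this example; they are not Delta's actual SchemaUtils API, and the sketch models only structs and leaf fields.

```scala
import java.util.Locale

// Hypothetical, simplified schema model (not Spark's StructType/StructField).
sealed trait DataType
case object Leaf extends DataType
final case class Struct(fields: Seq[Field]) extends DataType
final case class Field(name: String, dataType: DataType)

object SchemaLookup {
  // Per-column strategy: scan every struct level for a case-insensitive match.
  // Each call is linear in the number of sibling fields at each level.
  def findIgnoreCase(root: Struct, path: Seq[String]): Option[Field] = path match {
    case head +: tail =>
      root.fields.find(_.name.equalsIgnoreCase(head)).flatMap { f =>
        if (tail.isEmpty) Some(f)
        else f.dataType match {
          case s: Struct => findIgnoreCase(s, tail)
          case _         => None
        }
      }
    case _ => None // empty path
  }

  // Map strategy: flatten the schema once into (lower-cased path -> field).
  def explodeToMap(root: Struct, prefix: Seq[String] = Nil): Map[Seq[String], Field] =
    root.fields.flatMap { f =>
      val path = prefix :+ f.name.toLowerCase(Locale.ROOT)
      val nested = f.dataType match {
        case s: Struct => explodeToMap(s, path)
        case _         => Map.empty[Seq[String], Field]
      }
      nested + (path -> f)
    }.toMap

  // Lookup against the prebuilt map; the key (the path) still has to be
  // built and hashed on every call, which costs more for deep nesting.
  def lookup(index: Map[Seq[String], Field], path: Seq[String]): Option[Field] =
    index.get(path.map(_.toLowerCase(Locale.ROOT)))
}
```

In this sketch, resolving all n columns of a flat schema with findIgnoreCase costs on the order of n scans of n fields, while explodeToMap is built once and each lookup is a hash probe. A caller would build the index a single time, for example `val index = SchemaLookup.explodeToMap(schema)`, and then call `SchemaLookup.lookup(index, path)` inside the loop over columns.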

How was this patch tested?

Manual profiling of an ALTER TABLE ADD COLUMNS query:

Before: ~13s

[Screenshot 2024-06-06 at 5:50:01 PM]

After: ~3s

[Screenshot 2024-06-06 at 5:50:17 PM]

Does this PR introduce any user-facing changes?