jkosh44 opened 1 month ago
Here are the logs from the staging env that took the longest to start up, at 38 seconds (old format): staging_logs.txt
Each set of logs appears twice because the first set is from read-only mode and the second set is from read-write mode.
The prod sandbox started up in 6 seconds, so probably not as interesting as the attached staging logs.
The following hacky python snippet will sort the logs by time (you may have to play around with the `sep` value and the arg to `main`):
```Python
import re
from typing import Optional
from dateutil.parser import parse
order = [
r'startup: envd init: beginning',
r'startup: envd init: preamble beginning',
r'startup: envd init: preamble complete in',
r'startup: envd init: serving beginning',
r'startup: envd serve: beginning',
r'startup: envd serve: preamble beginning',
r'startup: envd serve: preamble complete in',
r'startup: envd serve: catalog init beginning',
r'startup: envd serve: catalog init complete in',
r'startup: envd serve: system parameter sync beginning',
r'startup: envd serve: system parameter sync complete in',
r'startup: envd serve: preflight checks beginning',
r'startup: envd serve: preflight checks complete in',
r'startup: envd serve: durable catalog open beginning',
r'startup: envd serve: durable catalog open complete in',
r'startup: envd serve: coordinator init beginning',
r'startup: coordinator init: beginning',
r'startup: coordinator init: preamble beginning',
r'startup: coordinator init: preamble complete in',
r'startup: coordinator init: timestamp oracle init beginning',
r'startup: coordinator init: timestamp oracle init complete in',
r'startup: coordinator init: catalog open beginning',
r'startup: coordinator init: catalog open complete in',
r'startup: coordinator init: coordinator thread start beginning',
r'startup: controller init: beginning',
r'startup: controller init: complete in',
r'startup: coordinator init: bootstrap beginning',
r'startup: coordinator init: bootstrap: preamble beginning',
r'startup: coordinator init: bootstrap: preamble complete in',
r'startup: coordinator init: bootstrap: storage collections init beginning',
r'startup: coordinator init: bootstrap: storage collections init complete in',
r'startup: coordinator init: bootstrap: optimize dataflow plans beginning',
r'startup: coordinator init: bootstrap: optimize dataflow plans complete in',
r'startup: coordinator init: bootstrap: dataflow as-of bootstrapping beginning',
r'startup: coordinator init: bootstrap: dataflow as-of bootstrapping complete in',
r'startup: coordinator init: bootstrap: postamble beginning',
r'startup: coordinator init: bootstrap: postamble complete in',
r'startup: coordinator init: bootstrap: generate builtin updates beginning',
r'startup: coordinator init: bootstrap: generate builtin updates complete in',
r'startup: coordinator init: bootstrap: generate secret cleanup beginning',
r'startup: coordinator init: bootstrap: generate secret cleanup complete in',
r'startup: coordinator init: bootstrap: concurrently update builtin tables and cleanup secrets beginning',
r'startup: coordinator init: bootstrap: concurrently update builtin tables and cleanup secrets complete in',
r'startup: coordinator init: bootstrap complete in',
r'startup: coord serve: storage usage prune beginning',
r'startup: coord serve: storage usage prune complete in',
r'startup: coordinator init: coordinator thread start complete in',
r'startup: coordinator init: complete in',
r'startup: envd serve: coordinator init complete in',
r'startup: envd serve: postamble beginning',
r'startup: envd serve: postamble complete in',
r'startup: envd serve: complete in',
r'startup: envd init: serving complete in',
r'startup: envd init: complete in',
]
def main(filename: str):
    logs = sort_logs(filename)
    logs = '\n'.join(logs)
    with open(filename, "w") as file:
        file.write(logs)

def sort_logs(filename: str):
    with open(filename, "r") as file:
        lines = [
            line.strip()
            for line in file
            if re.match(r'\d+-\d+-\d+ \d+:\d+:\d+.\d+', line) and log_idx(line) is not None
        ]
    lines.sort(key=sort_log)
    return lines

def sort_log(line: str):
    sep = "\t: "
    split_line = line.split(sep, 1)
    d = parse(split_line[0])
    idx = log_idx(line)
    return (d, idx)

def log_idx(line: str) -> Optional[int]:
    for idx, template in enumerate(order):
        pattern = re.compile(template)
        if pattern.search(line):
            return idx
    return None

if __name__ == "__main__":
    main("/home/joe/Downloads/us-prod-startup-0896c.txt")
```
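To make the sort key concrete, here is a tiny self-contained sketch of the `(timestamp, template-index)` ordering the script uses: lines sort primarily by wall-clock time, with ties broken by each message's position in `order`. The log lines below are synthetic examples, not taken from the attached staging_logs.txt.

```python
from datetime import datetime

# A trimmed-down stand-in for the full `order` list above.
order = [
    "startup: envd init: beginning",
    "startup: envd init: preamble beginning",
    "startup: envd init: preamble complete in",
]

def key(line: str):
    ts, msg = line.split("\t: ", 1)
    # Ties at the same timestamp are broken by position in `order`.
    idx = next(i for i, t in enumerate(order) if t in msg)
    return (datetime.fromisoformat(ts), idx)

lines = [
    "2024-07-01 00:00:02.500\t: startup: envd init: preamble complete in 1.5s",
    "2024-07-01 00:00:01.000\t: startup: envd init: preamble beginning",
    "2024-07-01 00:00:01.000\t: startup: envd init: beginning",
]
for line in sorted(lines, key=key):
    print(line.split("\t: ", 1)[1])
```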
The startup process is broken down into the following categories, each with its own timings:
- `mz_environmentd::run()`
  - `mz_environmentd::Listeners::serve()`
    - `mz_adapter::coord::serve()`
      - `mz_adapter::catalog::open::Catalog::initialize_controller()`
      - `mz_adapter::coord::Coordinator::bootstrap()`
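The paired `beginning` / `complete in` log templates suggest a nested timing span around each of these phases. A rough Python analogy of that pattern (purely illustrative; the actual instrumentation is Rust code in the PR noted below):

```python
import time
from contextlib import contextmanager

@contextmanager
def startup_span(name: str):
    # Log a "beginning" line, run the body, then log "complete in" with the
    # elapsed time, mirroring the paired log templates above.
    print(f"startup: {name} beginning")
    start = time.monotonic()
    yield
    print(f"startup: {name} complete in {time.monotonic() - start:.3f}s")

with startup_span("envd init"):
    with startup_span("envd init: serving"):
        pass
```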
Note: these exact sections were introduced in https://github.com/MaterializeInc/materialize/pull/28978
Attached are the startup timings in prod for the customer with the largest catalog (old format).
There are two major differences from staging due to 0dt being enabled in staging but not prod:
`mz_storage_usage_by_shard` is currently a builtin table that is backed by the storage usage collection in the durable catalog. The storage usage collection takes up 99% of the space in the durable catalog. The `mz_storage_usage_by_shard` table has the following columns:

- `id uint8`
- `shard_id text`
- `size_bytes uint8`
- `collection_timestamp timestamptz`
In order to have correct billing logic, we need the following guarantees:

- `collection_timestamp` should be close to the real time that the row was written to the table.
- `collection_timestamp` should be less than or equal to the logical timestamp at which the row was written, i.e. if I run `SELECT * FROM mz_storage_usage_by_shard AS OF t`, then I should see all rows that will ever exist with a `collection_timestamp <= t`.
- `collection_timestamp` should be monotonically increasing.

The way we currently maintain these guarantees is by first generating the `collection_timestamp` via the timestamp oracle, and then generating the logical write timestamp via the timestamp oracle.
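The ordering argument can be sketched with a toy oracle (illustrative only; Materialize's real timestamp oracle is distributed and more involved): because both timestamps come from the same monotonic oracle and `collection_timestamp` is requested first, the second guarantee holds by construction.

```python
import itertools

class ToyTimestampOracle:
    # Toy stand-in for the timestamp oracle: hands out strictly increasing timestamps.
    def __init__(self):
        self._counter = itertools.count(1)

    def read_ts(self) -> int:
        return next(self._counter)

oracle = ToyTimestampOracle()
collection_ts = oracle.read_ts()  # stored in the collection_timestamp column
write_ts = oracle.read_ts()       # logical timestamp of the table write

# A read AS OF write_ts therefore sees this row, and its
# collection_timestamp is <= write_ts.
assert collection_ts <= write_ts
```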
In order to speed up startup times, we'd like to remove the storage usage collection from the durable catalog. Here are some proposals, in order of my preference:

1. Keep `mz_storage_usage_by_shard` as a builtin table, but prevent anyone from ever migrating it and don't truncate it on startup. This seems like the simplest approach; however, we tried to do the same thing with the query activity log and ultimately decided to make it a builtin source.
2. Convert `mz_storage_usage_by_shard` to an append-only builtin source. We'll have to prevent anyone from ever migrating this source to avoid accidentally truncating it. Additionally, we can update the background builtin source task so that it can accept a closure of the form `Fn(Timestamp) -> Vec<(Row, Diff)>` instead of literal writes. The source would pass the logical timestamp of a write into the closure to generate the rows for that write. The caller would pass in a closure that sets the `collection_timestamp` column equal to the logical write timestamp.
3. Remove the `collection_timestamp` column from the table and increase the retention window to 31 days. If anyone wants to know the collection timestamp, they can use `mz_now()`.
4. Separate the persist catalog shard into two shards, one for `storage_usage` and one for everything else. We'll use txn-wal to read/write from/to the shards.

For all approaches we'll have a background task that occasionally removes entries older than 30 days.
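The closure idea in proposal 2 can be sketched in Python (the names and types here are hypothetical illustrations of the `Fn(Timestamp) -> Vec<(Row, Diff)>` shape, not the actual task's API):

```python
from typing import Callable, List, Tuple

# Hypothetical flattened row: (id, shard_id, size_bytes, collection_timestamp).
Row = Tuple[int, str, int, int]
Diff = int
RowGen = Callable[[int], List[Tuple[Row, Diff]]]

def append_at(logical_ts: int, gen: RowGen) -> List[Tuple[Row, Diff]]:
    # The background builtin source task hands the logical write timestamp to
    # the caller's closure instead of accepting literal rows.
    return gen(logical_ts)

# The caller's closure sets collection_timestamp equal to the logical write timestamp.
updates = append_at(42, lambda ts: [((1, "s1234", 1024, ts), 1)])
assert updates == [((1, "s1234", 1024, 42), 1)]
```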
> Keep `mz_storage_usage_by_shard` as a builtin table, but prevent anyone from ever migrating it and don't truncate it on startup. This seems like the simplest approach; however, we tried to do the same thing with the query activity log and ultimately decided to make it a builtin source.

This is my preference as well, especially since our forensic analysis has indicated that making the query history a builtin source was likely simply a result of not realizing that we could have builtin tables that didn't need to be truncated on startup.
> Keep `mz_storage_usage_by_shard` as a builtin table, but prevent anyone from ever migrating it and don't truncate it on startup. This seems like the simplest approach; however, we tried to do the same thing with the query activity log and ultimately decided to make it a builtin source.

This would also be my first preference, yes!
I collected some parsing, planning, and optimization timings from my staging account. I created 1000 tables of the form `CREATE TABLE t{i} (a INT);`, 1000 views of the form `CREATE VIEW v{i} AS SELECT SUM(a) FROM t{i}`, and 1000 indexes of the form `CREATE DEFAULT INDEX ON v{i}`. The total startup time is 17 seconds. Of those 17 seconds, we spend the following amount of time on the following activities:
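The test catalog above can be generated with a loop along these lines (a sketch; I'm assuming the statements were simply issued one at a time):

```python
stmts = []
for i in range(1000):
    # One table, one view over it, and one default index on the view.
    stmts.append(f"CREATE TABLE t{i} (a INT);")
    stmts.append(f"CREATE VIEW v{i} AS SELECT SUM(a) FROM t{i};")
    stmts.append(f"CREATE DEFAULT INDEX ON v{i};")

# 3000 DDL statements total: 1000 tables, 1000 views, 1000 indexes.
print(len(stmts))  # prints 3000
```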
So what is most of the time spent on? Your activities only seem to sum up to ~5 s of the 17 s total.
> So what is most of the time spent on? Your activities only seem to sum up to ~5 s of the 17 s total.
I'm having trouble with logs getting dropped in my staging account, so I may have missed a couple of events, but here are the other large activities:
Feature request
During an upgrade, all time spent on startup contributes directly to downtime. We need to reduce startup times to reduce downtime.