jseldess opened this issue 5 years ago
Or the primary key could contain all 3 columns but start with `timestamp`. Whatever column you're partitioning on needs to be first, I believe.
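If that's right, a minimal sketch might look like the following. This is hypothetical, not the actual movr schema: column types, partition names, and the boundary date are all assumptions.

```sql
-- Hypothetical sketch: the partitioning column leads the primary key,
-- so PARTITION BY RANGE on it is allowed.
CREATE TABLE vehicle_location_histories (
    timestamp TIMESTAMP,
    city STRING,
    ride_id UUID,
    lat FLOAT8,
    long FLOAT8,
    PRIMARY KEY (timestamp, city, ride_id)
) PARTITION BY RANGE (timestamp) (
    -- boundary date is a made-up example
    PARTITION archived VALUES FROM (MINVALUE) TO ('2019-01-01'),
    PARTITION recent   VALUES FROM ('2019-01-01') TO (MAXVALUE)
);
```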
There are actually a few open questions for me, and I'd like your perspectives, @bdarnell and @awoods187:
`timestamp` in the primary key. Thoughts?

Also wondering, for 1, if it might be better to use a different column in `vehicle_location_histories` as the archival-partitioning key, something like `present_location` as a boolean. Because with `timestamp`, you don't really have a way to know whether the location is current or historical. A vehicle might go unused for a while, so picking some timestamp (or an interval before `now()`) as the archival decision will be inefficient in some cases.
Right.
Combining multiple kinds of partitioning is complicated. You have to nest one inside the other, probably making the PK (city, timestamp) and then performing archival partitioning separately in each geo-partition.
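A rough sketch of that nesting, assuming subpartitioning is available in the version at hand, and using invented partition names and dates:

```sql
-- Hypothetical: geo list-partition on city, with an archival range
-- subpartition on the timestamp inside each geo partition.
CREATE TABLE vehicle_location_histories (
    city STRING,
    ts TIMESTAMP,
    ride_id UUID,
    PRIMARY KEY (city, ts, ride_id)
) PARTITION BY LIST (city) (
    PARTITION chicago VALUES IN ('chicago')
        PARTITION BY RANGE (ts) (
            PARTITION chicago_archived VALUES FROM (MINVALUE) TO ('2019-01-01'),
            PARTITION chicago_recent   VALUES FROM ('2019-01-01') TO (MAXVALUE)
        )
    -- ...one block like this per city
);
```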
I think what you're describing with the `present_location` boolean is something different from archival partitioning. I'd do what I think you're describing with two tables: a present-location table that doesn't attempt to retain any history and a history table with archival partitioning that doesn't attempt to worry about the present.
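A sketch of that two-table split, with assumed table and column names (not the actual movr schema):

```sql
-- Hypothetical: current positions only, one row per vehicle, no history.
CREATE TABLE vehicle_present_locations (
    vehicle_id UUID PRIMARY KEY,
    lat FLOAT8,
    long FLOAT8,
    updated_at TIMESTAMP
);

-- Hypothetical: append-only history, range-partitioned by age.
CREATE TABLE vehicle_location_histories (
    ts TIMESTAMP,
    vehicle_id UUID,
    lat FLOAT8,
    long FLOAT8,
    PRIMARY KEY (ts, vehicle_id)
) PARTITION BY RANGE (ts) (
    PARTITION archived VALUES FROM (MINVALUE) TO ('2019-01-01'),
    PARTITION recent   VALUES FROM ('2019-01-01') TO (MAXVALUE)
);
```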
For #1, you still wouldn't want just `timestamp`, because that would result in a hotspot. Either way, I would at least put the vehicle ID first. cc @bdarnell for additional thoughts
@jseldess re: present location - that is currently modeled on the vehicle itself as `current_location` (https://github.com/cockroachdb/movr/blob/master/models.py#L71). It's a string address at the moment, but we could change that to a lat/long to tie out with the new locations table.
Archival partitioning kind of inherently creates hotspots - if you put vehicle ID first then you no longer have the time-ordering that allows you to create partitions based on age. The general solution to this kind of hotspot is to add a shard_id to the key, but in archival partitioning that would require partitioning by a combination of list and range which we don't appear to currently support (and even if we did, this would be a lot of manual work).
I'm not sure archival partitioning makes a lot of sense without some sort of support for sharded timestamp indexes (either directly or through a generalization of partitioning to combine range and list types at once). As things stand now, you can only use archival partitioning when your write volume is low enough that you can handle the hotspot, but then are you going to have enough data that partitioning will make much of a difference?
I'm confused about the right scenario for demonstration purposes, but in the simplest scenario, with archival partitioning by range via `timestamp`, wouldn't load-based splitting help with the hotspot issue?
I just read this range partitioning caveat as well:
> Partition values can be any SQL expression, but it's only evaluated once. If you create a partition with value `< (now() - '1d')` on 2017-01-30, it would contain all values less than 2017-01-29. It would not update the next day; it would continue to contain values less than 2017-01-29.
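To make the caveat concrete, here's a hypothetical statement (table and column names assumed): both `now()` calls below are evaluated once, when the statement runs, and the resulting constants never move afterward.

```sql
-- Hypothetical: if run on 2017-01-30, both boundaries freeze at
-- 2017-01-29 and stay there until the statement is re-issued.
ALTER TABLE vehicle_location_histories PARTITION BY RANGE (ts) (
    PARTITION archived VALUES FROM (MINVALUE) TO (now() - '1d'::INTERVAL),
    PARTITION recent   VALUES FROM (now() - '1d'::INTERVAL) TO (MAXVALUE)
);
```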
Doesn't this limitation make it pretty hard to use archival partitioning in a meaningful way? Wouldn't you want to archive data based on its age, and not based on some pre-defined date? I suppose you could update the archival partition with a new archival date every so often, but that seems burdensome.
Load-based splitting wouldn't help with the hotspot issue with a timestamp-ordered index. The QPS per range would never be distributed by auto-splits because we would always write to the top (or bottom) range in the timestamp-sorted map.
It would help with a shard key partitioned schema though.
It sounds like this feature would be experimental or beta by our current standards. I'll add it to airtable as a placeholder. cc @awoods187
> Doesn't this limitation make it pretty hard to use archival partitioning in a meaningful way? Wouldn't you want to archive data based on its age, and not based on some pre-defined date? I suppose you could update the archival partition with a new archival date every so often, but that seems burdensome.
Yes, archival partitioning currently requires you to periodically update your partition boundaries.
> It [load-based splitting] would help with a shard key partitioned schema though.
Not really - the partitioning would ensure that there are splits at the shard key boundaries, and then within each shard you're back to the single hotspot that can't usefully be split due to load. Load-based splitting would help for sharded timestamp keys without partitioning, though.
> It sounds like this feature would be experimental or beta by our current standards.
Archival is a use case, the feature is partitioning by range. I agree that addressing archival needs with range-based partitioning is an experimental scenario. I'm not sure that this implies that range-based partitioning is also experimental, although I don't think we have any ideas for a non-time-based use of the feature.
Actually, archival partitioning may be two separate use cases, each experimental in their own ways: a need for efficiently dropping older data (which is probably better served by implementing row-based TTLs) and distinguishing hot and cold storage (the canonical example of this is to put hot data on SSD and cold on HDD, but with our current limits on ranges per store, I don't think we can make effective use of HDDs).
> Not really - the partitioning would ensure that there are splits at the shard key boundaries, and then within each shard you're back to the single hotspot that can't usefully be split due to load. Load-based splitting would help for sharded timestamp keys without partitioning, though.
Right - I didn't mean *partitioning* as in the feature. I just meant "splitting up" the timeseries data by virtue of having a shard key. Wrong word choice!
Thanks for the additional color @bdarnell -- we'll track this on the product side.
Sharding timestamp keys is (currently) an application-level workaround that our SEs and I have been recommending. It's not even an experimental feature yet, although I think we should think about building something that makes this easier (and is compatible with range-based partitioning, although this is much less of a concern for me than just making the basic non-partitioned functionality easier to discover and use).
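A minimal sketch of that application-level workaround, with assumed column names and an assumed shard count of 8 (`fnv32` is one of CockroachDB's hash builtins):

```sql
-- Hypothetical manual sharding: a stored computed column spreads
-- timestamp-ordered writes across 8 key prefixes instead of one.
CREATE TABLE vehicle_location_histories (
    shard INT AS (fnv32(ride_id::STRING) % 8) STORED,
    ride_id UUID,
    ts TIMESTAMP,
    lat FLOAT8,
    long FLOAT8,
    PRIMARY KEY (shard, ts, ride_id)
);

-- The cost: time-bounded reads must fan out across all shards.
SELECT * FROM vehicle_location_histories
WHERE shard IN (0, 1, 2, 3, 4, 5, 6, 7)
  AND ts > now() - '1h'::INTERVAL;
```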
@bdarnell, @nstewart, given the above, our current documentation is misleading: We highlight archival partitioning as a main use case, and the partition by range example focuses on this use case without calling out any of the limitations mentioned above.
What should we do? Update the docs to de-emphasize archival partitioning? Change the partition by range example to have another focus?
I also think it's probably important to inform SEs about this. I'm pretty sure they're selling various use cases for partitioning, including archival, even though nobody seems to have implemented it.
Yes, we should de-emphasize archival partitioning in the docs.
As an alternative use case for partitioning by range, maybe we could give an example with keys like "us-chicago" and "us-newyork" with partitions on the range "us" to "ut"? Other than that, I think archival partitioning is still the best use case for partitioning by range, even though it's only suitable for workloads where sharding isn't required to achieve the necessary write throughput.
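That alternative as a sketch, with invented city codes and partition names: string comparison puts every key from `'us'` up to (but not including) `'ut'` in one partition.

```sql
-- Hypothetical: 'us-chicago' and 'us-newyork' both sort into the
-- united_states partition because 'us' <= key < 'ut'.
CREATE TABLE rides (
    city STRING,
    id UUID,
    PRIMARY KEY (city, id)
) PARTITION BY RANGE (city) (
    PARTITION united_states VALUES FROM ('us') TO ('ut'),
    PARTITION elsewhere     VALUES FROM (MINVALUE) TO ('us')
);
```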
I've been trying out archival partitioning, and when I try to partition the `vehicle_location_histories` table based on `timestamp`, I get this error:

Looking at the table schema, the primary key includes `city`, then `ride_id`, then `timestamp`:

I don't understand why `city` and `ride_id` need to be in the primary key. If it were just `timestamp`, I think this would work fine. @nstewart, let me know what you were thinking.
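For context, a hypothetical reconstruction of the failing attempt (the actual error text isn't shown above): since the partitioning columns have to be a prefix of the primary key, partitioning a `(city, ride_id, timestamp)` key by `timestamp` alone would be rejected.

```sql
-- Hypothetical: with PRIMARY KEY (city, ride_id, timestamp), this
-- fails because timestamp is not a prefix of the primary key.
ALTER TABLE vehicle_location_histories PARTITION BY RANGE (timestamp) (
    PARTITION archived VALUES FROM (MINVALUE) TO ('2019-01-01'),
    PARTITION recent   VALUES FROM ('2019-01-01') TO (MAXVALUE)
);
```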