abhishekrb19 opened 1 year ago
Some initial thoughts:
From the "operational impact" section, it sounds like there is no storage as part of this proposal, just syntax sugar on load/drop rules. Is this correct? If so:
What would GET return if there are some load/drop rules for a datasource that can't be mapped onto a storage policy object? Or, is it possible for all combinations of load/drop rules to be mapped onto a storage policy?

Additionally, did you consider options where the storage policy is a real object that is stored, perhaps in the new (& currently not-really-used) catalog? In that case the API would be through CatalogResource. Curious what you see as the pros and cons of these two approaches.
Overall I really like the design.
Some thoughts and questions:
Will the rules generated by a storage policy show up in GET /druid/coordinator/v1/rules? That is needed for the console.

Regarding "Disallow retain if auto-kill configuration is disabled": I have issues with that. What if you set retain and then disable auto-kill? How will the UI know if auto-kill is enabled or not (so as to know whether to render the retain controls)?

What happens if hot is not set: is everything hot or nothing hot? I think the load rule part of the example suggests that everything is hot?
Motivation
While Druid's rules (load, drop, and broadcast rules) and kill tasks are powerful, they can be complex to use and understand, especially in the context of retention. Druid users need to think about the lifecycle of segments (used/unused), map them to tiered replicants, and add the appropriate imperative rules to the rule chain in the correct order.
Proposed changes
At a high level, users can define a storage policy for the hot tier (aka historical tier) and for deep storage. To that end, this proposal introduces a storage policy API that translates user-defined policies into one or more load and drop rules under the hood.
New API
/druid/coordinator/v1/storagePolicy/<dataSource>
The API will accept two parameters in the create payload:
- hot: Defines how long to keep the data in the hot tier(s) (aka historical tiers)
- retain: Defines how long to retain the data before it's cleaned up permanently, including data from deep storage and the metadata store
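For illustration, creating a policy for a hypothetical datasource wikipedia might be a POST to /druid/coordinator/v1/storagePolicy/wikipedia with a body along these lines (a sketch; only the two top-level parameters and their period fields come from this proposal, the rest is assumed):

```json
{
  "hot": { "period": "P30D" },
  "retain": { "period": "P90D" }
}
```

The ISO 8601 periods mirror the notation Druid's existing period-based load/drop rules already use.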
Translation of storage policy to load & drop rules
A few use cases, along with the storage policy payloads and the corresponding internal load/drop rules, are shown below:
1. Keep data in the hot tier and permanently delete all data older than 30 days.
2. Drop data older than 30 days from the hot tier, but keep it in deep storage.
3. Keep recent data in the hot tier and permanently delete data older than 60 days.
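As a sketch of the first use case (assuming a tier named hot, a single replica, and auto-kill enabled; none of these specifics are fixed by the proposal), the policy payload might be:

```json
{
  "hot": { "period": "P30D" },
  "retain": { "period": "P30D" }
}
```

which could translate internally into a rule chain like:

```json
[
  {
    "type": "loadByPeriod",
    "period": "P30D",
    "tieredReplicants": { "hot": 1 }
  },
  { "type": "dropForever" }
]
```

Segments older than 30 days fall through to dropForever and are marked unused; the auto-kill configuration then permanently deletes them from deep storage and the metadata store.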
Extensibility & Maintainability
Similar to the period-based policies above, we can add interval-based and custom-tiered policies for more advanced users; hypothetical sketches of both are shown below.
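a. Interval-based policy (the interval fields and their shape are assumptions about how such a policy might look):

```json
{
  "hot": { "interval": "2023-01-01/2023-07-01" },
  "retain": { "interval": "2022-01-01/2023-07-01" }
}
```

b. Custom-tiered policy (the tiers map is likewise illustrative, naming tiers beyond the default hot tier):

```json
{
  "tiers": {
    "hot": { "period": "P30D" },
    "cold": { "period": "P90D" }
  },
  "retain": { "period": "P180D" }
}
```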
The API will need to translate user-defined storage policies to rules as we extend support to cover more complex use cases.
High-level implementation
The API implementation will support POST, GET, and DELETE operations to create, retrieve, and delete any configured storage policy per data source. Similar to the rules endpoint, this new endpoint should be on the Coordinator and should return appropriate error/status codes to the user. The implementation of the API will:
- Validate that hot.period cannot be larger than retain.period
- Disallow retain if the auto-kill configuration is disabled
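For example, a create payload like the following, where hot.period exceeds retain.period, should be rejected with a 4xx error (the exact error response shape is not specified here):

```json
{
  "hot": { "period": "P90D" },
  "retain": { "period": "P30D" }
}
```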
Rationale
The main benefit of the API is that it abstracts away the complex inner workings of load, drop, and kill rules. It provides a declarative interface for reasoning about retention, similar to what many other systems offer.
Operational impact
Since this API-only change leverages the existing load/drop rule functionality, nothing needs to be deprecated in short order. If it makes sense to deprecate the rules API at some point because the new API is equally powerful, then we may consider that.
Future work
In environments with multiple hot tiers, users must manually enumerate the tiers in tieredReplicants if they use load rules. We can extend the storage policy API to include all the tiers by default when tieredReplicants is not supplied, as sketched below.
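For reference, a plain load rule today requires spelling out each tier explicitly, along these lines (the tier names are made up):

```json
{
  "type": "loadByPeriod",
  "period": "P30D",
  "tieredReplicants": {
    "hot_tier_a": 1,
    "hot_tier_b": 1
  }
}
```

Under the proposed extension, omitting tieredReplicants in the storage policy would default to one such entry per configured tier.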