Open rpalcolea opened 6 years ago
My first thought is that instead of changing the return value of runCycle
is to use the listener pattern like we have in HollowProducer
. Because we're not on Java 8 and don't have default methods on interfaces, we're kind of hamstrung on evolving HollowProducerListener
until Hollow 3 (even though HollowProducer
has the "Beta API" disclaimer, it's received wide enough usage at Netflix that we probably should remove that disclaimer and consider it released and public).
However, instead of using HollowProducerListener
on the composed HollowProducer
, there could be a separate HollowIncrementalProducerListener
. We've also been discussing having targeted listeners, e.g. ValidatorsListener
or some such, instead of having to shoehorn everything into HollowProducerListener
. We anticipate that HollowProducerListener
will survive into Hollow 3 for the highest level or the coarsest visibility, but for more detailed information we'd have fit-for-purpose listener interfaces.
With that context, maybe it isn't called HollowIncrementalProducerListener
but something like IncrementListener
or IncrementalCycleListener
or InsertBetterNameHere
. My only hard requirement is that we avoid the "delta" nomenclature as that gets confusing with also having the notion of state change deltas and delta blobs (eventually the "Delta-based Producer Input" will get retitled in the docs once we're ready to release the incremental producer).
access to the ReadState so the detail on added/modified gets lost, while you can determine how many were added or deleted based on cardinality, the modifications detail is not available or probably yes?
For each type in your data model, it's possible to reconstruct added, modified, and deleted from a ReadStateEngine
(i.e. readState.getStateEngine()
) using the methods getPreviousOrdinals()
and getPopulatedOrdinals()
in HollowTypeReadState
. Ordinals in previous that aren't in populated are either deletes or updates; ordinals in populated that aren't in previous are either adds or updates. You rely on ghost records to disambiguate. An ordinal in previous that isn't in populated that shares a primary key with an ordinal in populated that isn't in previous is an update otherwise it's a delete; the remaining ordinals in populated that aren't shared in previous and aren't modifications are adds.
only the incremental producer could know the number of adds/modified and deletes on a easy fashion because it's responsible for modifying the mutations object.
Sort of. The mutations should be thought of as requests to change, but any one of those requests could be noops. An addOrModify
of an object that is no different than what is already in the current state won't actually be an add nor an update. A delete
of a PK that isn't in the data won't actually be a delete.
However, having visibility into both what was requested and what actually happened is useful and could be used to spot bugs or inefficiencies when requested != actual.
I don't think a Validator should be implemented just for getting metrics on the dataset.
That's a useful workaround, but I agree it would be better if we had a designed solution.
Hi @toolbear ,
Thanks for the feedback. I like the idea of IncrementalCycleListener
.
What's the main motivation of making the HollowIncrementalProducer
something that contains a HollowProducer
. As of today, if you want to use the HollowIncrementalProducer
, you need to have an existing state. Have you ever discuss the pros/cons of making HollowIncrementalProducer
just another type of producer which just has another way of populate data? or let HollowIncrementalProducer
handle the first cycle?
I guess for HollowIncrementalProducer
would be ok to introduce the builder pattern now? and trigger the listeners at the end of runCycle
?
Also, do you envision HollowIncrementalCycleListener
as part of ListenerSupport
? I believe it should be ok, it could be something like listeners.fireIncrementalCycleComplete(IncrementalCycleStatus status)
and listeners.fireIncrementalCycleFail(IncrementalCycleStatus status)
where IncrementalCycleStatus
:
public class IncrementalCycleStatus {
private final long version;
private final Status status;
long recordsAddedOrModified
long recordsRemoved
}
And this status could be passed to the listener for logging/metrics/etc.
At the end, there should be a HollowIncrementalCycleListener
interface that extends EventListener
. This interface would include fireIncrementalCycleComplete
and fireIncrementalCycleFail
methods.
IncrementalCycleStatus
object could be introduced with proper success/fail builders.
HollowIncrementalProducer
's runCycle
method would trigger the listeners based on the cycle result... either fireIncrementalCycleFail
or fireIncrementalCycleComplete
passing a new IncrementalCycleStatus
.
HollowIncrementalProducer
would have a Builder pattern now to do something like:
HollowIncrementalProducer.withHollowProducer(myProducer)
.withThreadsPerCpu(2.0d) //for `SimultaneousExecutor`
.withIncrementalCycleListeners(mylisteners) //
Any thoughts/feedback would be appreciate it
@toolbear , Guess code would help more -> https://github.com/Netflix/hollow/pull/153
Hi @toolbear
While working with
HollowIncrementalProducer
, I found that it would be nice if the result ofrunCycle
had more information than theversion
. I believe users could use some stats likerecordsAddedOrModified
andrecordsRemoved
What are your thoughts on introducing a
HollowIncrementalProducerExecution
object that would look like this (orHollowIncrementalProducerCycleResult
?):This would help users to set some metrics or alerting based on the behavior. While
Validators
are a potential candidate for this there are two blockers for this:The validators are triggered by
Validator.validate
which is avoid
, so in case you need metrics, you need to bake your metrics or alerting into the validator (not a problem of course).Validator
only has access to theReadState
so the detail on added/modified gets lost, while you can determine how many were added or deleted based on cardinality, the modifications detail is not available or probably yes?, only the incremental producer could know the number of adds/modified and deletes on a easy fashion because it's responsible for modifying themutations
object.I don't think a
Validator
should be implemented just for getting metrics on the dataset.Don't know if this is something that you could see as useful for Hollow users. We do expose a lot of metrics, we currently track how many objects we add/modify/delete on each cycle. While the validators are helpful to fail a cycle if we want to drop for example... 3% of our data, the metrics are helping us to visualize how our dataset evolves over time.
thoughts?