Open clintropolis opened 6 days ago
Discussion: should we move metric ingest/sink/count to ingest/appendableSegment/count or something similar? That is kind of disruptive since there is like a continuity of measurement issue which is kind of sad. We could also always double emit it, but also sad in its own way.
I think it is okay to deprecate the older metric but keep emitting it along with the new one for a couple of releases. We are already emitting a good number of metrics, one more is not going to hurt. But at least this way we would be able to eventually get rid of the old one.
Also, @clintropolis , what do you think about calling it RealtimeSegment
instead of AppendableSegment
?
"Realtime segment" is a term that Druid community is already familiar with.
"Appendable segment" seems slightly ambiguous as it could be misinterpreted by some to mean a segment to which more data can be added even after it has been committed.
Also, @clintropolis , what do you think about calling it RealtimeSegment instead of AppendableSegment? "Realtime segment" is a term that Druid community is already familiar with. "Appendable segment" seems slightly ambiguous as it could be misinterpreted by some to mean a segment to which more data can be added even after it has been committed.
My reasoning for AppendableSegment
was because they are built with an Appenderator
. Also "realtime" isn't really appropriate because they are also created during batch processing. I would be ok changing the metric name to something else, but I think the current name is more appropriate in code at least. I am open to other suggestions though, I considered 'mutable' segment but that is really only part of the AppendableSegment
which is composed of one mutable PartialSegment
and several other immutable intermediary persisted PartialSegment
(using the new naming).
Also "realtime" isn't really appropriate because they are also created during batch processing.
Yeah, I wondered if this was the case, didn't see that BatchAppenderator
itself is using sinks.
If I am not mistaken, a sink should always corresponding to a pending segment i.e. an entry in the pendingSegments
metadata table. So maybe PendingSegment
could be a better name?
If I am not mistaken, a sink should always corresponding to a pending segment i.e. an entry in the pendingSegments metadata table. So maybe
PendingSegment
could be a better name?
PendingSegment
doesn't seem quite right to me because it seems like an odd name for the role that this thing plays in code. I think my problem here is the word 'pending' which seems like something you wouldn't want to be using in stuff like query processing because it sounds like something that isn't 'official' yet - which makes sense for how it is used for task locks, but less so for the internal code stuff imo.
What don't you like about AppendableSegment
as an internal code construct name? Should we rename Appenderator
too? I don't think it would be useful for the docs to talk about them as this and agree it might be confusing, but I don't think the docs widely called them Sink
either since this is just internal naming stuff.
I think my problem here is the word 'pending' which seems like something you wouldn't want to be using in stuff like query processing because it sounds like something that isn't 'official' yet
Agreed. I have myself been on the fence about the 'pending' part too as it is certainly confusing. But then, I am not sure what a better alternative would have been either (maybe open
, appending
or appendable
itself 😛 ).
In that regard, I do like AppendableSegment
as it does clarify that this is a segment to which we can currently append more rows. I just wanted to be sure before we introduced a new(-ish) term in the code.
You seem to have given it enough thought and I do agree with the arguments against the other options, so let's go ahead with AppendableSegment
!
I don't think we need to rename Appenderator
as it works fine for its role.
changes:
FireHydrant
is nowPartialSegment
. This name much more clearly describes what this class does, and with all the other fireman terminology removed it didn't even fit a theme anymore.Sink
is nowAppendableSegment
. This name also much more clearly describes what this class does, and is composed ofPartialSegments
per the previousFireHydrant
rename.SinkQuerySegmentWalker
->AppendableSegmentQuerySegmentWalker
, andSinkQueryRunner
->AppendableSegmentQueryRunner
Firehose
,IngestSegmentFirehose
was only used by Hadoop indexingDruidRecordReader
, moved to internal class ofDruidRecordReader
asSegmentReader
FirehoseFactory
and remaining implementations, after #16602 they were no longer usedorg.apache.druid.segment.realtime.sink
andorg.apache.druid.segment.realtime.firehose
up toorg.apache.druid.segment.realtime
.Discussion: should we move metric
ingest/sink/count
toingest/appendableSegment/count
or something similar? That is kind of disruptive since there is like a continuity of measurement issue which is kind of sad. We could also always double emit it, but also sad in its own way.Release note
todo
This PR has: