Open jihoonson opened 5 years ago
This should be resolved after https://github.com/apache/incubator-druid/pull/7048.
How would this apply to the delegating implementations like CombiningFirehoseFactory, ClippedFirehoseFactory, and FixedCountFirehoseFactory? I don't know to what degree they are actually used, but Clipped seems like something that could be useful with the LocalFirehoseFactory and local index task. Would these need to implement FiniteFirehoseFactory too?
Good question. ClippedFirehoseFactory is for Tranquility, so I don't think it needs to be a FiniteFirehoseFactory. I'm not sure who is using FixedCountFirehoseFactory, but it was added in https://github.com/apache/incubator-druid/pull/3856 and it looks like its purpose was testing.
For CombiningFirehoseFactory, I think it would be useful and worthwhile to add a CombiningFiniteFirehoseFactory which supports split.
This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
Motivation
Currently, native batch tasks (local and parallel index tasks) support any firehose implementation. However, this isn't very useful when the firehose is an infinite one, because these tasks don't have any context about stream ingestion.
Proposed changes
I propose to change the type of firehose of IndexIOConfig and ParallelIndexIOConfig from FirehoseFactory to FiniteFirehoseFactory.
Rationale
FiniteFirehoseFactory is designed for any type of batch ingestion. It assumes that input data is finite (and provides an optional hint for parallel indexing). It makes more sense to support only FiniteFirehoseFactory for native batch tasks rather than improving them to support any kind of FirehoseFactory, which may be designed for stream input data.
Operational impact
There's no change in the task spec because the variable name isn't changed.
Custom firehoseFactory implementations for native batch tasks need to be updated.
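As a rough illustration of what such an update could look like, here is a minimal sketch. The interfaces below are simplified stand-ins written for this example, not the real Druid types (the real FirehoseFactory and FiniteFirehoseFactory have different signatures), and ListFirehoseFactory and IndexIOConfigSketch are hypothetical names. The sketch only shows the shape of the change: the config field keeps its name, its type narrows to the finite interface, and a custom factory has to expose its splits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Entry point for the sketch; all types below are simplified stand-ins.
final class FiniteFirehoseSketch {
    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        files.add("a.json");
        files.add("b.json");
        FiniteFirehoseFactory<String> factory = new ListFirehoseFactory(files);

        // The field is still called "firehose"; only its declared type is narrower.
        IndexIOConfigSketch ioConfig = new IndexIOConfigSketch(factory);
        System.out.println(ioConfig.getFirehose().connect());

        // A parallel task could hand each split to a separate sub task.
        for (String split : factory.getSplits()) {
            System.out.println(split + " -> " + factory.withSplit(split).connect());
        }
    }
}

// Stand-in for FirehoseFactory (assumption: here it simply returns rows as strings).
interface FirehoseFactory {
    List<String> connect();
}

// Stand-in for FiniteFirehoseFactory: finite input that can be divided into splits.
interface FiniteFirehoseFactory<S> extends FirehoseFactory {
    List<S> getSplits();
    FiniteFirehoseFactory<S> withSplit(S split);
}

// Hypothetical custom factory, updated to implement the finite interface.
final class ListFirehoseFactory implements FiniteFirehoseFactory<String> {
    private final List<String> files;

    ListFirehoseFactory(List<String> files) {
        this.files = Collections.unmodifiableList(new ArrayList<>(files));
    }

    @Override
    public List<String> connect() {
        // Pretend each file yields exactly one row.
        List<String> rows = new ArrayList<>();
        for (String file : files) {
            rows.add("row-from-" + file);
        }
        return rows;
    }

    @Override
    public List<String> getSplits() {
        return files;
    }

    @Override
    public FiniteFirehoseFactory<String> withSplit(String split) {
        return new ListFirehoseFactory(Collections.singletonList(split));
    }
}

// Stand-in for IndexIOConfig: it accepts only finite factories.
final class IndexIOConfigSketch {
    private final FiniteFirehoseFactory<?> firehose;

    IndexIOConfigSketch(FiniteFirehoseFactory<?> firehose) {
        this.firehose = firehose;
    }

    FiniteFirehoseFactory<?> getFirehose() {
        return firehose;
    }
}
```

With this shape, passing a factory that only implements the plain FirehoseFactory stand-in to IndexIOConfigSketch would fail at compile time, which is the kind of breakage custom implementations would see.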
Future work
This change effectively makes native batch tasks support only text file formats by default, because all implementations of FiniteFirehoseFactory use StringInputRowParser. https://github.com/apache/incubator-druid/issues/5584 should be solved to support various file formats.