Open kishoreg opened 4 years ago
PROPOSAL
Club all plugins under a separate module: pinot-plugins.
Thoughts?
Should pinot-batch-ingestion also go under pinot-plugin if we are going this route? Or maybe we could name it something like third-party/ or contrib/, similar to other open source projects, so we could group a bunch of other things under it.
Agree with pinot-record-recorder rename, either name would be good.
I am not sure pinot-file-systems fits under the connector part, but I definitely see how it also functions like a "connector". Maybe the issue is that pinot-connector could use a better name?
We can move pinot-batch-ingestion under pinot-plugin. The only concern is that we don't really have any interface for the segment creation job in pinot-spi. Maybe we should move the job ingestion spec and ingestion job interface into pinot-spi?
I too had the same thought about moving ingestion into plugin, but then I am not sure if we can build independent ingestion mechanisms just by moving ingestion-spec and ingestion job interface into pinot-spi. If we can, then that is ok, I guess.
Another thought I had was whether to rename pinot-connectors to something that seems more related to realtime. pinot-stream-ingestion?
I would recommend moving pinot-batch-ingestion into pinot-plugins and renaming pinot-connectors to pinot-stream-ingestion. They are responsible for data ingestion only.
If plugins are expected to have SPI interfaces, then it is probably better to keep pinot-batch-ingestion outside of plugins. Also, I agree with renaming it to pinot-segment-generation-jobs, as it is more about segment generation than about ingestion.
How does this look?
Add pinot-gcs under pinot-file-system. What do you think about renaming pinot-connector-kafka-base to pinot-connector-kafka-message-decoders?
I had a question about pinot-input-format: why not keep the name pinot-record-readers, since that is what those modules do?
@elonazoulay Decoders and record readers are typically associated with the format of the input data (avro, parquet, thrift etc). A RecordReader works on a file and is used in batch mode, while a decoder works at the row level and is used in streaming mode. There is scope for unifying and redesigning these interfaces in the future, but it's not the right time. Given this, my thinking was to keep everything related to the data format in one module.
Connectors, on the other hand, depend on the datasource type (kafka, eventhub, pubsub etc). Connectors should be responsible for reading from any of these sources irrespective of the actual data format.
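To make the batch vs. streaming distinction concrete, here is a rough Java sketch. Every type name in it (FileRecordReader, RowDecoder, CsvFormat and so on) is hypothetical and chosen for illustration only; none of it is taken from Pinot's actual SPI. It only shows why a file-level reader and a row-level decoder for the same format could reasonably share one input-format module.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// All names below are hypothetical, not Pinot's real interfaces.

/** Batch side: owns a whole file and iterates over its rows. */
interface FileRecordReader extends Iterator<String[]> {}

/** Streaming side: turns a single message payload into a single row. */
interface RowDecoder {
    String[] decode(byte[] payload);
}

/** Format-specific parsing shared by both sides (CSV used as a stand-in format). */
final class CsvFormat {
    private CsvFormat() {}

    static String[] parseLine(String line) {
        // Limit of -1 keeps trailing empty columns.
        return line.split(",", -1);
    }
}

/** Batch mode: the "file" is modelled as a list of lines to keep the sketch self-contained. */
final class CsvFileRecordReader implements FileRecordReader {
    private final Iterator<String> lines;

    CsvFileRecordReader(List<String> fileLines) {
        this.lines = fileLines.iterator();
    }

    @Override
    public boolean hasNext() {
        return lines.hasNext();
    }

    @Override
    public String[] next() {
        return CsvFormat.parseLine(lines.next());
    }
}

/** Streaming mode: one payload in, one row out; no knowledge of where the bytes came from. */
final class CsvRowDecoder implements RowDecoder {
    @Override
    public String[] decode(byte[] payload) {
        return CsvFormat.parseLine(new String(payload, StandardCharsets.UTF_8));
    }
}

final class InputFormatSketch {
    private InputFormatSketch() {}

    /** Drains a batch reader into a list of rows. */
    static List<String[]> readAll(FileRecordReader reader) {
        List<String[]> rows = new ArrayList<>();
        while (reader.hasNext()) {
            rows.add(reader.next());
        }
        return rows;
    }
}
```

A connector (kafka, eventhub, pubsub) would only hand raw bytes to a RowDecoder in this sketch, so it stays independent of the data format, which is the split described above.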
One small recommendation: we might want to rename the two pinot-hadoop modules because they might be confusing. Maybe pinot-hdfs (for the filesystem component) or something like that?
@Jackie-Jiang @mcvsubbu @mayankshriv and I discussed and came up with the following structure
I will be making these changes tomorrow. Please comment if you have any feedback.
Now that we have separated plugin implementations from pinot core code, let's re-organize the plugins and also change the package names in the plugin implementations.
CURRENT