Restructure pinot plugin-modules

kishoreg commented 4 years ago

Now that we have separated plugin implementations from pinot core code, let's re-organize the plugins and also change the package names in the plugin implementations.

CURRENT

pinot-spi
pinot-common
pinot-core
pinot-broker
pinot-server
pinot-controller
pinot-minion
pinot-perf
pinot-integration-tests
pinot-tools
pinot-record-readers
- pinot-avro
- pinot-csv
- pinot-json
- pinot-orc
- pinot-parquet
- pinot-thrift
pinot-azure-filesystem
pinot-hadoop-filesystem
pinot-connectors
- pinot-connector-kafka-0.9
- pinot-connector-kafka-2.0
- pinot-connector-kafka-base
pinot-batch-ingestion
- pinot-hadoop
- pinot-ingestion-common
- pinot-spark
- pinot-standalone

kishoreg commented 4 years ago

PROPOSAL

Club all plugins under a separate module, pinot-plugins.

pinot-spi
pinot-common
pinot-core
pinot-broker
pinot-server
pinot-controller
pinot-minion
pinot-perf
pinot-integration-tests
pinot-tools
pinot-plugins
- pinot-record-readers (rename to pinot-format, pinot-input-format, or pinot-input-reader; we have to move the decoders and encoders into this)
  - pinot-avro
  - pinot-csv
  - pinot-json
  - pinot-orc
  - pinot-parquet
  - pinot-thrift
- pinot-file-systems (rename to pinot-data-source or pinot-deep-storage, or merge into pinot-connectors)
  - pinot-azure-filesystem
  - pinot-hadoop-filesystem
- pinot-connectors
  - pinot-connector-kafka-0.9
  - pinot-connector-kafka-2.0
  - pinot-connector-kafka-base
pinot-batch-ingestion (rename to pinot-segment-creation-jobs) (Note: we don't have any plugin interface for these things)
- pinot-hadoop
- pinot-ingestion-common
- pinot-spark
- pinot-standalone (need a better name for this)

Thoughts?

jamesyfshao commented 4 years ago

Should pinot-batch-ingestion also go under pinot-plugins if we are going this route? Or maybe we could name it something like third-party/ or contrib/, similar to other open source projects, so we could group a bunch of other things under it.
Agree with the pinot-record-readers rename; any of the proposed names would be good.
I am not sure pinot-file-systems fits under the connectors part, but I definitely see how it also functions like a "connector". Maybe the real issue is that pinot-connectors could use a better name?

kishoreg commented 4 years ago

We can move pinot-batch-ingestion under pinot-plugins. The only reason I left it outside is that we don't really have any interface for the segment creation job in pinot-spi. Maybe we should move the ingestion job spec and the ingestion job interface into pinot-spi?

mcvsubbu commented 4 years ago

I too had the same thought about moving ingestion into plugins, but I am not sure we can build independent ingestion mechanisms just by moving the ingestion spec and ingestion job interface into pinot-spi. If we can, then that is OK, I guess.

Another thought I had was whether to rename pinot-connectors to something that seems more related to realtime. pinot-stream-ingestion?

Jackie-Jiang commented 4 years ago

I would recommend moving pinot-batch-ingestion into pinot-plugins and renaming pinot-connectors to pinot-stream-ingestion. They are responsible for data ingestion only.

mayankshriv commented 4 years ago

If plugins are expected to have SPI interfaces, then it is probably better to keep pinot-batch-ingestion outside of plugins. Also, I agree with renaming it to pinot-segment-generation-jobs, as it is more about segment generation than about ingestion.

kishoreg commented 4 years ago

pinot-plugins
- pinot-input-format
  - pinot-avro
  - pinot-csv
  - pinot-json
  - pinot-orc
  - pinot-parquet
  - pinot-thrift
- pinot-file-system
  - pinot-azure
  - pinot-hadoop
- pinot-stream-ingestion
  - pinot-connector-kafka-0.9
  - pinot-connector-kafka-2.0
  - pinot-connector-kafka-base
- pinot-batch-ingestion
  - pinot-hadoop
  - pinot-ingestion-common
  - pinot-spark
  - pinot-standalone

How does this look?

elonazoulay commented 4 years ago

Add pinot-gcs under pinot-file-system. What do you think about renaming pinot-connector-kafka-base to pinot-connector-kafka-message-decoders?

elonazoulay commented 4 years ago

I had a question about pinot-input-format: why not keep the name pinot-record-readers, since that is what those modules do?

kishoreg commented 4 years ago

@elonazoulay Decoders and record readers are typically associated with the format of the input data (avro, parquet, thrift etc). A RecordReader works on a file and is used in batch mode, while decoders work at the row level and are used in streaming mode. There is scope for unifying and redesigning these interfaces in the future, but it's not the right time. Given this, my thinking was to keep everything related to the data format in one module.

Connectors, on the other hand, depend on the data source type (kafka, eventhub, pubsub etc). Connectors should be responsible for reading from any of these sources irrespective of the actual data format.
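
To make the split concrete, here is a minimal, language-neutral sketch in Python. It is not Pinot's actual Java SPI; every name in it (`read_csv_records`, `decode_json_message`, `InMemoryStream`, `consume`) is hypothetical and only illustrates the three roles: a record reader that walks a whole file, a decoder that handles one message at a time, and a connector that delivers raw payloads without knowing their format.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def read_csv_records(file_obj) -> Iterator[Row]:
    """Record-reader role (batch): consume a whole file and yield its rows."""
    for record in csv.DictReader(file_obj):
        yield dict(record)


def decode_json_message(payload: bytes) -> Row:
    """Decoder role (streaming): turn one message payload into one row."""
    return json.loads(payload.decode("utf-8"))


class InMemoryStream:
    """Connector role: hand out raw payloads from a source, format-agnostic.

    Hypothetical stand-in for a real source such as Kafka.
    """

    def __init__(self, payloads: List[bytes]):
        self._payloads = list(payloads)

    def fetch(self) -> Iterable[bytes]:
        return iter(self._payloads)


def consume(stream: InMemoryStream, decoder: Callable[[bytes], Row]) -> List[Row]:
    """Pair any connector with any decoder; neither knows about the other."""
    return [decoder(payload) for payload in stream.fetch()]


# Batch path: the format module alone is enough.
batch_rows = list(read_csv_records(io.StringIO("id,name\n1,a\n2,b\n")))

# Streaming path: the connector supplies bytes, the format module decodes them.
stream_rows = consume(InMemoryStream([b'{"id": 1, "name": "a"}']), decode_json_message)
```

The point of the sketch is that `consume` can combine any source with any decoder, which is why the format code and the source code sit in separate modules.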

jamesyfshao commented 4 years ago

One small recommendation: we might want to rename the two pinot-hadoop modules, because having two with the same name might be confusing. Maybe pinot-hdfs (for the filesystem component) or something like that?

kishoreg commented 4 years ago

@Jackie-Jiang @mcvsubbu @mayankshriv and I discussed and came up with the following structure

pinot-plugins
- pinot-input-format
  - pinot-avro
  - pinot-csv
  - pinot-json
  - pinot-orc
  - pinot-parquet
  - pinot-thrift
- pinot-file-system
  - pinot-adls
  - pinot-hdfs
  - pinot-gcs
- pinot-stream-ingestion
  - pinot-kafka-0.9
  - pinot-kafka-2.0
  - pinot-kafka-base
- pinot-batch-ingestion
  - v0_deprecated
    - pinot-hadoop
    - pinot-ingestion-common
    - pinot-spark
  - pinot-ingestion-base
  - pinot-hadoop
  - pinot-spark
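
For anyone updating build scripts or docs, the renames implied by comparing this structure with the CURRENT listing can be sketched as a lookup table. This is only an illustration in Python: the `RENAMES` table and `new_location` helper are hypothetical, the azure to adls and hadoop to hdfs pairings are my reading of the two listings, and the batch-ingestion modules are left out because the thread does not spell out which old module lands under v0_deprecated.

```python
# Old module path -> new module path, read off the listings in this thread.
# Assumption: pinot-azure-filesystem becomes pinot-adls and
# pinot-hadoop-filesystem becomes pinot-hdfs.
RENAMES = {
    "pinot-record-readers": "pinot-plugins/pinot-input-format",
    "pinot-azure-filesystem": "pinot-plugins/pinot-file-system/pinot-adls",
    "pinot-hadoop-filesystem": "pinot-plugins/pinot-file-system/pinot-hdfs",
    "pinot-connectors": "pinot-plugins/pinot-stream-ingestion",
    "pinot-connectors/pinot-connector-kafka-0.9":
        "pinot-plugins/pinot-stream-ingestion/pinot-kafka-0.9",
    "pinot-connectors/pinot-connector-kafka-2.0":
        "pinot-plugins/pinot-stream-ingestion/pinot-kafka-2.0",
    "pinot-connectors/pinot-connector-kafka-base":
        "pinot-plugins/pinot-stream-ingestion/pinot-kafka-base",
}


def new_location(old_path: str) -> str:
    """Map an old module path to its new one; unknown paths are returned unchanged."""
    parts = old_path.split("/")
    # Try the longest matching prefix first so that specific renames win.
    for end in range(len(parts), 0, -1):
        prefix = "/".join(parts[:end])
        if prefix in RENAMES:
            return "/".join([RENAMES[prefix]] + parts[end:])
    return old_path
```

For example, `new_location("pinot-record-readers/pinot-avro")` resolves through the `pinot-record-readers` prefix, while core modules such as `pinot-core` pass through untouched.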

I will be making these changes tomorrow. Please comment if you have any feedback.

apache / pinot

Restructure pinot plugin-modules #4941