I am eagerly looking forward to the start of GSoC 2020. This is a preliminary proposal; we could call for discussion on the dev mailing list :-)
How about RocketMQ Connect Spark, HBase, and Hive?
@qqeasonchen you can find the Spark integration (though it does not use the OpenConnect spec) in this repo. Would you like to mentor the HBase and Hive connectors? I would like to help you and co-mentor these topics :-)
@vongosling Sure, I'd like to try.
@qqeasonchen That's wonderful. Would you like to share your topic wearing the mentor's hat? You could write a proposal like the KEDA example we have listed here.
Apache RocketMQ Scaler for KEDA
Context
KEDA allows for fine-grained autoscaling (including to/from zero) for event-driven Kubernetes workloads. KEDA serves as a Kubernetes Metrics Server and allows users to define autoscaling rules using a dedicated Kubernetes custom resource definition. KEDA has a number of "scalers" that can both detect if a deployment should be activated or deactivated, and feed custom metrics for a specific event source. In this topic, you need to implement a RocketMQ scaler.
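Concretely, the core signal such a scaler would report is consumer-group lag. KEDA scalers themselves are implemented in Go inside the KEDA project, so the following is only a minimal Java sketch of where that metric could come from, using RocketMQ's admin API (the group name and name server address are placeholders):

```java
import org.apache.rocketmq.common.admin.ConsumeStats;
import org.apache.rocketmq.tools.admin.DefaultMQAdminExt;

// Minimal sketch: compute the total lag (undelivered messages) of a consumer
// group, i.e. the metric a RocketMQ scaler would feed to KEDA.
public class ConsumerLag {
    public static void main(String[] args) throws Exception {
        DefaultMQAdminExt admin = new DefaultMQAdminExt();
        admin.setNamesrvAddr("localhost:9876"); // name server address (placeholder)
        admin.start();
        try {
            // examineConsumeStats aggregates broker and consumer offsets per queue
            ConsumeStats stats = admin.examineConsumeStats("my-consumer-group");
            long lag = stats.computeTotalDiff(); // broker offset - consumer offset, summed
            System.out.println("lag=" + lag);    // scale out when lag exceeds a threshold
        } finally {
            admin.shutdown();
        }
    }
}
```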
You should learn before applying for this topic
- Helm/Apache RocketMQ Operator/Apache RocketMQ Docker Image
- Apache RocketMQ multi-replica mechanism (based on DLedger)
- How KEDA works
Mentor
wlliqipeng@apache.org, vongosling@apache.org
Apache RocketMQ Connect Flink
Context
There are many ways that Apache Flink and Apache RocketMQ can integrate to provide elastic data processing at large scale. RocketMQ can be used as a streaming source and streaming sink in Flink DataStream applications, which is the main implementation and the popular usage in the RocketMQ community. Developers can ingest data from RocketMQ into a Flink job that performs computations and processes real-time data, and then send the data back to a RocketMQ topic as a streaming sink. More details can be found at https://github.com/apache/rocketmq-externals/tree/master/rocketmq-flink.
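For orientation, a DataStream job wired to RocketMQ on both ends looks roughly like this, following the rocketmq-flink README (class names and config keys are taken from that module at the time of writing and should be checked against the current repo):

```java
import java.util.Properties;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.rocketmq.flink.RocketMQConfig;
import org.apache.rocketmq.flink.RocketMQSink;
import org.apache.rocketmq.flink.RocketMQSource;
import org.apache.rocketmq.flink.common.selector.DefaultTopicSelector;
import org.apache.rocketmq.flink.common.serialization.SimpleKeyValueDeserializationSchema;
import org.apache.rocketmq.flink.common.serialization.SimpleKeyValueSerializationSchema;

public class RocketMQFlinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties consumerProps = new Properties();
        consumerProps.setProperty(RocketMQConfig.NAME_SERVER_ADDR, "localhost:9876");
        consumerProps.setProperty(RocketMQConfig.CONSUMER_GROUP, "flink-consumer-group");
        consumerProps.setProperty(RocketMQConfig.CONSUMER_TOPIC, "source-topic");

        Properties producerProps = new Properties();
        producerProps.setProperty(RocketMQConfig.NAME_SERVER_ADDR, "localhost:9876");
        producerProps.setProperty(RocketMQConfig.PRODUCER_GROUP, "flink-producer-group");

        env.addSource(new RocketMQSource<>(
                        new SimpleKeyValueDeserializationSchema("id", "body"), consumerProps))
           .name("rocketmq-source")
           // ... real-time computation over the stream goes here ...
           .addSink(new RocketMQSink<>(
                        new SimpleKeyValueSerializationSchema("id", "body"),
                        new DefaultTopicSelector<>("sink-topic"), producerProps))
           .name("rocketmq-sink");

        env.execute("rocketmq-flink-example");
    }
}
```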
With more and more DW or OLAP engineers using RocketMQ for their data processing work, another potential integration need arose: developers can take advantage of RocketMQ as both a streaming source and a streaming table sink for Flink SQL or Table API queries. Also, Flink 1.9.0 makes the Table API a first-class citizen. It's time to support SQL in RocketMQ. This is the topic for Apache RocketMQ Connect Flink.
You should learn before applying for this topic
- Apache RocketMQ Flink Connector
- Apache Flink Table API
Extension
For students with expertise in the streaming field, you could go further and implement an exactly-once streaming source and an at-least-once (or exactly-once) streaming sink, as described in https://github.com/apache/rocketmq-externals/issues/500.
Mentor
nicholasjiang@apache.org, duhengforever@apache.org, vongosling@apache.org
Apache RocketMQ Connect Hudi
Context
Hudi ingests and manages the storage of large analytical datasets over DFS (HDFS or cloud stores). It can act as either a source or a sink for a stream processing platform such as Apache RocketMQ, and it can also be used as a state store inside a processing DAG (similar to how RocksDB is used by Flink). This is an item on the Apache RocketMQ roadmap. In this topic, you should implement a full Hudi source and sink based on the RocketMQ connect framework, which is one of the most important implementations of OpenConnect.
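For a feel of the shape of the work: a sink built on the RocketMQ connect framework is essentially a sink task that receives batches of entries from the runtime. A rough skeleton follows, with the Hudi-specific writing elided; the interface names are taken from the openmessaging-connector API used by the connect runtime at the time of writing and should be verified against the current spec:

```java
import java.util.Collection;
import java.util.Map;
import io.openmessaging.KeyValue;
import io.openmessaging.connector.api.data.QueueMetaData;
import io.openmessaging.connector.api.data.SinkDataEntry;
import io.openmessaging.connector.api.sink.SinkTask;

// Skeleton of an OpenMessaging sink task; Hudi-specific parts are left as comments.
public class HudiSinkTask extends SinkTask {
    @Override
    public void start(KeyValue config) {
        // read the DFS path/table name from config and open a Hudi write client
    }

    @Override
    public void put(Collection<SinkDataEntry> entries) {
        for (SinkDataEntry entry : entries) {
            // map the entry's schema + payload to a Hudi record and upsert it
        }
    }

    @Override
    public void commit(Map<QueueMetaData, Long> offsets) {
        // flush pending writes so the runtime can safely commit these offsets
    }

    @Override
    public void stop() { /* close the Hudi write client */ }

    @Override
    public void pause() { /* stop accepting new entries */ }

    @Override
    public void resume() { /* resume accepting entries */ }
}
```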
You should learn before applying for this topic
- Apache RocketMQ Connect Framework
- Apache Hudi
Mentor
vongosling@apache.org, duhengforever@apache.org
Apache RocketMQ Ingestion for Druid
Context
Druid is a real-time analytics database designed for fast slice-and-dice analytics ("OLAP" queries) on large data sets. In this topic, you should develop a RocketMQ indexing service that enables the configuration of supervisors on the Overlord, which facilitates ingestion from RocketMQ by managing the creation and lifetime of RocketMQ indexing tasks. These indexing tasks read events using RocketMQ's own partition and offset mechanism. The supervisor oversees the state of the indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained.
You should learn before applying for this topic
Apache Druid Data Ingestion
Mentor
vongosling@apache.org, duhengforever@apache.org
Apache RocketMQ Channel for Knative
Context
Knative is a Kubernetes-based platform for building, deploying, and managing modern serverless applications. It provides a set of middleware components that are essential to building modern, source-centric, container-based applications that can run anywhere: on-premises, in the cloud, or even in a third-party data centre. Knative consists of the Serving and Eventing components. Eventing is a system designed to address a common need of cloud-native development, providing composable primitives to enable late-binding event sources and event consumers. Eventing also defines an event forwarding and persistence layer called a Channel. Each channel is a separate Kubernetes custom resource. This topic requires you to implement a RocketMQChannel based on Apache RocketMQ.
You should learn before applying for this topic
- How Knative works
- RocketMQSource for Knative
- Apache RocketMQ Operator
Mentor
wlliqipeng@apache.org, vongosling@apache.org
CloudEvents support for RocketMQ
Context
Events are everywhere. However, event producers tend to describe events differently.
The lack of a common way of describing events means developers must constantly re-learn how to consume events. This also limits the potential for libraries, tooling, and infrastructure to aid the delivery of event data across environments, like SDKs, event routers, or tracing systems. The portability and productivity we can achieve from event data is hindered overall.
CloudEvents is a specification for describing event data in common formats to provide interoperability across services, platforms, and systems. RocketMQ, as an event streaming platform, also hopes to improve the interoperability of different event platforms by being compatible with the CloudEvents standard and supporting the CloudEvents SDK. In this topic, you need to improve the binding spec and implement the RocketMQ CloudEvents SDK (Java, Golang, or others).
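To make the binding idea concrete: in a binary content mode, event attributes would travel as message properties and the payload as the message body. Below is a minimal sketch using the CloudEvents Java SDK shape; note that the ce_-prefixed property names mirror the Kafka binding and are only an assumption here, since defining the actual RocketMQ mapping is part of this topic:

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import io.cloudevents.CloudEvent;
import io.cloudevents.core.builder.CloudEventBuilder;
import org.apache.rocketmq.common.message.Message;

public class CloudEventToMessage {
    /** Map a CloudEvent onto a RocketMQ message in binary content mode. */
    public static Message toBinaryMessage(CloudEvent event, String topic) {
        Message message = new Message(topic,
                event.getData() == null ? new byte[0] : event.getData().toBytes());
        // Attributes become message properties. The "ce_" prefix mirrors the
        // Kafka binding and is an assumption; the real RocketMQ binding spec
        // is exactly what this topic would define.
        message.putUserProperty("ce_specversion", event.getSpecVersion().toString());
        message.putUserProperty("ce_id", event.getId());
        message.putUserProperty("ce_type", event.getType());
        message.putUserProperty("ce_source", event.getSource().toString());
        return message;
    }

    public static void main(String[] args) {
        CloudEvent event = CloudEventBuilder.v1()
                .withId("1234")
                .withType("com.example.order.created")
                .withSource(URI.create("/orders"))
                .withData("application/json",
                        "{\"orderId\":1}".getBytes(StandardCharsets.UTF_8))
                .build();
        System.out.println(toBinaryMessage(event, "orders-topic"));
    }
}
```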
You should learn before applying for this topic
Apache RocketMQ/Apache RocketMQ SDK/CloudEvents
Mentor
duhengforever@apache.org, vongosling@apache.org
Schema registry
Context
In order to help RocketMQ improve its event management capabilities, better decouple producers and receivers, and keep events forward compatible, we need a service for event metadata management called a schema registry.
The schema registry will provide a GraphQL interface for developers to define standard schemas for their events, share them across the organization, and safely evolve them in a way that is backward compatible and future proof.
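As a rough illustration of the surface such a service could expose (all names below are hypothetical; designing the real GraphQL schema and compatibility rules is the topic itself):

```java
import java.util.List;

// Hypothetical registry surface, for illustration only; the actual topic is
// to design this as a GraphQL API with compatibility guarantees.
public interface SchemaRegistry {
    /** Register a new version of a schema under a subject (e.g. a topic name). */
    long register(String subject, String schemaDefinition);

    /** Fetch a specific version, or the latest if version < 0. */
    String getSchema(String subject, int version);

    /** All registered versions for a subject, oldest first. */
    List<Integer> listVersions(String subject);

    /** True if the candidate is backward compatible with the latest version. */
    boolean isBackwardCompatible(String subject, String candidateSchema);
}
```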
You should learn before applying for this topic
Apache RocketMQ/Apache RocketMQ SDK
Mentor
duhengforever@apache.org, vongosling@apache.org
Apache RocketMQ CLI Admin Tool Developed by Golang
Apache RocketMQ provides a CLI admin tool developed in Java for querying, managing, and diagnosing various problems. It also provides a set of APIs that Java applications can call to create, delete, and query topics, query messages, and perform other functions. This topic requires implementing a CLI management tool and a set of APIs in Golang, through which Go applications can create, query, and otherwise operate on topics.
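For reference, topic creation through the existing Java admin API looks roughly like the sketch below; the Golang tool would expose equivalent operations (the broker and name server addresses are placeholders):

```java
import org.apache.rocketmq.common.TopicConfig;
import org.apache.rocketmq.tools.admin.DefaultMQAdminExt;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        DefaultMQAdminExt admin = new DefaultMQAdminExt();
        admin.setNamesrvAddr("localhost:9876");
        admin.start();
        try {
            TopicConfig config = new TopicConfig("demo-topic");
            config.setReadQueueNums(8);
            config.setWriteQueueNums(8);
            // create/update the topic on one broker (address is a placeholder)
            admin.createAndUpdateTopicConfig("127.0.0.1:10911", config);
        } finally {
            admin.shutdown();
        }
    }
}
```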
You should learn before applying for this topic
Apache RocketMQ/Apache RocketMQ Go Client
Mentor
wlliqipeng@apache.org, vongosling@apache.org
RocketMQ Connect Elasticsearch
Context
The Elasticsearch sink connector allows moving data from Apache RocketMQ to Elasticsearch 6.x and 7.x. It writes data from a topic in Apache RocketMQ to an index in Elasticsearch, and all data for a topic have the same type.
Elasticsearch is often used for text queries, analytics, and as a key-value store. The connector covers both the analytics and key-value store use cases.
For the analytics use case, each message in RocketMQ is treated as an event, and the connector uses topic+message queue+offset as a unique identifier for events, which are then converted to unique documents in Elasticsearch. For the key-value store use case, it supports using keys from RocketMQ messages as document ids in Elasticsearch and provides configurations ensuring that updates to a key are written to Elasticsearch in order.
So, in this project, you need to implement a sink connector based on the OpenMessaging connect API, which will be executed on the RocketMQ connect runtime.
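A hedged sketch of the analytics-mode write path, using the Elasticsearch 7.x high-level REST client and the topic+queue+offset id scheme described above (connector plumbing omitted):

```java
import java.io.IOException;
import java.util.Map;
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class EsSinkWriter {
    private final RestHighLevelClient client = new RestHighLevelClient(
            RestClient.builder(new HttpHost("localhost", 9200, "http")));

    /** Index one RocketMQ message as a document keyed by topic+queue+offset. */
    public void write(String topic, int queueId, long offset,
                      Map<String, Object> payload) throws IOException {
        String docId = topic + "+" + queueId + "+" + offset; // unique per event
        IndexRequest request = new IndexRequest(topic) // index named after the topic
                .id(docId)
                .source(payload);
        client.index(request, RequestOptions.DEFAULT);
    }
}
```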
You should learn before applying for this topic
Elasticsearch/Apache RocketMQ/Apache RocketMQ Connect/OpenMessaging Connect API
Mentor
duhengforever@apache.org, vongosling@apache.org
RocketMQ Connect IoTDB
Context
The IoTDB sink connector allows moving data from Apache RocketMQ to IoTDB. It writes data from a topic in Apache RocketMQ to IoTDB.
IoTDB (Internet of Things Database) is a data management system for time series data, which provides users with services such as data collection, storage, and analysis. Thanks to its lightweight structure, high performance, and usable features, together with its seamless integration with the Hadoop and Spark ecosystems, IoTDB meets the requirements of massive dataset storage, high-throughput data ingestion, and complex data analysis in the industrial IoT field.
In this project, there may be update operations on historical data, so it is necessary to ensure ordered transmission and consumption of data via RocketMQ. If no update operations are involved, there is no need to guarantee data order; IoTDB will process the data even if it arrives out of order.
So, in this project, you need to implement an IoTDB sink connector based on the OpenMessaging connect API and run it on the RocketMQ connect runtime.
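A minimal sketch of the write side through IoTDB's JDBC driver (the timeseries path and SQL are illustrative; the connector would generate them from the incoming entries):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class IoTDBSinkWriter {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.iotdb.jdbc.IoTDBDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:iotdb://127.0.0.1:6667/", "root", "root");
             Statement stmt = conn.createStatement()) {
            // one time-series point per RocketMQ message; ordering matters
            // when the stream contains updates to historical timestamps
            stmt.execute("INSERT INTO root.demo.device1(timestamp, temperature) "
                       + "VALUES (1589187600000, 36.5)");
        }
    }
}
```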
You should learn before applying for this topic
IoTDB/Apache RocketMQ/Apache RocketMQ Connect/OpenMessaging Connect API
Mentor
duhengforever@apache.org, wlliqipeng@apache.org, vongosling@apache.org
The Operator for RocketMQ Exporter
An exporter exposes an endpoint for monitoring-data collection to the Prometheus server in the form of an HTTP service. The Prometheus server obtains the monitoring data to be collected by scraping the endpoint provided by the exporter. RocketMQ exporter is such an exporter: it first collects data from the RocketMQ cluster, and then normalizes the collected data to meet the requirements of the Prometheus system with the help of the third-party client library provided by Prometheus. Prometheus regularly pulls data from the exporter. This topic requires implementing an operator for the RocketMQ exporter to facilitate deploying the exporter on the Kubernetes platform.
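For context on the exporter side (the operator itself would be written in Go against the Kubernetes API), publishing a collected metric with the Prometheus Java client looks like this; the metric name and port are illustrative:

```java
import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.HTTPServer;

public class ExporterSketch {
    // a normalized metric in the Prometheus data model; name is illustrative
    static final Gauge GROUP_LAG = Gauge.build()
            .name("rocketmq_group_diff")
            .help("Undelivered message count per consumer group and topic")
            .labelNames("group", "topic")
            .register();

    public static void main(String[] args) throws Exception {
        // Prometheus scrapes this HTTP endpoint on its own schedule
        HTTPServer server = new HTTPServer(5557);
        GROUP_LAG.labels("my-group", "my-topic").set(42); // value from the cluster
    }
}
```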
You should learn before applying for this topic
- RocketMQ-Exporter Repo
- RocketMQ-Exporter Overview
- Kubernetes Operator
- RocketMQ-Operator
Mentor
wlliqipeng@apache.org, vongosling@apache.org
RocketMQ Connect InfluxDB
Context
The InfluxDB sink connector allows moving data from Apache RocketMQ to InfluxDB. It writes data from a topic in Apache RocketMQ to InfluxDB, while the InfluxDB source connector is used to export data from an InfluxDB server to RocketMQ.
In this project, you need to implement an InfluxDB sink connector (the source connector is optional) based on the OpenMessaging connect API.
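A minimal sketch of the sink's write path using the influxdb-java client (the database, measurement, tags, and fields are illustrative; the connector would derive them from the sink entries):

```java
import java.util.concurrent.TimeUnit;
import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;

public class InfluxSinkWriter {
    public static void main(String[] args) {
        InfluxDB influxDB = InfluxDBFactory.connect("http://localhost:8086", "admin", "admin");
        influxDB.setDatabase("rocketmq_sink"); // target database (assumed to exist)
        // one point per RocketMQ message; measurement/tags/fields would be
        // derived from the sink entry's schema
        Point point = Point.measurement("temperature")
                .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
                .tag("device", "device1")
                .addField("value", 36.5)
                .build();
        influxDB.write(point);
        influxDB.close();
    }
}
```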
You should learn before applying for this topic
InfluxDB/Apache RocketMQ/Apache RocketMQ Connect/OpenMessaging Connect API
Mentor
duhengforever@apache.org, wlliqipeng@apache.org, vongosling@apache.org
RocketMQ Connect Cassandra
Context
The Cassandra sink connector allows writing data to Apache Cassandra. In this project, you need to implement a Cassandra sink connector based on the OpenMessaging connect API and run it on the RocketMQ connect runtime.
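A minimal sketch of the write path using the DataStax Java driver 3.x (the keyspace, table, and values are illustrative):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CassandraSinkWriter {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("demo_keyspace")) { // keyspace assumed to exist
            // one row per RocketMQ message; in a real sink the statement would
            // be prepared once and bound per entry
            session.execute(session.prepare(
                    "INSERT INTO events (id, payload) VALUES (?, ?)")
                    .bind("topic+0+42", "{\"orderId\":1}"));
        }
    }
}
```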
You should learn before applying for this topic
Cassandra/Apache RocketMQ/Apache RocketMQ Connect/OpenMessaging Connect API
Mentor
duhengforever@apache.org, vongosling@apache.org
RocketMQ Connect Hbase
Context
The HBase sink connector allows moving data from Apache RocketMQ to HBase. It writes data from a topic in RocketMQ to a table in the specified HBase instance. Auto-creation of tables and column families is also supported.
So, in this project, you need to implement an HBase sink connector based on the OpenMessaging connect API, which will be executed on the RocketMQ connect runtime.
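A minimal sketch of the write path using the HBase client API (the table, column family, and qualifier names are illustrative; table auto-creation would go through the Admin API):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseSinkWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("rocketmq_sink"))) {
            // row key derived from the message key; column family and
            // qualifier names here are illustrative
            Put put = new Put(Bytes.toBytes("message-key"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("body"),
                          Bytes.toBytes("{\"orderId\":1}"));
            table.put(put);
        }
    }
}
```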
You should learn before applying for this topic
HBase/Apache RocketMQ/Apache RocketMQ Connect/OpenMessaging Connect API
Mentor
chenguangsheng@apache.org, vongosling@apache.org
RocketMQ Connect Hive
Context
The Hive sink connector allows you to export data from Apache RocketMQ topics to HDFS files in a variety of formats and integrates with Hive to make data immediately available for querying with HiveQL. The connector periodically polls data from RocketMQ and writes them to HDFS.
The data from each RocketMQ topic is partitioned by the provided partitioner and divided into chunks. Each chunk of data is represented as an HDFS file whose filename encodes the topic, queueName, and the start and end offsets of that chunk.
So, in this project, you need to implement a Hive sink connector based on the OpenMessaging connect API and run it on the RocketMQ connect runtime.
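A minimal sketch of the chunking idea using the HDFS client API (the paths and filename pattern are illustrative; the Hive integration would register these files in a table so they are queryable with HiveQL):

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsChunkWriter {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration()); // fs.defaultFS from core-site.xml
        String topic = "orders";
        int queueId = 0;
        long startOffset = 0, endOffset = 999;
        // encode topic, queue, and offset range in the filename, as described above
        Path chunk = new Path(String.format(
                "/warehouse/%s/%s+%d+%d+%d.json", topic, topic, queueId, startOffset, endOffset));
        try (FSDataOutputStream out = fs.create(chunk)) {
            out.write("{\"orderId\":1}\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```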
You should learn before applying for this topic
Hive/Apache RocketMQ/Apache RocketMQ Connect/OpenMessaging Connect API
Mentor
chenguangsheng@apache.org, vongosling@apache.org