Hazelcast


What is Hazelcast

The world’s leading companies trust Hazelcast to modernize applications and take instant action on data in motion to create new revenue streams, mitigate risk, and operate more efficiently. Businesses use Hazelcast’s unified real-time data platform to process streaming data, enrich it with historical context and take instant action with standard or ML/AI-driven automation - before it is stored in a database or data lake.

Hazelcast is named in the Gartner Market Guide to Event Stream Processing and a leader in the GigaOm Radar Report for Streaming Data Platforms. To join our community of CXOs, architects and developers at brands such as Lowe’s, HSBC, JPMorgan Chase, Volvo, New York Life, and others, visit hazelcast.com.

When to use Hazelcast

Hazelcast provides a platform that can handle multiple types of workloads for building real-time applications.

Key Features

Operational Data Store

Hazelcast provides distributed in-memory data structures which are partitioned, replicated and queryable. One of the main use cases for Hazelcast is for storing a working set of data for fast querying and access.

The main data structure underlying Hazelcast, called IMap, is a key-value store with a rich set of features.

Hazelcast stores data in partitions, which are distributed to all the nodes. You can increase the storage capacity by adding nodes, and if one of the nodes goes down, the data is restored automatically from the backup replicas.
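
The idea of spreading partitions over nodes and recovering from backups can be sketched in a few lines of plain Python. This is an illustrative model, not Hazelcast's implementation: the CRC32 hash, the round-robin assignment and the helper names `partition_id`, `assign_partitions` and `fail_node` are assumptions made for the example. Only the default partition count of 271 is taken from Hazelcast itself.

```python
import zlib
from collections import Counter

PARTITION_COUNT = 271  # Hazelcast's default partition count


def partition_id(key):
    """Map a key to a partition using a stable hash (CRC32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % PARTITION_COUNT


def assign_partitions(nodes):
    """Give each partition a primary and a backup on a different node (needs 2+ nodes)."""
    return {
        pid: (nodes[pid % len(nodes)], nodes[(pid + 1) % len(nodes)])
        for pid in range(PARTITION_COUNT)
    }


def fail_node(table, dead):
    """Promote backups of the dead node's partitions, then re-create missing backups."""
    survivors = sorted({n for owners in table.values() for n in owners} - {dead})
    repaired = {}
    for pid, (primary, backup) in table.items():
        if primary == dead:
            primary, backup = backup, None  # the backup replica becomes the primary
        if backup in (dead, None):
            candidates = [n for n in survivors if n != primary]
            backup = candidates[pid % len(candidates)] if candidates else None
        repaired[pid] = (primary, backup)
    return repaired


table = assign_partitions(["node-a", "node-b", "node-c"])
print(partition_id("Jake"), table[partition_id("Jake")])
print(Counter(primary for primary, _ in table.values()))  # primaries per node

repaired = fail_node(table, "node-b")
print(Counter(primary for primary, _ in repaired.values()))  # node-b owns nothing now
```

Because a key always hashes to the same partition, any member or client can locate it without a central lookup. Adding a node to the list passed to `assign_partitions` lowers the number of partitions each node holds, which is the sense in which capacity grows with cluster size.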

You can interact with maps using SQL or a programming language client of your choice. You can create and interact with a map as follows:

CREATE MAPPING myMap (name varchar EXTERNAL NAME "__key", age INT EXTERNAL NAME "this") 
TYPE IMap
OPTIONS ('keyFormat'='varchar','valueFormat'='int');
INSERT INTO myMap VALUES('Jake', 29);
SELECT * FROM myMap;

The same can be done programmatically using one of the supported programming languages. Here are some examples in Java and Python:

Java:

var hz = HazelcastClient.newHazelcastClient();
IMap<String, Integer> map = hz.getMap("myMap");
map.set("Alice", 25);

Python:

import hazelcast

client = hazelcast.HazelcastClient()
my_map = client.get_map("myMap")
age = my_map.get("Alice").result()

Other programming languages supported are C#, C++, Node.js and Go.

Alternatively, you can ingest data directly from the many sources supported using SQL:

CREATE MAPPING csv_ages (name VARCHAR, age INT)
TYPE File
OPTIONS ('format'='csv',
    'path'='/data', 'glob'='data.csv');
SINK INTO myMap
SELECT name, age FROM csv_ages;

Hazelcast also provides additional data structures such as ReplicatedMap, Set, MultiMap and List. For a full list, refer to the distributed data structures section of the docs.

Stateful Data Processing

Hazelcast has a built-in data processing engine called Jet. Jet can be used to build elastic streaming and batch data pipelines. You can use it to process large volumes of real-time events or huge batches of static datasets. To give a sense of scale, a single node of Hazelcast has been proven to aggregate 10 million events per second with latency under 10 milliseconds. A cluster of Hazelcast nodes can process a billion events per second.

An application which aggregates millions of sensor readings per second with 10-millisecond resolution from Kafka looks like the following:

var hz = Hazelcast.bootstrappedInstance();

var p = Pipeline.create();

p.readFrom(KafkaSources.<String, Reading>kafka(kafkaProperties, "sensors"))
 .withTimestamps(event -> event.getValue().timestamp(), 10) // use event timestamp, allowed lag in ms
 .groupingKey(reading -> reading.sensorId())
 .window(sliding(1_000, 10)) // sliding window of 1s by 10ms
 .aggregate(averagingDouble(reading -> reading.temperature()))
 .writeTo(Sinks.logger());

hz.getJet().newJob(p).join();

Use the following command to deploy the application to the server:

bin/hazelcast submit analyze-sensors.jar

Jet supports advanced streaming features such as exactly-once processing and watermarks.
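
To make the windowing in the pipeline above concrete, here is a minimal plain-Python sketch of what a sliding-window average per sensor computes. It is a naive model for illustration only and does not reflect how Jet evaluates windows internally; it also ignores watermarks and late events. The function name `sliding_average` and the `(sensor_id, timestamp_ms, temperature)` tuple layout are assumptions made for the example.

```python
from collections import defaultdict


def sliding_average(readings, window_ms=1_000, slide_ms=10):
    """Average temperature per (sensor_id, window_end).

    A window ending at `end` covers timestamps in [end - window_ms, end),
    and window ends fall on multiples of `slide_ms`.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (sensor, window_end) -> [total, count]
    for sensor_id, ts, temperature in readings:
        first_end = (ts // slide_ms + 1) * slide_ms  # first window end strictly after ts
        for end in range(first_end, first_end + window_ms, slide_ms):
            acc = sums[(sensor_id, end)]
            acc[0] += temperature
            acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Small windows (30 ms sliding by 10 ms) keep the demo output readable.
readings = [("s1", 5, 20.0), ("s1", 15, 30.0), ("s2", 15, 10.0)]
result = sliding_average(readings, window_ms=30, slide_ms=10)
for (sensor_id, window_end), avg in sorted(result.items()):
    print(sensor_id, window_end, avg)
```

With a 1-second window sliding by 10 ms, every reading contributes to 100 overlapping windows, which is why a real engine needs a more economical strategy than this brute-force loop.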

Data Processing using SQL

Jet also powers the SQL engine in Hazelcast which can execute both streaming and batch queries. Internally, all SQL queries are converted to Jet jobs.

CREATE MAPPING trades (
    id BIGINT,
    ticker VARCHAR,
    price DECIMAL,
    amount BIGINT)
TYPE Kafka
OPTIONS (
    'valueFormat' = 'json',
    'bootstrap.servers' = 'kafka:9092'
);

SELECT ticker, ROUND(price * 100) AS price_cents, amount
  FROM trades
  WHERE price * amount > 100;

+------------+----------------------+-------------------+
|ticker      |           price_cents|             amount|
+------------+----------------------+-------------------+
|EFGH        |                  1400|                 20|
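
For readers less familiar with SQL, the filter and projection in the query above can be mirrored in plain Python. This is only a sketch of the query's logic, not of how Hazelcast executes it. The function name `matching_trades`, the sample trades and the half-up rounding mode are assumptions; the sample values were chosen so that one row reproduces the output shown above.

```python
from decimal import Decimal, ROUND_HALF_UP


def matching_trades(trades):
    """Yield (ticker, price_cents, amount) for trades where price * amount > 100."""
    for trade in trades:
        if trade["price"] * trade["amount"] > 100:  # the WHERE clause
            cents = (trade["price"] * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
            yield trade["ticker"], int(cents), trade["amount"]


# Hypothetical input; in the SQL example these rows arrive from the Kafka topic.
trades = [
    {"id": 1, "ticker": "ABCD", "price": Decimal("5.50"), "amount": 10},
    {"id": 2, "ticker": "EFGH", "price": Decimal("14.00"), "amount": 20},
]

for row in matching_trades(trades):
    print(row)
```

The generator yields results one at a time as input arrives, which loosely parallels a streaming query that keeps emitting rows and never completes on its own.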

Messaging

Hazelcast provides lightweight options for adding messaging to your application. The two main constructs for messaging are topics and queues.

Topics

Topics provide a publish-subscribe pattern where each message is fanned out to multiple subscribers. See the examples below in Java and Python:

Java:

var hz = Hazelcast.bootstrappedInstance();
ITopic<String> topic = hz.getTopic("my_topic");
topic.addMessageListener(msg -> System.out.println(msg.getMessageObject()));
topic.publish("message");

Python:

import hazelcast

client = hazelcast.HazelcastClient()
topic = client.get_topic("my_topic")

def handle_message(msg):
    print("Received message %s" % msg.message)

topic.add_listener(on_message=handle_message)
topic.publish("my-message")

For examples in other languages, please refer to the docs.
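
The fan-out behavior of a topic can be illustrated without a cluster. The sketch below is an in-process stand-in written for this README, not the Hazelcast client API: the `Topic` class is hypothetical, and its method names only loosely echo the Python client calls shown above.

```python
class Topic:
    """A toy in-process topic: every published message goes to every listener."""

    def __init__(self, name):
        self.name = name
        self._listeners = []

    def add_listener(self, on_message):
        """Register a callback that receives each message published from now on."""
        self._listeners.append(on_message)

    def publish(self, message):
        """Deliver the message to all currently registered listeners."""
        for listener in list(self._listeners):
            listener(message)


topic = Topic("my_topic")
inbox_a, inbox_b = [], []
topic.add_listener(inbox_a.append)
topic.add_listener(inbox_b.append)

topic.publish("first")
topic.publish("second")

print(inbox_a)  # ['first', 'second']
print(inbox_b)  # ['first', 'second']
```

Each subscriber receives its own copy of every message, in contrast to a queue, where an item is handed to exactly one consumer.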

Queues

Queues provide FIFO semantics: you can add items from one client and remove them from another. See the examples below in Java and Python:

Java:

var client = HazelcastClient.newHazelcastClient();
IQueue<String> queue = client.getQueue("my_queue");
queue.put("new-item");

Python:

import hazelcast

client = hazelcast.HazelcastClient()
q = client.get_queue("my_queue")
my_item = q.take().result()
print("Received item %s" % my_item)

For examples in other languages, please refer to the docs.
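
The producer and consumer roles can be tried locally with Python's standard library. This sketch uses `queue.Queue` and two threads as stand-ins for a distributed queue and two clients; the function name `run_demo` and the `None` sentinel used to stop the consumer are assumptions made for the example.

```python
import queue
import threading


def run_demo(items):
    """Send items through a FIFO queue from one thread and collect them in another."""
    q = queue.Queue()
    received = []

    def producer():
        for item in items:
            q.put(item)
        q.put(None)  # sentinel: tells the consumer there is nothing more to read

    def consumer():
        while True:
            item = q.get()  # blocks until an item is available
            if item is None:
                break
            received.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return received


print(run_demo(["a", "b", "c"]))  # ['a', 'b', 'c']
```

The consumer sees the items in the order the producer added them, and each item is removed from the queue once taken.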

Get Started

Follow the Getting Started Guide to install and start using Hazelcast.

Documentation

Read the documentation for in-depth details about how to install Hazelcast and an overview of the features.

Get Help

You can use the following channels for getting help with Hazelcast:

How to Contribute

Thanks for your interest in contributing! The easiest way is to just send a pull request. Have a look at the issues marked as good first issue for some guidance.

Building From Source

Building Hazelcast requires at minimum JDK 17. Pull the latest source from the repository and use Maven install (or package) to build:

$ git pull origin master
$ ./mvnw clean package -DskipTests

It is recommended to use the included Maven wrapper script. It is also possible to use a local Maven distribution with the same version that is used in the Maven wrapper script.

Additionally, there is a quick build activated by setting the -Dquick system property that skips tests, checkstyle validation, javadoc and source plugins and does not build extensions and distribution modules.

Testing

Take into account that the default build executes thousands of tests which may take a considerable amount of time. Hazelcast has 3 testing profiles:

Some tests require Docker to run. Set the -Dhazelcast.disable.docker.tests system property to skip them.

When developing a PR it is sufficient to run your new tests and some related subset of tests locally. Our PR builder will take care of running the full test suite.

Trigger Phrases in the Pull Request Conversation

When you create a pull request (PR), it must pass a build-and-test procedure. Maintainers will be notified about your PR, and they can trigger the build using special comments. These are the phrases you may see used in the comments on your PR:

Where not indicated, the builds run on a Linux machine with Oracle JDK 17.

Creating PRs for Hazelcast SQL

When creating a PR with changes located in the hazelcast-sql module and nowhere else, you can label your PR with SQL-only. This will change the standard PR builder to one that will only run tests related to SQL (see run-sql-only above), which will significantly shorten the build time vs. the default PR builder. NOTE: this job will fail if you've made changes anywhere other than hazelcast-sql.

Creating PRs which contain only documentation

When a PR changes only documentation (files with the suffix .md or .adoc), there is no point in running tests. In that case, label the PR with docs-only. The job will fail if you've changed anything other than .md, .adoc or .txt files.

License

Source code in this repository is covered by one of two licenses:

  1. Apache License 2.0
  2. Hazelcast Community License

The default license throughout the repository is Apache License 2.0 unless the header specifies another license.

Acknowledgments

Thanks to YourKit for supporting open source software by providing us a free license for their Java profiler.

We owe (the good parts of) our CLI tool's user experience to picocli.

Copyright

Copyright (c) 2008-2024, Hazelcast, Inc. All Rights Reserved.

Visit www.hazelcast.com for more info.