MaterializeInc / materialize

The data warehouse for operational workloads.
https://materialize.com
Other
5.65k stars 459 forks source link
data-warehouse database distributed-systems kafka materialized-view operational-data-warehouse postgresql postgresql-dialect rust sql stream-processing streaming streaming-data

Build status Doc reference Chat on Slack

Materialize is a cloud-native data warehouse purpose-built for operational workloads where an analytical data warehouse would be too slow, and a stream processor would be too complicated.

Using SQL and common tools in the wider data ecosystem, Materialize allows you to build real-time automation, engaging customer experiences, and interactive data products that drive value for your business while reducing the cost of data freshness.

Get started

Ready to try out Materialize? Sign up to get started! 🚀

Have questions? We'd love to hear from you:

About

Materialize is designed to help you interactively explore your streaming data, perform analytics against live relational data, or increase data freshness while reducing the load of your dashboard and monitoring tasks. The moment you need a refreshed answer, you can get it in milliseconds.

It focuses on providing correct and consistent answers with minimal latency, and does not ask you to accept either approximate answers or eventual consistency. Whenever Materialize answers a query, that answer is the correct result on some specific (and recent) version of your data. Materialize does all of this by recasting your SQL queries as dataflows, which can react efficiently to changes in your data as they happen.

Our fully managed service is cloud native, featuring high availability, via multi-active replication, horizontal scalability, by seamlessly scaling dataflows across multiple machines, and near infinite storage, by leveraging cloud object storage (e.g., Amazon S3).

We support a large fraction of PostgreSQL, and are actively working on supporting more built-in PostgreSQL functions. Please file an issue if something doesn't work as expected!

Get data in

Materialize can read data from Kafka (and other Kafka API-compatible systems like Redpanda), directly from a PostgreSQL replication stream, or from SaaS applications via webhooks. It also supports regular database tables to which you can insert, update, and delete rows.

Transform, manipulate, and read your data

Once you've got the data in, define views and perform reads via the PostgreSQL protocol. Use your favorite SQL client, including the psql you probably already have on your system.

Materialize supports a comprehensive variety of SQL features, all using the PostgreSQL dialect and protocol:

Just show us what it can do!

Here's an example join query that works fine in Materialize, TPC-H query 15:

-- Views define commonly reused subqueries.
CREATE VIEW revenue (supplier_no, total_revenue) AS
    SELECT
        l_suppkey,
        SUM(l_extendedprice * (1 - l_discount))
    FROM
        lineitem
    WHERE
        l_shipdate >= DATE '1996-01-01'
        AND l_shipdate < DATE '1996-01-01' + INTERVAL '3' month
    GROUP BY
        l_suppkey;

-- The MATERIALIZED keyword is the trigger to begin
-- eagerly, consistently, and incrementally maintaining
-- results that are stored directly in durable storage.
CREATE MATERIALIZED VIEW tpch_q15 AS
  SELECT
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
FROM
    supplier,
    revenue
WHERE
    s_suppkey = supplier_no
    AND total_revenue = (
        SELECT
            max(total_revenue)
        FROM
            revenue
    )
ORDER BY
    s_suppkey;

-- Creating an index keeps results always up to date and in memory.
-- In this example, the index will allow for fast point lookups of
-- individual supply keys.
CREATE INDEX tpch_q15_idx ON tpch_q15 (s_suppkey);

Stream inserts, updates, and deletes on the underlying tables (lineitem and supplier), and Materialize keeps the materialized view incrementally updated. You can type SELECT * FROM tpch_q15 and expect to see the current results immediately!

Get data out

Pull based: Use any PostgreSQL-compatible driver in any language/environment to make SELECT queries against your views. Tell them they're talking to a PostgreSQL database, they don't ever need to know otherwise.

Push based: Listen to changes directly using SUBSCRIBE or configure Materialize to stream results to a Kafka topic as soon as the views change.

Documentation

Check out our documentation.

License

Materialize is provided primarily as a fully managed cloud service with credit-based pricing. Included in the price are proprietary cloud-native features like horizontal scalability, high availability, and a web management console.

However, we're big believers in advancing the frontier of human knowledge. To that end, the source code of the standalone database engine is publicly available, in this repository, and licensed under the BSL 1.1, converting to the open-source Apache 2.0 license after 4 years. As stated in the BSL, use of the standalone database engine on a single node is free forever. Please be warned that this deployment model is not suitable for production use and we cannot offer support for it.

Materialize depends upon many open source Rust crates. We maintain a list of these crates and their licenses, including links to their source repositories.

For developers

Materialize is primarily written in Rust.

Developers can find docs at doc/developer, and Rust API documentation is hosted at https://dev.materialize.com/api/rust/. The Materialize development roadmap is divided up into roughly month-long milestones, and managed in GitHub.

Contributions are welcome. Prospective code contributors might find the D-good for external contributors label useful. See CONTRIBUTING.md for additional guidance.

Credits

Materialize is lovingly crafted by a team of developers and one bot. Join us.