apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://datastrato.ai/docs/
Apache License 2.0

Apache Gravitino (incubating)


Introduction

Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly in different sources, types, and regions, and provides users with unified metadata access for data and AI assets.

Figure: Gravitino architecture

Gravitino aims to provide several key features:

Contributing to Apache Gravitino

Gravitino is open source software available under the Apache 2.0 license. For information on how to contribute to Gravitino, please see the Contribution guidelines.

Online documentation

You can find the latest Gravitino documentation in the doc folder. This README file only contains basic setup instructions.

Building Apache Gravitino

You can build Gravitino using Gradle. Currently, you can build Gravitino on Linux and macOS; Windows isn't supported.

To build Gravitino, please run:

./gradlew clean build -x test

To build a distribution package, run:

./gradlew compileDistribution -x test

To build a compressed distribution package instead, run:

./gradlew assembleDistribution -x test

The generated binary distribution package is placed in the distribution directory.

For details on building and testing Gravitino, see How to build Gravitino.

Quick start

Configure and start the Apache Gravitino server

If you already have a binary distribution package, go to the directory of the decompressed package.

Before starting the Gravitino server, configure it by editing the server configuration file, gravitino.conf, in the conf directory. The file follows the standard properties file format.
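As a rough illustration of the properties format, the fragment below sets the address and HTTP port the server listens on. The two keys and the port 8090 are assumptions recalled from Gravitino's server configuration documentation, not values stated in this README; check the gravitino.conf shipped with your release before relying on them.

```properties
# gravitino.conf uses "key = value" pairs; lines starting with '#' are comments.
# Assumed keys and defaults; verify against your release's server configuration docs.
gravitino.server.webserver.host = 0.0.0.0
gravitino.server.webserver.httpPort = 8090
```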

To start the Gravitino server, please run:

./bin/gravitino.sh start
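Because `./bin/gravitino.sh start` runs the server in the background, a script may want to wait until the server accepts connections before using it. The helper below is a minimal sketch under stated assumptions: `wait_for_port` is a hypothetical name (not part of Gravitino), it relies on bash's `/dev/tcp` pseudo-device, and the default host and port (127.0.0.1:8090) are assumed rather than taken from this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Gravitino): poll until a TCP port
# accepts connections. Uses bash's /dev/tcp pseudo-device, so run it with bash.
wait_for_port() {
  local host="${1:-127.0.0.1}"   # assumed: the server runs on this machine
  local port="${2:-8090}"        # assumed default HTTP port; check gravitino.conf
  local attempts="${3:-30}"      # number of polls, one second apart
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Opening fd 3 in a subshell succeeds only if something listens on host:port.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt remains.
    if [ "$i" -le "$attempts" ]; then
      sleep 1
    fi
  done
  return 1
}
```

For example: `./bin/gravitino.sh start && wait_for_port 127.0.0.1 8090 30 && echo "Gravitino is accepting connections"`. An open port only shows that the server is listening, not that every component has finished initializing.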

To stop the Gravitino server, please run:

./bin/gravitino.sh stop

Alternatively, to run the Gravitino server in the foreground, run:

./bin/gravitino.sh run

Press CTRL+C to stop the server.

Using Trino with Apache Gravitino

Gravitino provides a Trino connector to access the metadata in Gravitino. To use Trino with Gravitino, please follow the trino-gravitino-connector doc.
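For orientation only, a Trino catalog file pointing at Gravitino might look like the sketch below. The file location, the keys `connector.name`, `gravitino.uri`, and `gravitino.metalake`, and the port 8090 are assumptions recalled from the Gravitino Trino connector documentation; `metalake_demo` is a hypothetical placeholder. The trino-gravitino-connector doc is authoritative.

```properties
# Hypothetical Trino catalog file, e.g. etc/catalog/gravitino.properties.
# Key names are assumptions; confirm them in the trino-gravitino-connector doc.
connector.name=gravitino
# Address of the Gravitino server (8090 is the assumed default HTTP port)
gravitino.uri=http://localhost:8090
# Name of an existing metalake in Gravitino (placeholder)
gravitino.metalake=metalake_demo
```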

Development guide

  1. How to build Gravitino
  2. How to test Gravitino
  3. How to publish Docker images

License

Gravitino is licensed under the Apache License, Version 2.0. See the LICENSE file for details.

ASF Incubator disclaimer

Apache Gravitino is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

Apache®, Apache Gravitino™, Apache Hadoop®, Apache Hive™, Apache Iceberg™, Apache Kafka®, Apache Spark™, Apache Submarine™, Apache Thrift™ and Apache Zeppelin™ are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.