GoogleCloudPlatform / datashare-toolkit

DIY commercial datasets on Google Cloud Platform
Apache License 2.0

Datashare Toolkit

This is not an officially supported Google product.

The Datashare Toolkit is a solution that helps data publishers manage datasets residing within BigQuery. The toolkit includes functionality to ingest and entitle data, relieving consumers of much of the toil involved in onboarding datasets from a variety of providers. Publishers upload data files to a storage bucket and allocate permissioned datasets for their consumers to use with BigQuery authorized views.

While these tools are used for data management and entitlement, they follow a bring-your-own-license (BYOL) model for entitling publisher data. Hence, publishers should already have licensing arrangements with those consumers wishing to access their data within GCP, and the consumers can furnish the GCP account IDs corresponding to their entitled user principals. These account IDs are required for the creation of the authorized views.
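
As an illustration of the entitlement mechanism described above, the sketch below builds the two kinds of access entries that a BigQuery authorized-view setup generally relies on: a view entry placed on the publisher's source dataset, and reader entries for consumer accounts placed on the dataset that holds the view. It uses only the Python standard library and produces dictionaries in the shape of the access list of the BigQuery dataset resource. It does not call any API, and it is not Datashare's own code. All project, dataset, view, and account names are hypothetical.

```python
import json


def reader_entry(consumer_email: str) -> dict:
    """Access entry that grants one consumer account read access to a dataset."""
    return {"role": "READER", "userByEmail": consumer_email}


def authorized_view_entry(project_id: str, dataset_id: str, view_id: str) -> dict:
    """Access entry that authorizes a view to read the dataset it is attached to."""
    return {
        "view": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": view_id,
        }
    }


def add_entries(existing: list, new: list) -> list:
    """Return a copy of `existing` with each entry in `new` appended once."""
    merged = list(existing)
    for entry in new:
        if entry not in merged:
            merged.append(entry)
    return merged


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    view = authorized_view_entry("acme-public", "consumer_views", "prices_entitled")
    # Entry for the source dataset: lets the view read the underlying tables.
    source_access = add_entries([], [view])
    # Entries for the dataset that contains the view: lets consumers query it.
    view_dataset_access = add_entries(
        [], [reader_entry("analyst@consumer.example.com")]
    )
    print(json.dumps({"source": source_access, "views": view_dataset_access}, indent=2))
```

Keeping the two lists separate reflects the point of an authorized view: consumers are granted access only to the dataset that contains the view, never to the source tables themselves.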

The toolkit is open-source. Some supporting infrastructure, such as storage buckets, serverless functions, and BigQuery datasets, must be maintained within GCP by publishers as a prerequisite, and publishers are responsible for managing it. For consumers, once their GCP accounts are added to the publisher's entitlements, the published data can be queried directly within BigQuery, ready to integrate into an analytics workflow, machine learning model, or runtime application. While consumers are billed for BigQuery compute and networking, publishers incur costs only on the storage of their data in BigQuery and Cloud Storage.

Key Features

Getting started with Datashare

If you plan to use GCP Marketplace integration, the production project from which you install and manage Datashare must follow the required naming convention, with no punctuation or spaces in the company name: [yourcompanyname]-public.
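
The naming convention above can be checked before a project is created. The following is a minimal sketch, not part of Datashare: the function name is hypothetical, and the restriction of the company name to lowercase letters and digits is an assumption derived from general GCP project ID rules (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter) combined with the "no punctuation or spaces" requirement.

```python
import re

# Assumed pattern: a company name made of lowercase letters and digits,
# starting with a letter, followed by the literal suffix "-public".
_MARKETPLACE_PROJECT_ID = re.compile(r"[a-z][a-z0-9]*-public")


def is_marketplace_project_id(project_id: str) -> bool:
    """Return True if `project_id` looks like [yourcompanyname]-public."""
    # GCP project IDs are between 6 and 30 characters long.
    if not 6 <= len(project_id) <= 30:
        return False
    return _MARKETPLACE_PROJECT_ID.fullmatch(project_id) is not None


if __name__ == "__main__":
    # Hypothetical project IDs, for illustration only.
    for candidate in ("acme-public", "acme corp-public", "acme.corp-public", "acme-data"):
        print(candidate, is_marketplace_project_id(candidate))
```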

  1. Install Datashare
  2. Initialize Schema

To get started after installation, see the User Guide for usage information.

Requirements

Publishers

Consumers

Architecture

(Architecture diagram)

Disclaimers

This is not an officially supported Google product.

Datashare is under active development. Interfaces and functionality may change at any time.

License

This repository is licensed under the Apache License 2.0 (see LICENSE).

Contributions are welcome. See CONTRIBUTING for more information.