Matatika / tap-google-sheets

tap-google-sheets, singer tap built with the Meltano SDK
GNU Affero General Public License v3.0
2 stars 9 forks source link

tap-google-sheets

test Code style: black GitHub license Python version

This Google Sheets tap produces JSON-formatted data following the Singer spec.

tap-google-sheets is a Singer tap for the Google Sheets API built with the Meltano Tap SDK for Singer Taps.


Configuration

A full list of supported settings and capabilities for this tap is available by running:

tap-google-sheets --about

Getting Your Credentials

OAuth

At Matatika, we have OAuth support for this tap. This means when you sign up and use this tap in one of our workspaces you can go through the Google OAuth flow, allowing the Matatika app access to your Google Sheet to sync data on your behalf.

Using the tap this way means you do not have to get any of the following credentials.

Client ID, Client Secret & Refresh Token

To get your google credentials we recommend reading and following the OAuth 2.0 Google API Documentation

The tap calls the following Google APIs, these need to be enabled in Google Cloud Console

Consent for these scopes needs to be supplied in required scopes during OAuth client creation and requested in your authorization flow.

https://www.googleapis.com/auth/spreadsheets.readonly https://www.googleapis.com/auth/drive.readonly

Sheet ID

Your sheet_id is also required to run --discover, as running this will build the streams schema based on your google sheet.

When you open your Google sheet, the url will look something like:

https://docs.google.com/spreadsheets/d/abc123/edit#gid=0

Your sheet_id are the characters after spreadsheets/d/, so in this case would be abc123.


Credentials

Setting Required Type Description
oauth_credentials.client_id Required String Your google client id
oauth_credentials.client_secret Required String Your google client secret
oauth_credentials.refresh_token Required String Your google refresh token
sheet_id Required String Your target google sheet id
output_name Optional String Optionailly rename the stream and output file or table from the tap
child_sheet_name Optional String Optionally choose a different sheet from your Google Sheet file
range Optional String Optionally choose a range of data from your Google Sheet file (defaults to the entire sheet)

Range is defined using A1 notation and is start/end inclusive. Examples:
  • B5:G45 - start at B5 and end at G45
  • A:T - start at A1 and end at the last cell of column T (same as A1:T and A:T1)
  • 3:5 - start at A3 and end at the last cell of row 5 (same as A3:5 and 3:A5)
  • D3:ZZZ - start at D3 and end at the last cell in the sheet
key_properties Optional Array of Strings Optionally choose primary key column(s) from your Google Sheet file. Example: ["column_one", "column_two"]
sheets Optional Array of Objects Optionally provide a list of configs for each sheet/stream. See "Per Sheet Config" below. Overrides the sheet_id provided at the root level.

Per Sheet Config

Setting Required Type Description
sheet_id Required String Your target google sheet id
output_name Optional String Optionailly rename the stream and output file or table from the tap
child_sheet_name Optional String Optionally choose a different sheet from your Google Sheet file
range Optional String Optionally choose a range of data from your Google Sheet file (defaults to the entire sheet)

Range is defined using A1 notation and is start/end inclusive. Examples:
  • B5:G45 - start at B5 and end at G45
  • A:T - start at A1 and end at the last cell of column T (same as A1:T and A:T1)
  • 3:5 - start at A3 and end at the last cell of row 5 (same as A3:5 and 3:A5)
  • D3:ZZZ - start at D3 and end at the last cell in the sheet
key_properties Optional Array of Strings Optionally choose primary key column(s) from your Google Sheet file. Example: ["column_one", "column_two"]

Environment Variable

These settings expand into environment variables of:


FAQ / Things to Note

Loaders Tested


Roadmap

Currently if have duplicate column names, a database will either:


Installation

Use pip to install a release from GitHub.

pip install git+https://github.com/Matatika/tap-google-sheets@vx.x.x

Usage

You can easily run tap-google-sheets by itself or in a pipeline using Meltano.

Executing the Tap Directly

tap-google-sheets --version
tap-google-sheets --help
tap-google-sheets --config CONFIG --discover > ./catalog.json

Note: to run --discover you need to have set the required tap settings found here.

Initialize your Development Environment

pipx install poetry
poetry install

Create and Run Tests

Create tests within the tap_google_sheets/tests subfolder and then run:

poetry run pytest

You can also test the tap-google-sheets CLI interface directly using poetry run:

poetry run tap-google-sheets --help

Testing with Meltano

Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.

Your project comes with a custom meltano.yml project file already created. Open the meltano.yml and follow any "TODO" items listed in the file.

Next, install Meltano (if you haven't already) and any needed plugins:

# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd tap-google-sheets
meltano install

Now you can test and orchestrate using Meltano:

# Test invocation:
meltano invoke tap-google-sheets --version
# OR run a test `elt` pipeline:
meltano elt tap-google-sheets target-jsonl

SDK Dev Guide

See the dev guide for more instructions on how to use the SDK to develop your own taps and targets.