Closed fatchat closed 1 month ago
@mdshamoon maybe take a look at this, for the next upgrade.
One of their developers recommends using abctl (the Airbyte command-line tool) to install and manage Airbyte locally.
I knew run-ab-platform was deprecated, but I thought they still ran on Docker.
Urgh.
It's time we get into Kubernetes for our production setup also.
Created machine airbyte-upgrade and RDS airbyte-upgrade
Installed Airbyte 0.50.44
Got it running
Let me know once you test the docker prefect-proxy while setting up Dalgo here.
But how will you test the case where deployments are already there and we shift to a docker setup?
Good point, I'm not testing that... I'm only testing Airbyte; the docker setup for Dalgo is just to save myself some time.
Dalgo is set up
org1 with a Postgres warehouse, syncs are running
org2 with a BigQuery warehouse, syncs are running
before upgrading Airbyte
abctl requires Go
Following https://www.digitalocean.com/community/tutorials/how-to-install-go-on-ubuntu-20-04
curl -OL https://golang.org/dl/go1.23.0.linux-amd64.tar.gz
sudo tar -C /usr/local -xvf go1.23.0.linux-amd64.tar.gz
20 GB storage wasn't enough for the next step so I upgraded the volume to 40 GB before running
go install github.com/airbytehq/abctl@latest
abctl has the following usage for installations
ubuntu@ip-172-31-21-134:~$ abctl local install --help
Install Airbyte locally
Usage:
abctl local install [flags]
Flags:
--chart-version string specify the Airbyte helm chart version to install (default "latest")
--docker-email string docker email, can also be specified via ABCTL_LOCAL_INSTALL_DOCKER_EMAIL
--docker-password string docker password, can also be specified via ABCTL_LOCAL_INSTALL_DOCKER_PASSWORD
--docker-server string docker registry, can also be specified via ABCTL_LOCAL_INSTALL_DOCKER_SERVER (default "https://index.docker.io/v1/")
--docker-username string docker username, can also be specified via ABCTL_LOCAL_INSTALL_DOCKER_EMAIL
-h, --help help for install
--host string ingress http host (default "localhost")
--insecure-cookies allow insecure cookies to be served over http
--low-resource-mode run Airbyte in low resource mode
--migrate migrate data from docker compose installation
--no-browser disable launching the web-browser post install
--port int ingress http port (default 8000)
--secret strings an Airbyte helm chart secret file
--values string the Airbyte helm chart values file to load
--volume strings additional volume mounts (format: <HOST_PATH>:<GUEST_PATH>)
The documentation for --migrate says
Enables data-migration from an existing docker-compose backed Airbyte installation.
Copies, leaving the original data unmodified, the data from a docker-compose backed Airbyte installation into this abctl managed Airbyte installation.
ubuntu@ip-172-31-21-134:~$ abctl local status
INFO Thanks for using Airbyte!
Anonymous usage reporting is currently enabled. For more information, please see https://docs.airbyte.com/telemetry
INFO Using Kubernetes provider:
Provider: kind
Kubeconfig: /home/ubuntu/.airbyte/abctl/abctl.kubeconfig
Context: kind-airbyte-abctl
SUCCESS Found Docker installation: version 27.2.0
WARNING Airbyte does not appear to be installed locally
▄ Checking for existing Kubernetes cluster 'airbyte-abctl'
/home/ubuntu/.airbyte/abctl/abctl.kubeconfig
did not exist27.2.0
It looks like we need to move to Kubernetes before upgrading
kapa.ai says we can upgrade before moving to Kubernetes or to abctl. Will attempt that now.
https://raw.githubusercontent.com/airbytehq/airbyte/v0.58.0/run-ab-platform.sh
Error:
Error response from daemon: create /tmp/airbyte_local: "/tmp/airbyte_local" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path
Removed the volume: docker volume rm airbyte_workspace
Downloaded new .env, which sets LOCAL_DOCKER_MOUNT to oss_local_root instead of to /tmp/airbyte_local
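The error above is Docker rejecting the old LOCAL_DOCKER_MOUNT value as a named volume. A minimal sketch of that check, assuming the character classes quoted in the error message are applied as a full-string match with the second class repeated (the helper name is made up for illustration and is not part of Docker or Airbyte):

```python
import re

# Character classes copied from the Docker error message. Treating them as a
# full-string match with a trailing "+" is an assumption about how Docker
# applies them, not something the error text states.
VOLUME_NAME_PATTERN = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]+")


def is_valid_volume_name(name: str) -> bool:
    """Return True if `name` passes the assumed named-volume pattern."""
    return VOLUME_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # Old value, new value, and the workspace volume mentioned in this thread.
    for candidate in ("/tmp/airbyte_local", "oss_local_root", "airbyte_workspace"):
        print(candidate, is_valid_volume_name(candidate))
```

Under that assumption the old value fails because of the slashes, while oss_local_root is accepted as a volume name.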
Airbyte started successfully but the logs were gone. Maybe I shouldn't have removed the volume...
Dalgo started and the dashboard opened
12 Source Connectors to check
tech4dev/source-surveycto:0.1.1
tech4dev/source-commcare:0.1.0
tech4dev/source-commcare:0.2.0
tech4dev/source-kobotoolbox:0.2.0
tech4dev/source-kobotoolbox:0.2.1
airbyte/source-postgres:2.0.33
42 APIs to check
{
"workspaceId": "268fb5b3-ed64-4cdb-893e-38262eab61d8",
"customerId": "69f30da9-9778-4c04-bdcc-c72cd3c8f7d7",
"name": "org1",
"slug": "org1",
"initialSetupComplete": False,
"displaySetupWizard": False,
"anonymousDataCollection": False,
"news": False,
"securityUpdates": False,
"notifications": [],
"notificationSettings": {
"sendOnSuccess": {"notificationType": []},
"sendOnFailure": {"notificationType": ["customerio"]},
"sendOnSyncDisabled": {"notificationType": ["customerio"]},
"sendOnSyncDisabledWarning": {"notificationType": ["customerio"]},
"sendOnConnectionUpdate": {"notificationType": ["customerio"]},
"sendOnConnectionUpdateActionRequired": {
"notificationType": ["customerio"]
},
"sendOnBreakingChangeWarning": {"notificationType": ["customerio"]},
"sendOnBreakingChangeSyncsDisabled": {
"notificationType": ["customerio"]
},
},
"defaultGeography": "auto",
"webhookConfigs": [],
"organizationId": "00000000-0000-0000-0000-000000000000",
"tombstone": False,
}
create now takes a second required parameter: organizationId. This can be set to 00000000-0000-0000-0000-000000000000
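A rough sketch of how our request body for the create call could be built after the upgrade. It assumes the first required parameter is name (as in the workspace object above); the function and constant names are hypothetical and not taken from the Dalgo codebase:

```python
import json

# All-zero organization id, as noted above.
DEFAULT_ORGANIZATION_ID = "00000000-0000-0000-0000-000000000000"


def build_workspace_create_payload(name, organization_id=DEFAULT_ORGANIZATION_ID):
    """Build the request body for the create call.

    Assumption: `name` and `organizationId` are the two required fields.
    """
    if not name:
        raise ValueError("workspace name is required")
    return {"name": name, "organizationId": organization_id}


if __name__ == "__main__":
    print(json.dumps(build_workspace_create_payload("org1"), indent=2))
```

Keeping the default in one constant means the call sites do not change again if we later move to real organization ids.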
{'sourceDefinitionId': 'a4617b39-3c14-44cd-a2eb-6e720f269235',
'name': 'Public Apis',
'dockerRepository': 'airbyte/source-public-apis',
'dockerImageTag': '0.2.0',
'documentationUrl': 'https://docs.airbyte.com/integrations/sources/public-apis',
'icon': 'https://connectors.airbyte.com/files/metadata/airbyte/source-public-apis/latest/icon.svg',
'protocolVersion': '0.2.0',
'custom': False,
'supportLevel': 'community',
'releaseStage': 'alpha',
'sourceType': 'api',
'maxSecondsBetweenMessages': 10800}
{'connections': [{'connectionId': '756bbf71-b72f-429c-908b-c1836dd5c298',
'name': 'sync',
'namespaceDefinition': 'customformat',
'namespaceFormat': 'staging',
'prefix': '',
'sourceId': '05814f12-3b05-4541-97d2-7806fff39e94',
'destinationId': 'f710b11a-2dba-4448-9a9f-0afdcf8e0e89',
'operationIds': [],
'syncCatalog': {'streams': [{'stream': {'name': 'Sheet2',
'jsonSchema': {'type': 'object',
'$schema': 'http://json-schema.org/draft-07/schema#',
'properties': {'Iron': {'type': 'string'},
'SNo.': {'type': 'string'},
'Zone': {'type': 'string'},
'State': {'type': 'string'},
'Arsenic': {'type': 'string'},
'Nitrate': {'type': 'string'},
'Fluoride': {'type': 'string'},
'Latitude': {'type': 'string'},
'Multiple': {'type': 'string'},
'Physical': {'type': 'string'},
'salinity': {'type': 'string'},
'Longitude': {'type': 'string'},
'District Name': {'type': 'string'},
'Bacteriological': {'type': 'string'}}},
'supportedSyncModes': ['full_refresh'],
'defaultCursorField': [],
'sourceDefinedPrimaryKey': []},
'config': {'syncMode': 'full_refresh',
'cursorField': [],
'destinationSyncMode': 'overwrite',
'primaryKey': [],
'aliasName': 'Sheet2',
'selected': True,
'fieldSelectionEnabled': False}}]},
'scheduleType': 'manual',
'status': 'active',
'sourceCatalogId': 'cd539418-af6e-43f7-9d5c-02137efe5cb1',
'geography': 'auto',
'breakingChange': False,
'notifySchemaChanges': False,
'notifySchemaChangesByEmail': False,
'nonBreakingChangesPreference': 'ignore',
'created_at': 1725274314,
'backfillPreference': 'disabled'}]}
{'connections': [{'connectionId': '756bbf71-b72f-429c-908b-c1836dd5c298',
'name': 'sync',
'scheduleType': 'manual',
'status': 'active',
'source': {'sourceId': '05814f12-3b05-4541-97d2-7806fff39e94',
'name': 'source',
'sourceDefinitionId': '71607ba1-c0ac-4799-8049-7f4b90dd50f7',
'sourceName': 'Google Sheets',
'icon': 'https://connectors.airbyte.com/files/metadata/airbyte/source-google-sheets/latest/icon.svg'},
'destination': {'destinationId': 'f710b11a-2dba-4448-9a9f-0afdcf8e0e89',
'name': 'postgres-warehouse',
'destinationDefinitionId': '25c5221d-dce2-4163-ade9-739ef790f503',
'destinationName': 'Postgres',
'icon': 'https://connectors.airbyte.com/files/metadata/airbyte/destination-postgres/latest/icon.svg'},
'latestSyncJobCreatedAt': 1725849314,
'latestSyncJobStatus': 'succeeded',
'isSyncing': False,
'schemaChange': 'no_change'}]}
Refreshing the schema of a Google Sheets source failed with
2024-09-10 17:52:38 platform > Preparing command: docker run --rm --init -i -w /data/97e2a1de-e2c4-420a-ab78-5f67e0e8f4d4/0 --log-driver none --name source-google-sheets-discover-97e2a1de-e2c4-420a-ab78-5f67e0e8f4d4-0-rfwvh --network host -v airbyte_workspace:/data -v oss_local_root:/local -e DEPLOYMENT_MODE=OSS -e WORKER_CONNECTOR_IMAGE=airbyte/source-google-sheets:0.3.13 -e AUTO_DETECT_SCHEMA=true -e LAUNCHDARKLY_KEY= -e SOCAT_KUBE_CPU_REQUEST=0.1 -e SOCAT_KUBE_CPU_LIMIT=2.0 -e FIELD_SELECTION_WORKSPACES= -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE=dev -e APPLY_FIELD_SELECTION=false -e WORKER_JOB_ATTEMPT=0 -e OTEL_COLLECTOR_ENDPOINT=http://host.docker.internal:4317 -e FEATURE_FLAG_CLIENT=config -e AIRBYTE_VERSION=0.58.0 -e WORKER_JOB_ID=97e2a1de-e2c4-420a-ab78-5f67e0e8f4d4 airbyte/source-google-sheets:0.3.13 discover --config source_config.json
2024-09-10 17:52:38 platform > Reading messages from protocol version 0.2.0
2024-09-10 17:52:40 platform > Running discovery on sheet 1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ
2024-09-10 17:52:40 platform > Backing off get(...) for 0.7s (googleapiclient.errors.HttpError: <HttpError 503 when requesting https://sheets.googleapis.com/v4/spreadsheets/1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ?includeGridData=false&alt=json returned "The service is currently unavailable.". Details: "The service is currently unavailable.">)
2024-09-10 17:52:41 platform > Backing off get(...) for 1.4s (googleapiclient.errors.HttpError: <HttpError 503 when requesting https://sheets.googleapis.com/v4/spreadsheets/1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ?includeGridData=false&alt=json returned "The service is currently unavailable.". Details: "The service is currently unavailable.">)
2024-09-10 17:52:43 platform > Backing off get(...) for 1.3s (googleapiclient.errors.HttpError: <HttpError 503 when requesting https://sheets.googleapis.com/v4/spreadsheets/1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ?includeGridData=false&alt=json returned "The service is currently unavailable.". Details: "The service is currently unavailable.">)
2024-09-10 17:52:44 platform > Backing off get(...) for 0.7s (googleapiclient.errors.HttpError: <HttpError 503 when requesting https://sheets.googleapis.com/v4/spreadsheets/1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ?includeGridData=false&alt=json returned "The service is currently unavailable.". Details: "The service is currently unavailable.">)
2024-09-10 17:52:46 platform > Backing off get(...) for 10.3s (googleapiclient.errors.HttpError: <HttpError 503 when requesting https://sheets.googleapis.com/v4/spreadsheets/1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ?includeGridData=false&alt=json returned "The service is currently unavailable.". Details: "The service is currently unavailable.">)
2024-09-10 17:52:56 platform > Backing off get(...) for 4.7s (googleapiclient.errors.HttpError: <HttpError 503 when requesting https://sheets.googleapis.com/v4/spreadsheets/1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ?includeGridData=false&alt=json returned "The service is currently unavailable.". Details: "The service is currently unavailable.">)
2024-09-10 17:53:01 platform > Backing off get(...) for 48.9s (googleapiclient.errors.HttpError: <HttpError 503 when requesting https://sheets.googleapis.com/v4/spreadsheets/1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ?includeGridData=false&a
Looks like a Google server issue.
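One way to double-check that every retry in a discover log was an upstream 503 is to pull the backoff lines out and summarize them. This is only a sketch written against the line format seen in the log above; the function name is made up and the sample lines are shortened copies of the log:

```python
import re

# Matches lines like:
#   ... Backing off get(...) for 0.7s (googleapiclient.errors.HttpError: <HttpError 503 when requesting ...
BACKOFF_RE = re.compile(r"Backing off get\(\.\.\.\) for ([0-9.]+)s .*?<HttpError (\d{3}) ")


def summarize_backoffs(log_text):
    """Count retries, add up the wait times, and collect the HTTP status codes seen."""
    waits = []
    statuses = set()
    for line in log_text.splitlines():
        match = BACKOFF_RE.search(line)
        if match:
            waits.append(float(match.group(1)))
            statuses.add(int(match.group(2)))
    return {"retries": len(waits), "total_wait_s": sum(waits), "statuses": statuses}


if __name__ == "__main__":
    # Shortened copies of two lines from the log above (URL and details trimmed).
    sample = (
        "2024-09-10 17:52:40 platform > Backing off get(...) for 0.7s "
        "(googleapiclient.errors.HttpError: <HttpError 503 when requesting ...>)\n"
        "2024-09-10 17:52:41 platform > Backing off get(...) for 1.4s "
        "(googleapiclient.errors.HttpError: <HttpError 503 when requesting ...>)\n"
    )
    print(summarize_backoffs(sample))
```

If the only status in the summary is 503, the failure is on Google's side rather than something the upgrade broke.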
sheet2 & sheet22
overwrite destination mode
refresh source schema and then accepted changes via Pending Actions
sheet2
sheet2
sheet2
) + sync after accepting the schema changes
flows, organizations, profiles. data json column. first name
and redeployed the form and tried to refresh the source schema from Dalgo, but it didn't detect anything. Also tried from Airbyte, but it didn't detect anything either. Workspace id 6aefd963-058a-4291-83a5-20a590c9c382 and connection name Kobo v0.2.0
@fatchat .
This is because we don't have a defined schema and we put everything in data.
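To illustrate why a schema refresh sees nothing: if the connector only declares top-level columns and the whole submission sits inside data, a new form field changes the contents of data but not the column list. A small sketch with made-up records (the field names and values are hypothetical, not real Kobo output):

```python
def top_level_columns(record):
    """Columns a schema refresh would compare: only the top-level keys."""
    return sorted(record.keys())


def nested_fields(record):
    """Fields inside the `data` JSON blob, which the schema does not describe."""
    return sorted(record.get("data", {}).keys())


# Hypothetical submissions before and after adding a field to the form.
before = {"_id": 1, "data": {"age": "30"}}
after = {"_id": 2, "data": {"age": "31", "first_name": "Asha"}}

if __name__ == "__main__":
    print(top_level_columns(before) == top_level_columns(after))
    print(nested_fields(before), nested_fields(after))
```

The top-level column lists are identical, so there is no schema change to detect; the new field only shows up after a sync, inside the data column.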
Step 1: Look at Airbyte's releases and recommend a version to upgrade to
Step 2: Once the target version is agreed upon, list all changes which would break our integration
Step 3: Make a plan to address those changes