DalgoT4D / maintenance

maintenance scripts for our ec2 machines

Plan the next Airbyte upgrade #28

Closed fatchat closed 1 month ago

fatchat commented 3 months ago

Step 1: Look at Airbyte's releases and recommend a version to upgrade to

Step 2: Once the target version is agreed upon, list all changes which would break our integration

Step 3: Make a plan to address those changes

Ishankoradia commented 2 months ago

https://airbytehq.slack.com/archives/C01AHCD885S/p1723584049729239?thread_ts=1721655918.685019&cid=C01AHCD885S

@mdshamoon maybe take a look at this, for the next upgrade.

One of their developers recommends using abctl (the Airbyte command-line tool) to install and manage Airbyte locally.

mdshamoon commented 2 months ago

https://docs.google.com/document/d/1GRmT5MtB3Ds0QXLrfMr45_IFsZ4R_vhyMvXiRJYW7bI/edit

Ishankoradia commented 2 months ago
[Screenshot 2024-08-24 at 21 55 25]
fatchat commented 2 months ago

I knew run-ab-platform was deprecated, but I thought they still ran on Docker.

urgh

Ishankoradia commented 2 months ago

It's time we move to Kubernetes for our production setup as well.

fatchat commented 2 months ago

Created machine airbyte-upgrade and RDS airbyte-upgrade. Installed Airbyte 0.50.44. Got it running.

Ishankoradia commented 2 months ago

Let me know once you test the docker prefect-proxy while setting up Dalgo here.

But how will you test the case where deployments already exist and we shift to a Docker setup?

fatchat commented 2 months ago

Good point, I'm not testing that. I'm only testing Airbyte; the Docker setup for Dalgo is just to save myself some time.

fatchat commented 2 months ago

Dalgo is set up. org1 with a Postgres warehouse, syncs are running.

fatchat commented 2 months ago

org2 with a BigQuery warehouse, syncs are running

fatchat commented 2 months ago

before upgrading Airbyte

fatchat commented 2 months ago

abctl requires Go

Following https://www.digitalocean.com/community/tutorials/how-to-install-go-on-ubuntu-20-04

curl -OL https://golang.org/dl/go1.23.0.linux-amd64.tar.gz

sudo tar -C /usr/local -xvf go1.23.0.linux-amd64.tar.gz

fatchat commented 2 months ago

20 GB storage wasn't enough for the next step so I upgraded the volume to 40 GB before running

go install github.com/airbytehq/abctl@latest
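
A note for anyone repeating these steps: extracting the tarball into /usr/local puts the go binary in /usr/local/go/bin, and `go install` writes abctl to `$GOBIN` or, by default, to ~/go/bin. Neither directory is on PATH unless it has been added. The Python sketch below reports where the two binaries can be found. The helper name `find_tool` and the list of extra directories are illustrative assumptions, not part of Go or abctl.

```python
import os
import shutil
from pathlib import Path


def find_tool(name, extra_dirs=()):
    """Return the full path of `name`, searching PATH plus `extra_dirs`, or None."""
    search_dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    search_dirs += [str(d) for d in extra_dirs]
    return shutil.which(name, path=os.pathsep.join(search_dirs))


if __name__ == "__main__":
    # Default locations for the tarball install and for `go install` output.
    candidates = [Path("/usr/local/go/bin"), Path.home() / "go" / "bin"]
    for tool in ("go", "abctl"):
        print(tool, "->", find_tool(tool, candidates) or "not found")
```

If a tool is only found through the extra directories, adding that directory to PATH in the shell profile should make the later `abctl` commands work as typed.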

abctl has the following usage for installations

ubuntu@ip-172-31-21-134:~$ abctl local install --help
Install Airbyte locally

Usage:
  abctl local install [flags]

Flags:
      --chart-version string     specify the Airbyte helm chart version to install (default "latest")
      --docker-email string      docker email, can also be specified via ABCTL_LOCAL_INSTALL_DOCKER_EMAIL
      --docker-password string   docker password, can also be specified via ABCTL_LOCAL_INSTALL_DOCKER_PASSWORD
      --docker-server string     docker registry, can also be specified via ABCTL_LOCAL_INSTALL_DOCKER_SERVER (default "https://index.docker.io/v1/")
      --docker-username string   docker username, can also be specified via ABCTL_LOCAL_INSTALL_DOCKER_EMAIL
  -h, --help                     help for install
      --host string              ingress http host (default "localhost")
      --insecure-cookies         allow insecure cookies to be served over http
      --low-resource-mode        run Airbyte in low resource mode
      --migrate                  migrate data from docker compose installation
      --no-browser               disable launching the web-browser post install
      --port int                 ingress http port (default 8000)
      --secret strings           an Airbyte helm chart secret file
      --values string            the Airbyte helm chart values file to load
      --volume strings           additional volume mounts (format: <HOST_PATH>:<GUEST_PATH>)

The documentation for --migrate says

Enables data-migration from an existing docker-compose backed Airbyte installation.

Copies, leaving the original data unmodified, the data from a docker-compose backed Airbyte installation into this abctl managed Airbyte installation.
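
For reference, the flags listed above could be combined into an install command for a host that already runs the docker-compose deployment. This is only a sketch: the helper `build_install_command` is hypothetical, it uses only flags that appear in the help output, and the values are placeholders. It has not been run on this machine.

```python
import shlex


def build_install_command(migrate=False, host=None, port=None,
                          low_resource_mode=False, no_browser=True):
    """Build an `abctl local install` argument list from flags in the help text."""
    cmd = ["abctl", "local", "install"]
    if migrate:
        cmd.append("--migrate")
    if host:
        cmd += ["--host", host]
    if port is not None:
        cmd += ["--port", str(port)]
    if low_resource_mode:
        cmd.append("--low-resource-mode")
    if no_browser:
        cmd.append("--no-browser")
    return cmd


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(build_install_command(migrate=True, port=8000)))
```

Printing the command rather than running it keeps the sketch safe to try on a machine where Airbyte data already exists.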
fatchat commented 2 months ago
ubuntu@ip-172-31-21-134:~$ abctl local status
  INFO    Thanks for using Airbyte!
          Anonymous usage reporting is currently enabled. For more information, please see https://docs.airbyte.com/telemetry
  INFO    Using Kubernetes provider:
            Provider: kind
            Kubeconfig: /home/ubuntu/.airbyte/abctl/abctl.kubeconfig
            Context: kind-airbyte-abctl
 SUCCESS  Found Docker installation: version 27.2.0
 WARNING  Airbyte does not appear to be installed locally
 ▄ Checking for existing Kubernetes cluster 'airbyte-abctl' 
fatchat commented 2 months ago

It looks like we need to move to Kubernetes before upgrading

fatchat commented 2 months ago

kapa.ai says we can upgrade before moving to Kubernetes or to abctl. Will attempt that now.

https://raw.githubusercontent.com/airbytehq/airbyte/v0.58.0/run-ab-platform.sh

fatchat commented 2 months ago

Error:

Error response from daemon: create /tmp/airbyte_local: "/tmp/airbyte_local" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path

Airbyte started successfully but the logs were gone. Maybe I shouldn't have removed the volume...

Dalgo started and the dashboard opened
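
The error above quotes the characters Docker accepts in a local volume name. A minimal Python sketch of that rule shows why a string containing slashes is rejected as a name. The exact pattern is an assumption built from the two character classes in the message (with the second class repeated), and `is_valid_volume_name` is an illustrative helper, not a Docker API.

```python
import re

# Assumption: first character alphanumeric, then one or more characters
# from the second class quoted in the error message.
VOLUME_NAME_RE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]+")


def is_valid_volume_name(name):
    """Return True if `name` fits the assumed local volume name pattern."""
    return VOLUME_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("/tmp/airbyte_local", "airbyte_local"):
        print(candidate, is_valid_volume_name(candidate))
```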

fatchat commented 2 months ago

12 Source Connectors to check

fatchat commented 2 months ago

42 APIs to check

{
    "workspaceId": "268fb5b3-ed64-4cdb-893e-38262eab61d8",
    "customerId": "69f30da9-9778-4c04-bdcc-c72cd3c8f7d7",
    "name": "org1",
    "slug": "org1",
    "initialSetupComplete": False,
    "displaySetupWizard": False,
    "anonymousDataCollection": False,
    "news": False,
    "securityUpdates": False,
    "notifications": [],
    "notificationSettings": {
        "sendOnSuccess": {"notificationType": []},
        "sendOnFailure": {"notificationType": ["customerio"]},
        "sendOnSyncDisabled": {"notificationType": ["customerio"]},
        "sendOnSyncDisabledWarning": {"notificationType": ["customerio"]},
        "sendOnConnectionUpdate": {"notificationType": ["customerio"]},
        "sendOnConnectionUpdateActionRequired": {
            "notificationType": ["customerio"]
        },
        "sendOnBreakingChangeWarning": {"notificationType": ["customerio"]},
        "sendOnBreakingChangeSyncsDisabled": {
            "notificationType": ["customerio"]
        },
    },
    "defaultGeography": "auto",
    "webhookConfigs": [],
    "organizationId": "00000000-0000-0000-0000-000000000000",
    "tombstone": False,
}

create now takes a second required parameter: organizationId. This can be set to 00000000-0000-0000-0000-000000000000.
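
A minimal sketch of the updated request body, assuming the call in question is workspace creation (the response above is a workspace object). The function name `build_workspace_create_payload` is hypothetical, and the endpoint path in the comment is an assumption to verify against the Airbyte API reference for the target version.

```python
import json

# All-zero UUID mentioned above as a usable organizationId.
DEFAULT_ORGANIZATION_ID = "00000000-0000-0000-0000-000000000000"


def build_workspace_create_payload(name, organization_id=DEFAULT_ORGANIZATION_ID):
    """Return a JSON-serialisable body for creating a workspace."""
    if not name:
        raise ValueError("workspace name is required")
    return {"name": name, "organizationId": organization_id}


if __name__ == "__main__":
    # Assumed to be POSTed to <airbyte-url>/api/v1/workspaces/create.
    print(json.dumps(build_workspace_create_payload("org1"), indent=2))
```

Keeping the default in one constant means a single change if we later create real organizations instead of using the all-zero ID.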

{'sourceDefinitionId': 'a4617b39-3c14-44cd-a2eb-6e720f269235',
 'name': 'Public Apis',
 'dockerRepository': 'airbyte/source-public-apis',
 'dockerImageTag': '0.2.0',
 'documentationUrl': 'https://docs.airbyte.com/integrations/sources/public-apis',
 'icon': 'https://connectors.airbyte.com/files/metadata/airbyte/source-public-apis/latest/icon.svg',
 'protocolVersion': '0.2.0',
 'custom': False,
 'supportLevel': 'community',
 'releaseStage': 'alpha',
 'sourceType': 'api',
 'maxSecondsBetweenMessages': 10800}
{'connections': [{'connectionId': '756bbf71-b72f-429c-908b-c1836dd5c298',
   'name': 'sync',
   'namespaceDefinition': 'customformat',
   'namespaceFormat': 'staging',
   'prefix': '',
   'sourceId': '05814f12-3b05-4541-97d2-7806fff39e94',
   'destinationId': 'f710b11a-2dba-4448-9a9f-0afdcf8e0e89',
   'operationIds': [],
   'syncCatalog': {'streams': [{'stream': {'name': 'Sheet2',
       'jsonSchema': {'type': 'object',
        '$schema': 'http://json-schema.org/draft-07/schema#',
        'properties': {'Iron': {'type': 'string'},
         'SNo.': {'type': 'string'},
         'Zone': {'type': 'string'},
         'State': {'type': 'string'},
         'Arsenic': {'type': 'string'},
         'Nitrate': {'type': 'string'},
         'Fluoride': {'type': 'string'},
         'Latitude': {'type': 'string'},
         'Multiple': {'type': 'string'},
         'Physical': {'type': 'string'},
         'salinity': {'type': 'string'},
         'Longitude': {'type': 'string'},
         'District Name': {'type': 'string'},
         'Bacteriological': {'type': 'string'}}},
       'supportedSyncModes': ['full_refresh'],
       'defaultCursorField': [],
       'sourceDefinedPrimaryKey': []},
      'config': {'syncMode': 'full_refresh',
       'cursorField': [],
       'destinationSyncMode': 'overwrite',
       'primaryKey': [],
       'aliasName': 'Sheet2',
       'selected': True,
       'fieldSelectionEnabled': False}}]},
   'scheduleType': 'manual',
   'status': 'active',
   'sourceCatalogId': 'cd539418-af6e-43f7-9d5c-02137efe5cb1',
   'geography': 'auto',
   'breakingChange': False,
   'notifySchemaChanges': False,
   'notifySchemaChangesByEmail': False,
   'nonBreakingChangesPreference': 'ignore',
   'created_at': 1725274314,
   'backfillPreference': 'disabled'}]}
{'connections': [{'connectionId': '756bbf71-b72f-429c-908b-c1836dd5c298',
   'name': 'sync',
   'scheduleType': 'manual',
   'status': 'active',
   'source': {'sourceId': '05814f12-3b05-4541-97d2-7806fff39e94',
    'name': 'source',
    'sourceDefinitionId': '71607ba1-c0ac-4799-8049-7f4b90dd50f7',
    'sourceName': 'Google Sheets',
    'icon': 'https://connectors.airbyte.com/files/metadata/airbyte/source-google-sheets/latest/icon.svg'},
   'destination': {'destinationId': 'f710b11a-2dba-4448-9a9f-0afdcf8e0e89',
    'name': 'postgres-warehouse',
    'destinationDefinitionId': '25c5221d-dce2-4163-ade9-739ef790f503',
    'destinationName': 'Postgres',
    'icon': 'https://connectors.airbyte.com/files/metadata/airbyte/destination-postgres/latest/icon.svg'},
   'latestSyncJobCreatedAt': 1725849314,
   'latestSyncJobStatus': 'succeeded',
   'isSyncing': False,
   'schemaChange': 'no_change'}]}
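
With 42 API calls to check, one way to spot changed response shapes is to diff the key paths of a response captured before the upgrade against one captured after. The sketch below is illustrative: `key_paths` and `diff_keys` are hypothetical helpers, and the sample dictionaries are trimmed stand-ins with a made-up extra field, not real captured output.

```python
def key_paths(value, prefix=""):
    """Return the set of dotted key paths found in nested dicts and lists."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(child, path)
    elif isinstance(value, list):
        for item in value:
            # All list items share one "[]" segment so indexes add no noise.
            paths |= key_paths(item, f"{prefix}[]")
    return paths


def diff_keys(before, after):
    """Return (removed, added) key paths between two API responses."""
    old, new = key_paths(before), key_paths(after)
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    before = {"connections": [{"connectionId": "abc", "name": "sync"}]}
    after = {"connections": [{"connectionId": "abc", "name": "sync",
                              "exampleNewField": "x"}]}  # made-up field
    removed, added = diff_keys(before, after)
    print("removed:", removed)
    print("added:", added)
```

This compares structure only; value changes such as a renamed enum would still need a manual look.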
fatchat commented 1 month ago

Refreshing the schema of a Google Sheets source failed with:

2024-09-10 17:52:38 platform > Preparing command: docker run --rm --init -i -w /data/97e2a1de-e2c4-420a-ab78-5f67e0e8f4d4/0 --log-driver none --name source-google-sheets-discover-97e2a1de-e2c4-420a-ab78-5f67e0e8f4d4-0-rfwvh --network host -v airbyte_workspace:/data -v oss_local_root:/local -e DEPLOYMENT_MODE=OSS -e WORKER_CONNECTOR_IMAGE=airbyte/source-google-sheets:0.3.13 -e AUTO_DETECT_SCHEMA=true -e LAUNCHDARKLY_KEY= -e SOCAT_KUBE_CPU_REQUEST=0.1 -e SOCAT_KUBE_CPU_LIMIT=2.0 -e FIELD_SELECTION_WORKSPACES= -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE=dev -e APPLY_FIELD_SELECTION=false -e WORKER_JOB_ATTEMPT=0 -e OTEL_COLLECTOR_ENDPOINT=http://host.docker.internal:4317 -e FEATURE_FLAG_CLIENT=config -e AIRBYTE_VERSION=0.58.0 -e WORKER_JOB_ID=97e2a1de-e2c4-420a-ab78-5f67e0e8f4d4 airbyte/source-google-sheets:0.3.13 discover --config source_config.json
2024-09-10 17:52:38 platform > Reading messages from protocol version 0.2.0
2024-09-10 17:52:40 platform > Running discovery on sheet 1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ
2024-09-10 17:52:40 platform > Backing off get(...) for 0.7s (googleapiclient.errors.HttpError: <HttpError 503 when requesting https://sheets.googleapis.com/v4/spreadsheets/1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ?includeGridData=false&alt=json returned "The service is currently unavailable.". Details: "The service is currently unavailable.">)
2024-09-10 17:52:41 platform > Backing off get(...) for 1.4s (googleapiclient.errors.HttpError: <HttpError 503 when requesting https://sheets.googleapis.com/v4/spreadsheets/1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ?includeGridData=false&alt=json returned "The service is currently unavailable.". Details: "The service is currently unavailable.">)
2024-09-10 17:52:43 platform > Backing off get(...) for 1.3s (googleapiclient.errors.HttpError: <HttpError 503 when requesting https://sheets.googleapis.com/v4/spreadsheets/1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ?includeGridData=false&alt=json returned "The service is currently unavailable.". Details: "The service is currently unavailable.">)
2024-09-10 17:52:44 platform > Backing off get(...) for 0.7s (googleapiclient.errors.HttpError: <HttpError 503 when requesting https://sheets.googleapis.com/v4/spreadsheets/1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ?includeGridData=false&alt=json returned "The service is currently unavailable.". Details: "The service is currently unavailable.">)
2024-09-10 17:52:46 platform > Backing off get(...) for 10.3s (googleapiclient.errors.HttpError: <HttpError 503 when requesting https://sheets.googleapis.com/v4/spreadsheets/1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ?includeGridData=false&alt=json returned "The service is currently unavailable.". Details: "The service is currently unavailable.">)
2024-09-10 17:52:56 platform > Backing off get(...) for 4.7s (googleapiclient.errors.HttpError: <HttpError 503 when requesting https://sheets.googleapis.com/v4/spreadsheets/1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ?includeGridData=false&alt=json returned "The service is currently unavailable.". Details: "The service is currently unavailable.">)
2024-09-10 17:53:01 platform > Backing off get(...) for 48.9s (googleapiclient.errors.HttpError: <HttpError 503 when requesting https://sheets.googleapis.com/v4/spreadsheets/1Pcq1yVSfam-h6nfEFVkUYHc3j-9PwPZVZU3fUZe9eoQ?includeGridData=false&a
Ishankoradia commented 1 month ago

Looks like a Google server issue.

Ishankoradia commented 1 month ago

Testing connectors

Google Sheets (v0.3.13)

Glific (v0.1.2)

Commcare (v0.2.0)

Avni (v0.2.0, from our Docker Hub)

SurveyCTO (v0.1.3, from our Docker Hub)

Postgres (v2.0.33)

Kobotoolbox (v0.2.0, from our Docker Hub)

fatchat commented 1 month ago

Freshdesk

Frappe (Behavior Type stream)

Ishankoradia commented 1 month ago

Salesforce