DalgoT4D / maintenance

maintenance scripts for our ec2 machines

move airbyte to its own machine #27

Closed · fatchat closed 2 weeks ago

fatchat commented 1 month ago

The main challenge is to copy the sync logs over so that they're accessible from the new machine

fatchat commented 1 month ago

docker volume inspect airbyte_workspace will provide the mount point e.g.:

"Mountpoint": "/var/lib/docker/volumes/airbyte_workspace/_data",

which contains

  1. /var/lib/docker/volumes/airbyte_workspace/_data/{connection_id}/{attempt_number}/logs.log which look like logs for schema discovery. this folder may not exist for every connection
  2. /var/lib/docker/volumes/airbyte_workspace/_data/{job_id}/{attempt_number}/logs.log which are sync logs

the folders /var/lib/docker/volumes/airbyte_workspace/_data/{job_id}/{attempt_number}/ also contain

  1. destination_catalog.json
  2. destination_config.json
  3. source_catalog.json
  4. source_config.json

all these directories are owned by root
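
for quick inspection on the old machine, the mount point can be read straight out of docker volume inspect and the logs listed from there (a minimal sketch; <job_id> and <attempt_number> are placeholders):

docker volume inspect airbyte_workspace --format '{{ .Mountpoint }}'
# the directories are root-owned, so listing and reading need sudo
sudo ls /var/lib/docker/volumes/airbyte_workspace/_data/
sudo tail -n 50 /var/lib/docker/volumes/airbyte_workspace/_data/<job_id>/<attempt_number>/logs.log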

fatchat commented 1 month ago

ref: https://docs.airbyte.com/operator-guides/browsing-output-logs#exploring-local-logs

fatchat commented 1 month ago

then we copied the airbyte volumes over

on staging:

sudo rsync -avz -e "ssh -i dalgo-staging-1.pem" /var/lib/docker/volumes/airbyte_data/ ubuntu@<dalgo-staging-1-IP>:/home/ubuntu/docker-volumes/airbyte_data/
sudo rsync -avz -e "ssh -i dalgo-staging-1.pem" /var/lib/docker/volumes/airbyte_db/ ubuntu@<dalgo-staging-1-IP>:/home/ubuntu/docker-volumes/airbyte_db/
sudo rsync -avz -e "ssh -i dalgo-staging-1.pem" /var/lib/docker/volumes/airbyte_workspace/ ubuntu@<dalgo-staging-1-IP>:/home/ubuntu/docker-volumes/airbyte_workspace/

on dalgo-staging-1:

sudo cp -R /home/ubuntu/docker-volumes/airbyte_data /var/lib/docker/volumes/airbyte_data
sudo cp -R /home/ubuntu/docker-volumes/airbyte_db /var/lib/docker/volumes/airbyte_db
sudo cp -R /home/ubuntu/docker-volumes/airbyte_workspace /var/lib/docker/volumes/airbyte_workspace
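
one way to sanity-check the copy is to re-run rsync in checksum + dry-run mode against the staged copy, which prints any files that differ (a sketch, run from the old machine):

sudo rsync -avzc --dry-run -e "ssh -i dalgo-staging-1.pem" /var/lib/docker/volumes/airbyte_workspace/ ubuntu@<dalgo-staging-1-IP>:/home/ubuntu/docker-volumes/airbyte_workspace/
# file counts should also match on both machines
sudo find /var/lib/docker/volumes/airbyte_workspace/_data -type f | wc -l
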
fatchat commented 1 month ago

created security group allow-8000-from-dalgo-staging to allow inbound traffic on port 8000 from the dalgo-staging security group. added this new SG to dalgo-staging-1
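
the same rule can be added from the aws cli; a sketch with placeholder group ids (sg-aaaaaaaa is allow-8000-from-dalgo-staging, sg-bbbbbbbb is the dalgo-staging SG):

aws ec2 authorize-security-group-ingress --group-id sg-aaaaaaaa --protocol tcp --port 8000 --source-group sg-bbbbbbbb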

started airbyte on dalgo-staging-1

set AIRBYTE_SERVER_HOST in DDP_backend's .env to the private IP address of the dalgo-staging-1 machine

started dalgo on staging

opened staging.dalgo.in

was able to see connections, sources, source details

was able to see sync history logs
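
a quick way to confirm the backend can reach the relocated server is airbyte's health endpoint (assuming the standard /api/v1/health route; the IP is a placeholder):

curl http://<dalgo-staging-1-private-IP>:8000/api/v1/health
# expected: {"available":true}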

fatchat commented 1 month ago

wrote script updateairbyteserverblocks.py in this repo (https://github.com/DalgoT4D/maintenance/pull/29) to update the airbyte server url in prefect's airbyte server blocks

fatchat commented 1 month ago

the sync command reaches airbyte but the syncs are failing

Activity with activityType='RunWithJobOutput' failed: 'Activity task failed'. scheduledEventId=12, startedEventId=13, activityId=70810b9e-5cb2-3c84-a0fd-7efebc165f22, identity='1@5d502b74a6cd', retryState=RETRY_STATE_MAXIMUM_ATTEMPTS_REACHED

refresh-schema fails with "discovered is null"

fatchat commented 1 month ago

checking connectivity of existing sources also fails

Internal message: Activity with activityType='RunWithJobOutput' failed: 'Activity task failed'. scheduledEventId=12, startedEventId=13, activityId=88262bdf-09c1-3c84-9c52-af03b04d7c49, identity='1@5d502b74a6cd', retryState=RETRY_STATE_MAXIMUM_ATTEMPTS_REACHED
Failure origin: source
fatchat commented 1 month ago

same for existing destinations

Internal message: Activity with activityType='RunWithJobOutput' failed: 'Activity task failed'. scheduledEventId=12, startedEventId=13, activityId=e70d85dd-648f-3d9e-b254-48c638be2ec2, identity='1@5d502b74a6cd', retryState=RETRY_STATE_MAXIMUM_ATTEMPTS_REACHED
Failure origin: destination
fatchat commented 1 month ago

this stack trace might be useful:

Something went wrong fetching schema change (catalog) for connection c84d80fd-7187-4bb3-bfc2-7369325d786c: 

{
  "message": "Internal Server Error: Cannot invoke \"io.airbyte.api.model.generated.AirbyteCatalog.getStreams()\" because \"discovered\" is null",
  "exceptionClassName": "java.lang.NullPointerException",
  "exceptionStack": [
    "java.lang.NullPointerException: Cannot invoke \"io.airbyte.api.model.generated.AirbyteCatalog.getStreams()\" because \"discovered\" is null",
    "  at io.airbyte.commons.server.handlers.WebBackendConnectionsHandler.updateSchemaWithRefreshedDiscoveredCatalog(WebBackendConnectionsHandler.java:462)",
    "  at io.airbyte.commons.server.handlers.WebBackendConnectionsHandler.webBackendGetConnection(WebBackendConnectionsHandler.java:394)",
    "  at io.airbyte.server.apis.WebBackendApiController.lambda$webBackendGetConnection$2(WebBackendApiController.java:96)",
    "  at io.airbyte.server.apis.ApiHelper.execute(ApiHelper.java:28)",
    "  at io.airbyte.server.apis.WebBackendApiController.webBackendGetConnection(WebBackendApiController.java:94)",
    "  at io.airbyte.server.apis.$WebBackendApiController$Definition$Exec.dispatch(Unknown Source)",
    "  at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:371)",
    "  at io.micronaut.context.DefaultBeanContext$4.invoke(DefaultBeanContext.java:594)",
    "  at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:303)",
    "  at io.micronaut.web.router.RouteMatch.execute(RouteMatch.java:111)",
    "  at io.micronaut.http.context.ServerRequestContext.with(ServerRequestContext.java:103)",
    "  at io.micronaut.http.server.RouteExecutor.lambda$executeRoute$14(RouteExecutor.java:659)",
    "  at reactor.core.publisher.FluxDeferContextual.subscribe(FluxDeferContextual.java:49)",
    "  at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62)",
    "  at reactor.core.publisher.FluxSubscribeOn$SubscribeOnSubscriber.run(FluxSubscribeOn.java:194)",
    "  at io.micronaut.reactive.reactor.instrument.ReactorInstrumentation.lambda$init$0(ReactorInstrumentation.java:62)",
    "  at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)",
    "  at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)",
    "  at io.micronaut.scheduling.instrument.InvocationInstrumenterWrappedCallable.call(InvocationInstrumenterWrappedCallable.java:53)",
    "  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)",
    "  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)",
    "  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)",
    "  at java.base/java.lang.Thread.run(Thread.java:1583)"
  ]
}
fatchat commented 1 month ago

the docker-compose.yml was missing the lines

      - SECRET_STORE_GCP_PROJECT_ID=${SECRET_STORE_GCP_PROJECT_ID}
      - SECRET_STORE_GCP_CREDENTIALS=${SECRET_STORE_GCP_CREDENTIALS}

added them, now syncs are running
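
to confirm the variables actually reach the container, they can be checked from inside it (a sketch, assuming the worker service name from airbyte's docker-compose.yml):

docker compose exec worker env | grep SECRET_STORE_GCP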

fatchat commented 1 month ago

re-tested connectivity to existing sources

so our custom connectors failed the check while airbyte's stock connectors passed

fatchat commented 1 month ago

fixed connectors for avni and papersurvey by rebuilding them... the connectors on dockerhub were for x86 and this new machine is arm64
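
the rebuild is the standard multi-arch flow; a minimal sketch with a hypothetical image name, assuming docker buildx is available on the new machine:

# build an arm64 image from the connector's source directory and load it into the local docker daemon
docker buildx build --platform linux/arm64 -t <org>/source-avni:<tag> --load .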

fatchat commented 4 weeks ago

and verified.

fatchat commented 4 weeks ago

two clients had not whitelisted the ip of the new machine so we had to roll back

going again now

fatchat commented 3 weeks ago

everything looking good so far