airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Basic Normalization doesn't Normalize #859

Closed · cgardens closed this issue 4 years ago

cgardens commented 4 years ago

Expected Behavior

When basic normalization is enabled, the sync should produce normalized tables in the destination database.

Current Behavior

The sync and normalization steps report success, but no normalized tables are visible in the destination after the job completes.

Steps to Reproduce

  1. Configure HubSpot with an API key
  2. Configure Postgres with basic normalization turned on
  3. Launch the job.

Logs

airbyte-scheduler | 2020-11-09 16:58:55 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - 2020-11-09 16:58:55 INFO i.a.i.b.IntegrationRunner(run):84 - {} - Integration config: IntegrationConfig{command=WRITE, configPath='target_config.json', catalogPath='catalog.json', statePath='null'}
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Loading schemas
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Loading schema for subscription_changes
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Loading schema for email_events
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Loading schema for forms
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Loading schema for workflows
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Loading schema for owners
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Loading schema for campaigns
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Loading schema for contact_lists
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Loading schema for contacts
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - GET https://api.hubapi.com/properties/v1/contacts/properties?hapikey=e51db33e-b843-480a-ad3c-7ca6beba83e2
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.28363656997680664, "tags": {"endpoint": "properties", "http_status_code": 200, "status": "succeeded"}}
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - GET https://api.hubapi.com/companies/v2/properties?hapikey=e51db33e-b843-480a-ad3c-7ca6beba83e2
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.1718921661376953, "tags": {"endpoint": "companies", "http_status_code": 200, "status": "succeeded"}}
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Loading schema for companies
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - GET https://api.hubapi.com/companies/v2/properties?hapikey=e51db33e-b843-480a-ad3c-7ca6beba83e2
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.15334033966064453, "tags": {"endpoint": "companies", "http_status_code": 200, "status": "succeeded"}}
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Loading schema for deals
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - GET https://api.hubapi.com/properties/v1/deals/properties?hapikey=e51db33e-b843-480a-ad3c-7ca6beba83e2
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.1946721076965332, "tags": {"endpoint": "properties", "http_status_code": 200, "status": "succeeded"}}
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Loading schema for deal_pipelines
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Loading schema for engagements
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Loading schema for contacts_by_company
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Starting sync. Will sync these streams: ['owners']
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Syncing owners
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - sync_owners from 2017-01-01T00:00:00Z
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - GET https://api.hubapi.com/owners/v2/owners?hapikey=e51db33e-b843-480a-ad3c-7ca6beba83e2
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):110 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.1785726547241211, "tags": {"endpoint": "owners", "http_status_code": 200, "status": "succeeded"}}
airbyte-scheduler | 2020-11-09 16:58:56 ERROR i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):108 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - WARNING Removed 1 paths during transforms:
airbyte-scheduler |
airbyte-scheduler | 2020-11-09 16:58:56 ERROR i.a.w.p.a.DefaultAirbyteStreamFactory(internalLog):108 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} -   activeSalesforceId
airbyte-scheduler |
airbyte-scheduler | 2020-11-09 16:58:56 INFO i.a.w.DefaultSyncWorker(run):91 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Running normalization.
airbyte-scheduler | 2020-11-09 16:58:56 DEBUG i.a.w.p.DockerProcessBuilderFactory(create):103 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Preparing command: docker run --rm -i -v /tmp/dev_root/workspace:/data -v /tmp/airbyte_local:/local -w /data/27/0/normalize --network host airbyte/normalization:dev run --integration-type postgres --config target_config.json --catalog catalog.json
airbyte-scheduler | 2020-11-09 16:58:57 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Namespace(config='target_config.json', integration_type=<DestinationType.postgres: 'postgres'>, out='/data/27/0/normalize')
airbyte-scheduler | 2020-11-09 16:58:57 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - transform_postgres
airbyte-scheduler | 2020-11-09 16:58:57 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Processing catalog.json...
airbyte-scheduler | 2020-11-09 16:58:57 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} -   Generating owners.sql in /data/27/0/normalize/models/generated/
airbyte-scheduler | 2020-11-09 16:58:58 INFO i.a.s.JobRetrier(run):53 - {} - Running job-retrier...
airbyte-scheduler | 2020-11-09 16:58:58 INFO i.a.s.JobRetrier(run):67 - {} - Completed job-retrier...
airbyte-scheduler | 2020-11-09 16:58:58 INFO i.a.s.JobScheduler(run):76 - {} - Running job-scheduler...
airbyte-scheduler | 2020-11-09 16:58:58 INFO i.a.s.JobScheduler(run):80 - {} - Completed job-scheduler...
airbyte-scheduler | 2020-11-09 16:58:58 INFO i.a.s.JobSubmitter(run):67 - {} - Running job-submitter...
airbyte-scheduler | 2020-11-09 16:58:58 INFO i.a.s.JobSubmitter(run):76 - {} - Completed job-submitter...
airbyte-scheduler | 2020-11-09 16:58:58 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Running with dbt=0.18.1
airbyte-scheduler | 2020-11-09 16:58:58 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - 2020-11-09 16:58:58 INFO i.a.i.b.FailureTrackingConsumer(close):50 - {} - hasFailed: false.
airbyte-scheduler | 2020-11-09 16:58:58 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - 2020-11-09 16:58:58 ERROR i.a.i.d.p.PostgresDestination$RecordConsumer(close):257 - {} - executing on success close procedure.
airbyte-scheduler | 2020-11-09 16:58:59 ERROR i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - WARNING: An illegal reflective access operation has occurred
airbyte-scheduler | 2020-11-09 16:58:59 ERROR i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - WARNING: Illegal reflective access by com.leansoft.bigqueue.page.MappedPageImpl$Cleaner (file:/airbyte/lib/leansoft-bigqueue-0.7.3.jar) to method java.nio.DirectByteBuffer.cleaner()
airbyte-scheduler | 2020-11-09 16:58:59 ERROR i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - WARNING: Please consider reporting this to the maintainers of com.leansoft.bigqueue.page.MappedPageImpl$Cleaner
airbyte-scheduler | 2020-11-09 16:58:59 ERROR i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
airbyte-scheduler | 2020-11-09 16:58:59 ERROR i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - WARNING: All illegal access operations will be denied in a future release
airbyte-scheduler | 2020-11-09 16:58:59 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - 2020-11-09 16:58:59 INFO i.a.i.b.IntegrationRunner(run):120 - {} - Completed integration: io.airbyte.integrations.destination.postgres.PostgresDestination
airbyte-scheduler | 2020-11-09 16:58:59 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - 2020-11-09 16:58:59 INFO i.a.i.d.p.PostgresDestination(main):338 - {} - completed destination: class io.airbyte.integrations.destination.postgres.PostgresDestination
airbyte-scheduler | 2020-11-09 16:59:01 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Installing https://github.com/fishtown-analytics/dbt-utils.git@0.6.2
airbyte-scheduler | 2020-11-09 16:59:03 INFO i.a.s.JobRetrier(run):53 - {} - Running job-retrier...
airbyte-scheduler | 2020-11-09 16:59:03 INFO i.a.s.JobRetrier(run):67 - {} - Completed job-retrier...
airbyte-scheduler | 2020-11-09 16:59:03 INFO i.a.s.JobScheduler(run):76 - {} - Running job-scheduler...
airbyte-scheduler | 2020-11-09 16:59:03 INFO i.a.s.JobScheduler(run):80 - {} - Completed job-scheduler...
airbyte-scheduler | 2020-11-09 16:59:03 INFO i.a.s.JobSubmitter(run):67 - {} - Running job-submitter...
airbyte-scheduler | 2020-11-09 16:59:03 INFO i.a.s.JobSubmitter(run):76 - {} - Completed job-submitter...
airbyte-scheduler | 2020-11-09 16:59:05 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} -   Installed from revision 0.6.2
airbyte-scheduler | 2020-11-09 16:59:05 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} -
airbyte-scheduler | 2020-11-09 16:59:06 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Running with dbt=0.18.1
airbyte-scheduler | 2020-11-09 16:59:08 INFO i.a.s.JobRetrier(run):53 - {} - Running job-retrier...
airbyte-scheduler | 2020-11-09 16:59:08 INFO i.a.s.JobRetrier(run):67 - {} - Completed job-retrier...
airbyte-scheduler | 2020-11-09 16:59:08 INFO i.a.s.JobScheduler(run):76 - {} - Running job-scheduler...
airbyte-scheduler | 2020-11-09 16:59:08 INFO i.a.s.JobScheduler(run):80 - {} - Completed job-scheduler...
airbyte-scheduler | 2020-11-09 16:59:08 INFO i.a.s.JobSubmitter(run):67 - {} - Running job-submitter...
airbyte-scheduler | 2020-11-09 16:59:08 INFO i.a.s.JobSubmitter(run):76 - {} - Completed job-submitter...
airbyte-scheduler | 2020-11-09 16:59:08 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Found 1 model, 0 tests, 0 snapshots, 0 analyses, 300 macros, 0 operations, 0 seed files, 1 source
airbyte-scheduler | 2020-11-09 16:59:08 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} -
airbyte-scheduler | 2020-11-09 16:59:11 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - 16:59:11 | Concurrency: 32 threads (target='prod')
airbyte-scheduler | 2020-11-09 16:59:11 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - 16:59:11 |
airbyte-scheduler | 2020-11-09 16:59:11 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - 16:59:11 | 1 of 1 START table model public_NORMALIZED.owners............................................................ [RUN]
airbyte-scheduler | 2020-11-09 16:59:12 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - 16:59:12 | 1 of 1 OK created table model public_NORMALIZED.owners....................................................... [SELECT 1 in 1.53s]
airbyte-scheduler | 2020-11-09 16:59:13 INFO i.a.s.JobRetrier(run):53 - {} - Running job-retrier...
airbyte-scheduler | 2020-11-09 16:59:13 INFO i.a.s.JobRetrier(run):67 - {} - Completed job-retrier...
airbyte-scheduler | 2020-11-09 16:59:13 INFO i.a.s.JobScheduler(run):76 - {} - Running job-scheduler...
airbyte-scheduler | 2020-11-09 16:59:13 INFO i.a.s.JobScheduler(run):80 - {} - Completed job-scheduler...
airbyte-scheduler | 2020-11-09 16:59:13 INFO i.a.s.JobSubmitter(run):67 - {} - Running job-submitter...
airbyte-scheduler | 2020-11-09 16:59:13 INFO i.a.s.JobSubmitter(run):76 - {} - Completed job-submitter...
airbyte-scheduler | 2020-11-09 16:59:13 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - 16:59:13 |
airbyte-scheduler | 2020-11-09 16:59:13 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - 16:59:13 | Finished running 1 table model in 5.06s.
airbyte-scheduler | 2020-11-09 16:59:13 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} -
airbyte-scheduler | 2020-11-09 16:59:13 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Completed successfully
airbyte-scheduler | 2020-11-09 16:59:13 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} -
airbyte-scheduler | 2020-11-09 16:59:13 INFO i.a.c.i.LineGobbler(voidCall):69 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
airbyte-scheduler | 2020-11-09 16:59:13 DEBUG i.a.w.n.DefaultNormalizationRunner(close):96 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Closing tap process
airbyte-scheduler | 2020-11-09 16:59:13 DEBUG i.a.w.p.a.DefaultAirbyteSource(close):107 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Closing tap process
airbyte-scheduler | 2020-11-09 16:59:13 DEBUG i.a.w.p.a.DefaultAirbyteDestination(close):102 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Closing target process
airbyte-scheduler | 2020-11-09 16:59:13 INFO i.a.s.p.DefaultSchedulerPersistence(updateStatus):199 - {job_id=27, job_log_filename=logs.log, job_root=/tmp/workspace/27/0} - Setting job status to COMPLETED for job 27

tables in db after the sync completes:

d342ls464c9cel=> \d
            List of relations
 Schema |  Name  | Type  |     Owner
--------+--------+-------+----------------
 public | owners | table | fojairvqgmsjoz
(2 rows)

ChristopheDuong commented 4 years ago

DBT's log says that it created table model public_NORMALIZED.owners

So it's in a different schema until #845
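
A quick way to double-check which schema the model actually landed in (assuming a standard Postgres setup and psql) is to list the schemas or search information_schema:

\dn
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_name ILIKE '%owners%';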

cgardens commented 4 years ago

okay. fwiw, i'm not seeing any table created in the other schema. but i'll just wait until that other issue is completed.

d342ls464c9cel=> SET search_path TO public_NORMALIZED;
SET
d342ls464c9cel=> \d
Did not find any relations.
d342ls464c9cel=> select * from owners;
ERROR:  relation "owners" does not exist
LINE 1: select * from owners;
                      ^
d342ls464c9cel=>
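
Note that Postgres folds unquoted identifiers to lowercase, so the unquoted SET search_path TO public_NORMALIZED above actually points at public_normalized. If dbt created the schema with its mixed-case name quoted, it is only reachable with the name quoted, for example:

SET search_path TO "public_NORMALIZED";
\d
SELECT * FROM "public_NORMALIZED".owners;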

cgardens commented 4 years ago

this seems to be fixed since we changed how we are handling table naming.