airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
15.66k stars 4.02k forks source link

normalization failing from http source #2031

Closed jrhizor closed 2 years ago

jrhizor commented 3 years ago

Postgres normalization failing after a successful sync from the HTTP endpoint.

2021-02-10 08:08:24 INFO (/tmp/workspace/165/2) WorkerRun(call):58 - Executing worker wrapper...
2021-02-10 08:08:24 INFO (/tmp/workspace/165/2) DefaultSyncWorker(run):82 - configured sync modes: {data=full_refresh}
2021-02-10 08:08:24 INFO (/tmp/workspace/165/2) DefaultAirbyteDestination(start):67 - Running target...
2021-02-10 08:08:24 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - Checking if airbyte/destination-postgres:0.1.14 exists...
2021-02-10 08:08:25 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - airbyte/destination-postgres:0.1.14 was found locally.
2021-02-10 08:08:25 DEBUG (/tmp/workspace/165/2) DockerProcessBuilderFactory(create):104 - Preparing command: docker run --rm -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/165/2 --network host airbyte/destination-postgres:0.1.14 write --config target_config.json --catalog catalog.json
2021-02-10 08:08:25 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - Checking if airbyte/source-http-request:0.1.0 exists...
2021-02-10 08:08:26 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - airbyte/source-http-request:0.1.0 was found locally.
2021-02-10 08:08:26 DEBUG (/tmp/workspace/165/2) DockerProcessBuilderFactory(create):104 - Preparing command: docker run --rm -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/165/2 --network host airbyte/source-http-request:0.1.0 read --config tap_config.json --catalog catalog.json
2021-02-10 08:08:37 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 2021-02-10 08:08:37 [32mINFO[m i.a.i.d.p.PostgresDestination(main):68 - {} - starting destination: class io.airbyte.integrations.destination.postgres.PostgresDestination
2021-02-10 08:08:37 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 2021-02-10 08:08:37 [32mINFO[m i.a.i.b.IntegrationRunner(run):78 - {} - Running integration: io.airbyte.integrations.destination.postgres.PostgresDestination
2021-02-10 08:08:37 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 2021-02-10 08:08:37 [32mINFO[m i.a.i.b.IntegrationCliParser(parseOptions):135 - {} - integration args: {catalog=catalog.json, write=null, config=target_config.json}
2021-02-10 08:08:37 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 2021-02-10 08:08:37 [32mINFO[m i.a.i.b.IntegrationRunner(run):82 - {} - Command: WRITE
2021-02-10 08:08:37 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 2021-02-10 08:08:37 [32mINFO[m i.a.i.b.IntegrationRunner(run):83 - {} - Integration config: IntegrationConfig{command=WRITE, configPath='target_config.json', catalogPath='catalog.json', statePath='null'}
2021-02-10 08:08:38 DEBUG (/tmp/workspace/165/2) DefaultAirbyteSource(close):109 - Closing tap process
2021-02-10 08:08:38 DEBUG (/tmp/workspace/165/2) DefaultAirbyteDestination(close):102 - Closing target process
2021-02-10 08:08:40 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 2021-02-10 08:08:40 [32mINFO[m i.a.i.b.FailureTrackingConsumer(close):64 - {} - hasFailed: false.
2021-02-10 08:08:40 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 2021-02-10 08:08:40 [32mINFO[m i.a.i.d.b.BufferedStreamConsumer(close):158 - {} - executing on success close procedure.
2021-02-10 08:08:40 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 2021-02-10 08:08:40 [32mINFO[m i.a.i.d.b.BufferedStreamConsumer(writeStreamsWithNRecords):188 - {} - Writing stream data. Max batch size: 10000, Actual batch size: 1, Remaining buffered records: 0
2021-02-10 08:08:40 ERROR (/tmp/workspace/165/2) LineGobbler(voidCall):69 - WARNING: An illegal reflective access operation has occurred
2021-02-10 08:08:40 ERROR (/tmp/workspace/165/2) LineGobbler(voidCall):69 - WARNING: Illegal reflective access by com.leansoft.bigqueue.page.MappedPageImpl$Cleaner (file:/airbyte/lib/leansoft-bigqueue-0.7.3.jar) to method java.nio.DirectByteBuffer.cleaner()

papercupsAPP  9 hours ago
2021-02-10 08:08:40 ERROR (/tmp/workspace/165/2) LineGobbler(voidCall):69 - WARNING: Please consider reporting this to the maintainers of com.leansoft.bigqueue.page.MappedPageImpl$Cleaner
2021-02-10 08:08:40 ERROR (/tmp/workspace/165/2) LineGobbler(voidCall):69 - WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
2021-02-10 08:08:40 ERROR (/tmp/workspace/165/2) LineGobbler(voidCall):69 - WARNING: All illegal access operations will be denied in a future release
2021-02-10 08:08:41 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 2021-02-10 08:08:41 [32mINFO[m i.a.i.b.IntegrationRunner(run):117 - {} - Completed integration: io.airbyte.integrations.destination.postgres.PostgresDestination
2021-02-10 08:08:41 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 2021-02-10 08:08:41 [32mINFO[m i.a.i.d.p.PostgresDestination(main):70 - {} - completed destination: class io.airbyte.integrations.destination.postgres.PostgresDestination
2021-02-10 08:08:45 INFO (/tmp/workspace/165/2) DefaultSyncWorker(run):110 - Running normalization.
2021-02-10 08:08:45 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - Checking if airbyte/normalization:0.1.10 exists...
2021-02-10 08:08:45 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - airbyte/normalization:0.1.10 was found locally.
2021-02-10 08:08:45 DEBUG (/tmp/workspace/165/2) DockerProcessBuilderFactory(create):104 - Preparing command: docker run --rm -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/165/2/normalize --network host airbyte/normalization:0.1.10 run --integration-type postgres --config target_config.json --catalog catalog.json
2021-02-10 08:08:50 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - Namespace(config='target_config.json', integration_type=<DestinationType.postgres: 'postgres'>, out='/data/165/2/normalize')
2021-02-10 08:08:50 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - transform_postgres
2021-02-10 08:08:51 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - Processing catalog.json...
2021-02-10 08:08:51 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - Generating data.sql in /data/165/2/normalize/models/generated/
2021-02-10 08:08:53 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - Running with dbt=0.18.1
2021-02-10 08:08:56 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - Found 1 model, 0 tests, 0 snapshots, 0 analyses, 308 macros, 0 operations, 0 seed files, 1 source
2021-02-10 08:08:56 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 -
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 08:08:58 | Concurrency: 32 threads (target='prod')
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 08:08:58 |
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 08:08:58 | 1 of 1 START table model covid.data.......................................................................... [RUN]
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 08:08:58 | 1 of 1 ERROR creating table model covid.data................................................................. [[31mERROR[0m in 0.20s]
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 08:08:58 |
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - 08:08:58 | Finished running 1 table model in 2.19s.
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 -
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - [31mCompleted with 1 error and 0 warnings:[0m
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 -
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - [33mDatabase Error in model data (models/generated/data.sql)[0m
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - syntax error at or near "from"

papercupsAPP  9 hours ago
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - LINE 16: from "test".covid._airbyte_raw_data
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - ^
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - compiled SQL at ../build/run/airbyte_utils/models/generated/data.sql
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 -
2021-02-10 08:08:58 INFO (/tmp/workspace/165/2) LineGobbler(voidCall):69 - Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
2021-02-10 08:09:03 DEBUG (/tmp/workspace/165/2) DefaultNormalizationRunner(close):97 - Closing tap process
2021-02-10 08:09:03 ERROR (/tmp/workspace/165/2) DefaultSyncWorker(run):117 - Normalization Failed.
io.airbyte.workers.WorkerException: Normalization Failed.
at io.airbyte.workers.DefaultSyncWorker.run(DefaultSyncWorker.java:114) [io.airbyte-airbyte-workers-0.13.1-alpha.jar:?]
at io.airbyte.workers.DefaultSyncWorker.run(DefaultSyncWorker.java:48) [io.airbyte-airbyte-workers-0.13.1-alpha.jar:?]
at io.airbyte.workers.wrappers.OutputConvertingWorker.run(OutputConvertingWorker.java:44) [io.airbyte-airbyte-workers-0.13.1-alpha.jar:?]
at io.airbyte.workers.wrappers.JobOutputSyncWorker.run(JobOutputSyncWorker.java:32) [io.airbyte-airbyte-workers-0.13.1-alpha.jar:?]
at io.airbyte.scheduler.WorkerRun.lambda$new$0(WorkerRun.java:53) [io.airbyte-airbyte-scheduler-0.13.1-alpha.jar:?]
at io.airbyte.scheduler.WorkerRun.call(WorkerRun.java:61) [io.airbyte-airbyte-scheduler-0.13.1-alpha.jar:?]
at io.airbyte.scheduler.WorkerRun.call(WorkerRun.java:42) [io.airbyte-airbyte-scheduler-0.13.1-alpha.jar:?]
at io.airbyte.commons.concurrency.LifecycledCallable.execute(LifecycledCallable.java:114) [io.airbyte-airbyte-commons-0.13.1-alpha.jar:?]
at io.airbyte.commons.concurrency.LifecycledCallable.call(LifecycledCallable.java:98) [io.airbyte-airbyte-commons-0.13.1-alpha.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
Suppressed: io.airbyte.workers.WorkerException: Tap process wasn't successful
at io.airbyte.workers.normalization.DefaultNormalizationRunner.close(DefaultNormalizationRunner.java:100) ~[io.airbyte-airbyte-workers-0.13.1-alpha.jar:?]
at io.airbyte.workers.DefaultSyncWorker.run(DefaultSyncWorker.java:109) [io.airbyte-airbyte-workers-0.13.1-alpha.jar:?]
at io.airbyte.workers.DefaultSyncWorker.run(DefaultSyncWorker.java:48) [io.airbyte-airbyte-workers-0.13.1-alpha.jar:?]
at io.airbyte.workers.wrappers.OutputConvertingWorker.run(OutputConvertingWorker.java:44) [io.airbyte-airbyte-workers-0.13.1-alpha.jar:?]
at io.airbyte.workers.wrappers.JobOutputSyncWorker.run(JobOutputSyncWorker.java:32) [io.airbyte-airbyte-workers-0.13.1-alpha.jar:?]
at io.airbyte.scheduler.WorkerRun.lambda$new$0(WorkerRun.java:53) [io.airbyte-airbyte-scheduler-0.13.1-alpha.jar:?]
at io.airbyte.scheduler.WorkerRun.call(WorkerRun.java:61) [io.airbyte-airbyte-scheduler-0.13.1-alpha.jar:?]
at io.airbyte.scheduler.WorkerRun.call(WorkerRun.java:42) [io.airbyte-airbyte-scheduler-0.13.1-alpha.jar:?]
at io.airbyte.commons.concurrency.LifecycledCallable.execute(LifecycledCallable.java:114) [io.airbyte-airbyte-commons-0.13.1-alpha.jar:?]
at io.airbyte.commons.concurrency.LifecycledCallable.call(LifecycledCallable.java:98) [io.airbyte-airbyte-commons-0.13.1-alpha.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]

┆Issue is synchronized with this Asana task by Unito

crunchtoinfo commented 3 years ago

Hi everyone. I came across this potential issue and I am responding here to Jared's questions from Papercups.

Here are the details of the endpoint: URL: https://data.egov.bg/api/getResourceData body: {"resource_uri" : "e59f95dd-afde-43af-83c8-ea2916badd19"} headers: {"Accept-Charset":"utf-8"} (did a lot of trial and error here - was not working with {"Content-Type": "application/json"}) http_method: POST

The source data can be seen at this URL: https://data.egov.bg/data/resourceView/e59f95dd-afde-43af-83c8-ea2916badd19

The response contains a UTF-8 entities - maybe this is the probelem? The response can be generated by executing in ubuntu:

curl -X POST https://data.egov.bg/api/getResourceData -H "Content-Type: application/json" -d "{\"resource_uri\" : \"e59f95dd-afde-43af-83c8-ea2916badd19\"}"

Any help is appreciated.

jrhizor commented 3 years ago

Thanks @crunchtoinfo, this is enough information to reproduce on our end.

@cgardens: There are two problems here. First, I think the "data" field hack is causing a problem. The raw table created is called just _airbyte_raw_data with no actual value. I think this is why normalization is failing.

Secondly, the actual call should be able to support {"Content-Type": "application/json"} as it works with curl. It definitely does look like an encoding issue.

marcosmarxm commented 2 years ago

Closing this because HTTP Source is deprecated