airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Airbyte CDK: using generic python source connector has JSON Schema validation error #14136

Closed yzislin closed 5 months ago

yzislin commented 2 years ago

Environment

Current Behavior

I am using existing GitHub schema files (for example, comments.json) that are part of Airbyte's Source GitHub connector. I am testing with the Generic Python Source connector, without any custom code. Check always returns true. Discover adds one stream with json_schema loaded directly from the file (i.e. comments.json).

The sync job fails with com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved

Expected Behavior

As far as I understand, when using the HTTP API Python Source connector, your code automatically resolves $ref so that the schema passes the JSON validator. There seems to be no equivalent for the Generic Python Source connector. After generating the template via generator.sh, the example just says to load the JSON from a JSON object, so I am loading it from disk. If I manually replace $ref with the actual structure, the validator complains about a missing id. I believe a proper JSON schema requires $id. If I manually add that in, there is another error.

My code was working fine up to 0.37.1-alpha. I think you introduced the JSON Schema validator after that.

Logs

`--== 2022-06-21 20:8:52 ==-- 7 comments - Page 1 downloaded

2022-06-21 20:08:52 ERROR c.n.s.JsonMetaSchema(newValidator):345 - Error:
java.lang.reflect.InvocationTargetException: null
        at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) ~[?:?]
        at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
        at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.PropertiesValidator.<init>(PropertiesValidator.java:36) ~[json-schema-validator-1.0.42.jar:?]
        at jdk.internal.reflect.GeneratedConstructorAccessor34.newInstance(Unknown Source) ~[?:?]
        at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
        at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchemaFactory.newJsonSchema(JsonSchemaFactory.java:254) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchemaFactory.getSchema(JsonSchemaFactory.java:362) ~[json-schema-validator-1.0.42.jar:?]
        at io.airbyte.validation.json.JsonSchemaValidator.validateInternal(JsonSchemaValidator.java:63) ~[io.airbyte-airbyte-json-validation-0.39.21-alpha.jar:?]
        at io.airbyte.validation.json.JsonSchemaValidator.ensure(JsonSchemaValidator.java:78) ~[io.airbyte-airbyte-json-validation-0.39.21-alpha.jar:?]
        at io.airbyte.workers.RecordSchemaValidator.validateSchema(RecordSchemaValidator.java:54) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.validateSchema(DefaultReplicationWorker.java:383) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.lambda$getReplicationRunnable$6(DefaultReplicationWorker.java:312) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at com.networknt.schema.RefValidator.<init>(RefValidator.java:43) ~[json-schema-validator-1.0.42.jar:?]
        ... 31 more
2022-06-21 20:08:52 ERROR c.n.s.JsonMetaSchema(newValidator):345 - Error:
java.lang.reflect.InvocationTargetException: null
        at jdk.internal.reflect.GeneratedConstructorAccessor34.newInstance(Unknown Source) ~[?:?]
        at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
        at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchemaFactory.newJsonSchema(JsonSchemaFactory.java:254) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchemaFactory.getSchema(JsonSchemaFactory.java:362) ~[json-schema-validator-1.0.42.jar:?]
        at io.airbyte.validation.json.JsonSchemaValidator.validateInternal(JsonSchemaValidator.java:63) ~[io.airbyte-airbyte-json-validation-0.39.21-alpha.jar:?]
        at io.airbyte.validation.json.JsonSchemaValidator.ensure(JsonSchemaValidator.java:78) ~[io.airbyte-airbyte-json-validation-0.39.21-alpha.jar:?]
        at io.airbyte.workers.RecordSchemaValidator.validateSchema(RecordSchemaValidator.java:54) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.validateSchema(DefaultReplicationWorker.java:383) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.lambda$getReplicationRunnable$6(DefaultReplicationWorker.java:312) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at com.networknt.schema.RefValidator.<init>(RefValidator.java:43) ~[json-schema-validator-1.0.42.jar:?]
        at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) ~[?:?]
        at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
        at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.PropertiesValidator.<init>(PropertiesValidator.java:36) ~[json-schema-validator-1.0.42.jar:?]
        ... 20 more
2022-06-21 20:08:53 destination > 2022-06-21 20:08:53 INFO i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):63 - Airbyte message consumer: succeeded.
2022-06-21 20:08:53 destination > 2022-06-21 20:08:53 INFO i.a.i.d.l.LocalJsonDestination$JsonConsumer(close):174 - finalizing consumer.
2022-06-21 20:08:53 destination > 2022-06-21 20:08:53 INFO i.a.i.d.l.LocalJsonDestination$JsonConsumer(close):190 - File output: /local/test_data2/_airbyte_raw_comments.jsonl
2022-06-21 20:08:53 destination > 2022-06-21 20:08:53 INFO i.a.i.b.IntegrationRunner(runInternal):153 - Completed integration: io.airbyte.integrations.destination.local_json.LocalJsonDestination
2022-06-21 20:08:53 ERROR i.a.w.g.DefaultReplicationWorker(run):180 - Sync worker failed.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) ~[?:?]
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) ~[?:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.run(DefaultReplicationWorker.java:173) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.run(DefaultReplicationWorker.java:65) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:158) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.lang.RuntimeException: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at io.airbyte.workers.general.DefaultReplicationWorker.lambda$getReplicationRunnable$6(DefaultReplicationWorker.java:362) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        ... 1 more
Caused by: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at com.networknt.schema.RefValidator.<init>(RefValidator.java:43) ~[json-schema-validator-1.0.42.jar:?]
        at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) ~[?:?]
        at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
        at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.PropertiesValidator.<init>(PropertiesValidator.java:36) ~[json-schema-validator-1.0.42.jar:?]
        at jdk.internal.reflect.GeneratedConstructorAccessor34.newInstance(Unknown Source) ~[?:?]
        at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
        at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchemaFactory.newJsonSchema(JsonSchemaFactory.java:254) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchemaFactory.getSchema(JsonSchemaFactory.java:362) ~[json-schema-validator-1.0.42.jar:?]
        at io.airbyte.validation.json.JsonSchemaValidator.validateInternal(JsonSchemaValidator.java:63) ~[io.airbyte-airbyte-json-validation-0.39.21-alpha.jar:?]
        at io.airbyte.validation.json.JsonSchemaValidator.ensure(JsonSchemaValidator.java:78) ~[io.airbyte-airbyte-json-validation-0.39.21-alpha.jar:?]
        at io.airbyte.workers.RecordSchemaValidator.validateSchema(RecordSchemaValidator.java:54) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.validateSchema(DefaultReplicationWorker.java:383) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.lambda$getReplicationRunnable$6(DefaultReplicationWorker.java:312) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        ... 1 more
2022-06-21 20:08:53 INFO i.a.w.g.DefaultReplicationWorker(run):239 - sync summary: io.airbyte.config.ReplicationAttemptSummary@2587c4f5[status=failed,recordsSynced=0,bytesSynced=0,startTime=1655842126856,endTime=1655842133211,totalStats=io.airbyte.config.SyncStats@6a5a3a74[recordsEmitted=0,bytesEmitted=0,stateMessagesEmitted=0,recordsCommitted=0],streamStats=[]]
2022-06-21 20:08:53 INFO i.a.w.g.DefaultReplicationWorker(run):268 - Source did not output any state messages
2022-06-21 20:08:53 WARN i.a.w.g.DefaultReplicationWorker(run):276 - State capture: No new state, falling back on input state: io.airbyte.config.State@7ff799fe[state={}]
2022-06-21 20:08:53 INFO i.a.w.t.TemporalAttemptExecution(get):134 - Stopping cancellation check scheduling...
2022-06-21 20:08:53 INFO i.a.w.t.s.ReplicationActivityImpl(lambda$replicate$3):157 - sync summary: io.airbyte.config.StandardSyncOutput@b44e5b8[standardSyncSummary=io.airbyte.config.StandardSyncSummary@4e466ab4[status=failed,recordsSynced=0,bytesSynced=0,startTime=1655842126856,endTime=1655842133211,totalStats=io.airbyte.config.SyncStats@6a5a3a74[recordsEmitted=0,bytesEmitted=0,stateMessagesEmitted=0,recordsCommitted=0],streamStats=[]],normalizationSummary=<null>,state=io.airbyte.config.State@7ff799fe[state={}],outputCatalog=io.airbyte.protocol.models.ConfiguredAirbyteCatalog@2be9677a[streams=[io.airbyte.protocol.models.ConfiguredAirbyteStream@7f420590[stream=io.airbyte.protocol.models.AirbyteStream@530756e5[name=comments,jsonSchema={"type":"object","$schema":"https://json-schema.org/draft/2020-12/schema","properties":{"id":{"type":["null","integer"]},"url":{"type":["null","string"]},"body":{"type":["null","string"]},"user":{"$ref":"user.json"},"node_id":{"type":["null","string"]},"user_id":{"type":["null","integer"]},"html_url":{"type":["null","string"]},"issue_url":{"type":["null","string"]},"created_at":{"type":["null","string"],"format":"date-time"},"repository":{"type":["string"]},"updated_at":{"type":["null","string"],"format":"date-time"},"author_association":{"type":["null","string"]}},"additionalProperties":false},supportedSyncModes=[full_refresh, incremental],sourceDefinedCursor=false,defaultCursorField=[updated_at],sourceDefinedPrimaryKey=[],namespace=<null>,additionalProperties={}],syncMode=full_refresh,cursorField=[updated_at],destinationSyncMode=overwrite,primaryKey=[],additionalProperties={}]],additionalProperties={}],failures=[io.airbyte.config.FailureReason@2e6995fb[failureOrigin=replication,failureType=<null>,internalMessage=java.lang.RuntimeException: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved,externalMessage=Something went wrong during 
replication,metadata=io.airbyte.config.Metadata@4edbe5fb[additionalProperties={attemptNumber=0, jobId=51}],stacktrace=java.util.concurrent.CompletionException: java.lang.RuntimeException: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
        at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1807)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.RuntimeException: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at io.airbyte.workers.general.DefaultReplicationWorker.lambda$getReplicationRunnable$6(DefaultReplicationWorker.java:362)
        at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
        ... 3 more
Caused by: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at com.networknt.schema.RefValidator.<init>(RefValidator.java:43)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130)
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342)
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53)
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198)
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76)
        at com.networknt.schema.PropertiesValidator.<init>(PropertiesValidator.java:36)
        at jdk.internal.reflect.GeneratedConstructorAccessor34.newInstance(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130)
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342)
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53)
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198)
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76)
        at com.networknt.schema.JsonSchemaFactory.newJsonSchema(JsonSchemaFactory.java:254)
        at com.networknt.schema.JsonSchemaFactory.getSchema(JsonSchemaFactory.java:362)
        at io.airbyte.validation.json.JsonSchemaValidator.validateInternal(JsonSchemaValidator.java:63)
        at io.airbyte.validation.json.JsonSchemaValidator.ensure(JsonSchemaValidator.java:78)
        at io.airbyte.workers.RecordSchemaValidator.validateSchema(RecordSchemaValidator.java:54)
        at io.airbyte.workers.general.DefaultReplicationWorker.validateSchema(DefaultReplicationWorker.java:383)
        at io.airbyte.workers.general.DefaultReplicationWorker.lambda$getReplicationRunnable$6(DefaultReplicationWorker.java:312)
        ... 4 more
,retryable=<null>,timestamp=1655842132854]]]
2022-06-21 20:08:53 INFO i.a.w.t.TemporalUtils(withBackgroundHeartbeat):237 - Stopping temporal heartbeating...
2022-06-21 20:08:53 INFO i.a.c.p.ConfigRepository(updateConnectionState):775 - Updating connection f8169f47-4869-4c24-915b-ec405056714a state: io.airbyte.config.State@2a274f33[state={}]
2022-06-21 20:08:53 INFO i.a.c.f.EnvVariableFeatureFlags(autoDisablesFailingConnections):14 - Auto Disable Failing Connections: false`

Steps to Reproduce

1. Generate a Generic Python Source connector via generator.sh.
2. Add a stream in the discover method with the JSON schema loaded from comments.json.
3. Generate a record with hardcoded data and yield it as an AirbyteMessage.
4. Create a source and a destination (Local JSON), then sync.
5. See the code below.

`import json
from datetime import datetime
from typing import Dict, Generator

from airbyte_cdk.logger import AirbyteLogger
from airbyte_cdk.models import (
    AirbyteCatalog,
    AirbyteConnectionStatus,
    AirbyteMessage,
    AirbyteRecordMessage,
    AirbyteStream,
    ConfiguredAirbyteCatalog,
    Status,
    Type,
)
from airbyte_cdk.sources import Source
import os

main_path = "/airbyte/integration_code/source_github_mine/"

class SourceGithubMine(Source):
    def check(self, logger: AirbyteLogger, config: json) -> AirbyteConnectionStatus:
        try:

            return AirbyteConnectionStatus(status=Status.SUCCEEDED)
        except Exception as e:
            return AirbyteConnectionStatus(status=Status.FAILED, message=f"An exception occurred: {str(e)}")

    def discover(self, logger: AirbyteLogger, config: json) -> AirbyteCatalog:
        streams = []

        stream_name = "comments"  # Example
        with open(os.path.join(main_path,"schemas","comments.json")) as f:
            json_schema = json.load(f)

        streams.append(AirbyteStream(name=stream_name, json_schema=json_schema))
        return AirbyteCatalog(streams=streams)

    def read(
        self, logger: AirbyteLogger, config: json, catalog: ConfiguredAirbyteCatalog, state: Dict[str, any]
    ) -> Generator[AirbyteMessage, None, None]:

        stream_name = "comments"  # Example
        data = {"url":"https://api.github.com/repos/curl/curl/issues/comments/785098704","html_url":"https://github.com/curl/curl/pull/6654#issuecomment-785098704","issue_url":"https://api.github.com/repos/curl/curl/issues/6654","id":785098704,"node_id":"MDEyOklzc3VlQ29tbWVudDc4NTA5ODcwNA==","user":{"login":"ghost","id":10137,"node_id":"MDQ6VXNlcjEwMTM3","avatar_url":"https://avatars.githubusercontent.com/u/10137?v=4","gravatar_id":"","url":"https://api.github.com/users/ghost","html_url":"https://github.com/ghost","followers_url":"https://api.github.com/users/ghost/followers","following_url":"https://api.github.com/users/ghost/following{/other_user}","gists_url":"https://api.github.com/users/ghost/gists{/gist_id}","starred_url":"https://api.github.com/users/ghost/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/ghost/subscriptions","organizations_url":"https://api.github.com/users/ghost/orgs","repos_url":"https://api.github.com/users/ghost/repos","events_url":"https://api.github.com/users/ghost/events{/privacy}","received_events_url":"https://api.github.com/users/ghost/received_events","type":"User","site_admin":False},"created_at":"2021-02-24T14:05:29Z","updated_at":"2021-04-19T09:16:36Z","author_association":"NONE","body":"<img src=\"https://www.deepcode.ai/icons/green_check.svg\" width= \"50px\" align= \"left\"/> Congratulations :tada:. DeepCode [analyzed](https://www.deepcode.ai/app/gh/curl/curl/56a037cc0ad1b2a770d0c08d3d09dee1ce600f0f/curl/curl/bfde4230450e7756e42a43f866879037e4bba340/pr/_/%2F/code/?utm_source=gh_review&c=0&w=0&i=0&) your code in 2.831 seconds and we found no issues. 
Enjoy a moment of no bugs :sunny:.\n\n#### 👉 View analysis in [**DeepCode’s Dashboard**](https://www.deepcode.ai/app/gh/curl/curl/56a037cc0ad1b2a770d0c08d3d09dee1ce600f0f/curl/curl/bfde4230450e7756e42a43f866879037e4bba340/pr/_/%2F/code/?utm_source=gh_review&c=0&w=0&i=0&) | [_Configure the bot_](https://www.deepcode.ai/app/gh/?ownerconfig=curl)\n","reactions":{"url":"https://api.github.com/repos/curl/curl/issues/comments/785098704/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"performed_via_github_app":None,"repository":"curl/curl"}

        yield AirbyteMessage(
            type=Type.RECORD,
            record=AirbyteRecordMessage(stream=stream_name, data=data, emitted_at=int(datetime.now().timestamp()) * 1000),
        )`

comments.json is available in the default source_github connector.

yzislin commented 2 years ago

Just to add: if I manually resolve the $ref by replacing that section with the contents of user.json, I get the error: Could not load validator id

alafanechere commented 2 years ago

Hi @yzislin, I've triaged the issue so that it reaches our developer experience team's backlog. I'm looking for a connector we built with the Generic Python Source connector that you could use as an example. I'll get back to you.

yzislin commented 2 years ago

Hi @alafanechere ,

So if you just create a plain connector with the code above and use a schema with $ref, it will break on sync.

I have a workaround: replacing $ref with the actual JSON schema from the referenced file. It is not proper, but it works.
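The workaround above can also be done at load time instead of by hand. This is only a minimal sketch using the standard library: the helper name `inline_refs` and the `shared_dir` layout are assumptions for illustration, not part of the Airbyte CDK. It replaces each `{"$ref": "<file>.json"}` with the contents of that file, leaves local references (those starting with `#`) untouched, and does not guard against circular references.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def inline_refs(node: Any, shared_dir: Path) -> Any:
    """Return a copy of `node` with file-based $ref entries replaced by file contents.

    Hypothetical helper: keys that sit next to "$ref" in the same object are dropped,
    and circular references are not detected.
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and not ref.startswith("#"):
            with open(shared_dir / ref) as f:
                # The referenced file may contain further $ref entries, so recurse.
                return inline_refs(json.load(f), shared_dir)
        return {key: inline_refs(value, shared_dir) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_refs(item, shared_dir) for item in node]
    return node


def demo() -> dict:
    """Build a tiny shared/user.json on disk and inline it into a comments-like schema."""
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp)
        user_schema = {
            "type": ["null", "object"],
            "properties": {"login": {"type": ["null", "string"]}},
        }
        (shared / "user.json").write_text(json.dumps(user_schema))
        schema = {"type": "object", "properties": {"user": {"$ref": "user.json"}}}
        return inline_refs(schema, shared)
```

In a discover method like the one above, the loaded `json_schema` would be passed through such a helper before being handed to `AirbyteStream`, so the catalog no longer contains a reference the platform validator has to resolve.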

alafanechere commented 2 years ago

Hey @yzislin, it's good to know you found a workaround. Do you mind sharing an example of a schema that is working now? Here's a small list of connectors we developed with this generator:

yzislin commented 2 years ago

Thanks for getting back. I checked these connectors. Yes, you define the schema objects within the code, which is simple. I am using schema files that I took from the source-github connector, so pretty much the only issue is the $ref. My solution was to replace it with the actual file contents (i.e. shared/user.json). After that, it was still not a valid JSON schema file. I used https://www.jsonschemavalidator.net/ to validate my schema and found other issues. After I resolved them, your third-party JSON schema validator passed the check and the sync moved forward.
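A quick local check can catch leftover file references before a sync is attempted. The sketch below is an assumption-laden illustration (the function name `find_external_refs` is made up here, and it does not escape JSON Pointer special characters); it walks a schema and reports every `$ref` that points outside the document, in the same `#/properties/...` style the validator uses in its error message.

```python
from typing import Any, Iterator


def find_external_refs(node: Any, pointer: str = "#") -> Iterator[str]:
    """Yield "<path>/$ref -> <target>" for every $ref that is not a local (#...) reference."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str) and not value.startswith("#"):
                yield f"{pointer}/$ref -> {value}"
            else:
                yield from find_external_refs(value, f"{pointer}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_external_refs(item, f"{pointer}/{index}")


# A cut-down version of the comments schema from the logs above.
schema = {
    "type": "object",
    "properties": {
        "id": {"type": ["null", "integer"]},
        "user": {"$ref": "user.json"},
    },
}
unresolved = list(find_external_refs(schema))
print(unresolved)
```

An empty list means the schema no longer depends on other files; it says nothing about whether the schema is otherwise valid.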

So the issue with $ref is that it is supposed to reference another JSON file by the name given in that file's $id field. Your GitHub connector schema files do not have $id fields; the connector simply replaces $ref with the contents of the file in the shared subfolder. Per the JSON Schema documentation, the proper way is to put $id in the files that you reference in $ref. The issue here is that there should be a base URI for these files, and I am not sure what it should be in Airbyte or whether it can be a file path instead of an HTTP URL.

I would suggest that you explain in the documentation how to set the base URI, or provide a method that takes care of it. Then JSON Schema $ref and $id could be used to keep the schema files properly structured for ease of use.
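To illustrate the base URI question: in JSON Schema, a `$ref` value is a URI reference that gets resolved against the base URI of the schema containing it (typically set by `$id`). The sketch below only shows that resolution step with the standard library; the `example.com` URL and the helper name `resolve_ref` are assumptions for illustration. A `file:` URI is syntactically fine as a base, but whether a particular validator, including the one Airbyte uses, will actually load schemas from such a URI is a separate matter that this sketch does not answer.

```python
from urllib.parse import urljoin


def resolve_ref(base_uri: str, ref: str) -> str:
    """Resolve a $ref against the base URI of the schema that contains it."""
    return urljoin(base_uri, ref)


# With an HTTP(S) base URI (hypothetical host), "user.json" resolves to a sibling document.
http_target = resolve_ref("https://example.com/schemas/comments.json", "user.json")

# With a file: base URI built from the path used in the code above, the same rule applies.
file_target = resolve_ref(
    "file:///airbyte/integration_code/source_github_mine/schemas/comments.json",
    "shared/user.json",
)

print(http_target)
print(file_target)
```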

Thanks.

PeterDMTFX commented 2 years ago

Not working for me, I'm getting the following in server logs when testing:

i.a.p.j.e.LoggingJobErrorReportingClient(reportJobFailureReason):23 - Report Job Error -> workspaceId: 2991851e-875f-43b0-9f38-a81979e3a43f, dockerImage: airbyte/source-linkedin-pages:0.1.0, failureReason: io.airbyte.config.FailureReason@4da24c45[failureOrigin=source,failureType=system_error,internalMessage=Config validation error: '****' is not of type 'integer',externalMessage=Something went wrong in the connector. See the logs for more details.,metadata=io.airbyte.config.Metadata@6a3bbb1a[additionalProperties={attemptNumber=null, jobId=null, from_trace_message=true, connector_command=check}],stacktrace=Traceback (most recent call last):
  File "/airbyte/integration_code/main.py", line 13, in <module>
    launch(source, sys.argv[1:])
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py", line 123, in launch
    for message in source_entrypoint.run(parsed_args):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py", line 96, in run
    check_config_against_spec_or_exit(connector_config, source_spec)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/utils/schema_helpers.py", line 160, in check_config_against_spec_or_exit
    raise Exception("Config validation error: " + validation_error.message) from None
Exception: Config validation error: '****' is not of type 'integer'
,retryable=<null>,timestamp=1668599456405], metadata: {workspace_id=2991851e-875f-43b0-9f38-a81979e3a43f, airbyte_version=0.40.18, connector_definition_id=af54297c-e8f8-4d63-a00d-a94695acc9d3, failure_origin=source, connector_repository=airbyte/source-linkedin-pages, connector_release_stage=alpha, job_id=28f9f184-0698-4e4f-bee5-9ebe84225b35, workspace_url=airbyte-webapp-svc:80/workspaces/2991851e-875f-43b0-9f38-a81979e3a43f, failure_type=system_error, connector_command=check, connector_name=LinkedIn Pages, deployment_mode=OSS}

MaxSPG commented 12 months ago

It is not working for me either. I am developing a custom connector, but I get this error when I integrate it into the Airbyte project locally and run:

SUB_BUILD=PLATFORM ./gradlew build

The error:

JsonSchemaValidatorTest > testResolveReferences() FAILED
    org.opentest4j.AssertionFailedError: expected: <[$.prop2: string found, boolean expected]> but was: <[$.prop2: string wurde gefunden, aber boolean erwartet]>
        at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
        at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182)
        at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177)
        at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1142)
        at app//io.airbyte.validation.json.JsonSchemaValidatorTest.testResolveReferences(JsonSchemaValidatorTest.java:140)

Unfortunately, this error message does not seem to give any hint about the file where the error occurs, or which JSON exactly failed. (I am a Python dev and am not used to Java stack traces.)

Does somebody know how I can fix this? I have already written integration tests and unit tests, and they pass when I run them.

Thanks in advance! :)

octavia-squidington-iii commented 6 months ago

At Airbyte, we seek to be clear about the project priorities and roadmap. This issue has not had any activity for 180 days, suggesting that it's not as critical as others. It's possible it has already been fixed. It is being marked as stale and will be closed in 20 days if there is no activity. To keep it open, please comment to let us know why it is important to you and if it is still reproducible on recent versions of Airbyte.

octavia-squidington-iii commented 5 months ago

This issue was closed because it has been inactive for 20 days since being marked as stale.