Closed yzislin closed 5 months ago
Just to add. If I manually resolve $ref reference by replacing that section with the contents of user.json, I get an error:
Could not load validator id
Hi @yzislin I've triaged the issue so that it reaches our developer experiences team backlog. I'm looking for a connector we might have built with Generic Python Source connector
and that you could use as an example. I'll get back to you.
Hi @alafanechere ,
So if you just create a plain one with the code above and use schema with $ref, it will break on sync.
I have a workaround: replacing $ref with the actual JSON schema from the referenced file. It is not proper, but it works.
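The workaround described above could be automated along these lines. This is only a minimal sketch using the Python standard library, not Airbyte's own resolver: the function names `inline_refs` and `load_schema` are hypothetical, and it assumes every file-based $ref names a JSON file that sits in a single shared folder (as with shared/user.json).

```python
import json
from pathlib import Path


def inline_refs(node, shared_dir: Path):
    """Recursively replace {"$ref": "<name>.json"} with the referenced file's contents.

    Assumption: each file-based $ref is a file name relative to shared_dir.
    Other references (for example "#/definitions/x") are left untouched.
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.endswith(".json"):
            referenced = json.loads((shared_dir / ref).read_text())
            # The referenced schema may itself contain $ref entries.
            # Note: keys next to "$ref" are dropped, and cycles are not detected.
            return inline_refs(referenced, shared_dir)
        return {key: inline_refs(value, shared_dir) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_refs(item, shared_dir) for item in node]
    return node


def load_schema(schema_path: Path, shared_dir: Path) -> dict:
    """Load a schema file from disk and inline its file-based $ref entries."""
    return inline_refs(json.loads(schema_path.read_text()), shared_dir)
```

A connector could then call something like `load_schema(Path("schemas/comments.json"), Path("schemas/shared"))` (paths are illustrative) and pass the result as json_schema, so the validator never sees an unresolved reference.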
Hey @yzislin, it's good to know you found a workaround. Do you mind sharing an example of a schema that is working now? Here's a small list of connectors we developed with this generator:
source-firebolt
source-apify-dataset
source-azure-table
All these connectors generate their schemas at execution time with Python.
Thanks for getting back. I checked these connectors. Yes, you are using schema objects within the code, and that is simple. I am using schema files which I took from the source-github connector. So pretty much the only issue is the $ref. My solution was to replace it with the actual file contents (i.e. shared/user.json). The problem after that was that it was still not a valid JSON schema file. I used https://www.jsonschemavalidator.net/ to validate my schema and found other issues. After I resolved them, your third-party JSON schema validator passed the check and moved forward.
So the issue with $ref is that it is supposed to reference another JSON file by the name given in that file's $id field. Your GitHub connector schema files do not have $id fields. The connector simply goes in and replaces $ref with the contents of the file in the shared subfolder. Per the JSON Schema documentation, the proper way is to put $id in the files that you will reference in $ref. The issue here is that there should be a base URI for these files, and I am not sure what it should be in Airbyte or whether it can be a file path instead of an HTTP URL.
I would suggest that you explain in the documentation how we can reference a base URI, or provide some method that just takes care of it. Then the JSON Schema $ref and $id keywords can be used to keep the schema files properly structured for ease of use.
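To illustrate the base URI question, here is a small sketch of how a relative $ref resolves against a file-based $id under ordinary URI resolution rules. It uses only the Python standard library. The directory path and the `resolve_ref` helper are hypothetical, and whether Airbyte's validator accepts file:// identifiers is not confirmed anywhere in this thread.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin


def resolve_ref(base_id: str, ref: str) -> str:
    """Resolve a relative $ref against the $id of the schema that contains it."""
    return urljoin(base_id, ref)


# Hypothetical location of the schema files inside a connector image.
schemas_dir = PurePosixPath("/airbyte/integration_code/source_example/schemas")

# A file path can be expressed as a URI and used as the $id of comments.json.
comments_id = (schemas_dir / "comments.json").as_uri()

# The absolute URI that "$ref": "shared/user.json" would point to.
user_uri = resolve_ref(comments_id, "shared/user.json")
print(comments_id)
print(user_uri)
```

Under this scheme, shared/user.json would declare the printed `user_uri` value as its own $id, so a resolver that honors $id could match the reference without fetching anything over HTTP.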
Thanks.
Not working for me, I'm getting the following in server logs when testing:
i.a.p.j.e.LoggingJobErrorReportingClient(reportJobFailureReason):23 - Report Job Error -> workspaceId: 2991851e-875f-43b0-9f38-a81979e3a43f, dockerImage: airbyte/source-linkedin-pages:0.1.0, failureReason: io.airbyte.config.FailureReason@4da24c45[failureOrigin=source,failureType=system_error,internalMessage=Config validation error: '****' is not of type 'integer',externalMessage=Something went wrong in the connector. See the logs for more details.,metadata=io.airbyte.config.Metadata@6a3bbb1a[additionalProperties={attemptNumber=null, jobId=null, from_trace_message=true, connector_command=check}],stacktrace=Traceback (most recent call last): File "/airbyte/integration_code/main.py", line 13, in <module> launch(source, sys.argv[1:]) File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py", line 123, in launch for message in source_entrypoint.run(parsed_args): File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py", line 96, in run check_config_against_spec_or_exit(connector_config, source_spec) File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/utils/schema_helpers.py", line 160, in check_config_against_spec_or_exit raise Exception("Config validation error: " + validation_error.message) from None Exception: Config validation error: '****' is not of type 'integer' ,retryable=<null>,timestamp=1668599456405], metadata: {workspace_id=2991851e-875f-43b0-9f38-a81979e3a43f, airbyte_version=0.40.18, connector_definition_id=af54297c-e8f8-4d63-a00d-a94695acc9d3, failure_origin=source, connector_repository=airbyte/source-linkedin-pages, connector_release_stage=alpha, job_id=28f9f184-0698-4e4f-bee5-9ebe84225b35, workspace_url=airbyte-webapp-svc:80/workspaces/2991851e-875f-43b0-9f38-a81979e3a43f, failure_type=system_error, connector_command=check, connector_name=LinkedIn Pages, deployment_mode=OSS}
For me it is also not working. I am developing a custom connector, but I am getting this error when I integrate it into the airbyte project locally and run:
SUB_BUILD=PLATFORM ./gradlew build
The Error: JsonSchemaValidatorTest > testResolveReferences() FAILED org.opentest4j.AssertionFailedError: expected: <[$.prop2: string found, boolean expected]> but was: <[$.prop2: string wurde gefunden, aber boolean erwartet]> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177) at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1142) at app//io.airbyte.validation.json.JsonSchemaValidatorTest.testResolveReferences(JsonSchemaValidatorTest.java:140)
Unfortunately, this error message does not seem to give any hint about the file where the error occurs, or exactly which JSON failed. (I am a Python dev and am not used to Java stack traces.)
Does somebody know how I can fix this? I have already written integration tests and unit tests, and they pass when I run them.
Thanks in advance! :)
At Airbyte, we seek to be clear about the project priorities and roadmap. This issue has not had any activity for 180 days, suggesting that it's not as critical as others. It's possible it has already been fixed. It is being marked as stale and will be closed in 20 days if there is no activity. To keep it open, please comment to let us know why it is important to you and if it is still reproducible on recent versions of Airbyte.
This issue was closed because it has been inactive for 20 days since being marked as stale.
Environment
Current Behavior
I am using existing GitHub schema files (for example, comments.json) that are part of Airbyte's Source GitHub connector. I am testing with the Generic Python Source connector, which does not have any custom code. Check always returns true. Discover adds one stream with json_schema loaded directly from the file (i.e. comments.json).
Sync job fails with
com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
Expected Behavior
As far as I understand, when using the HTTP API Python Source connector, your code automatically resolves $ref to make sure it passes the JSON validator. It seems there is no such thing available for the Generic Python Source connector. After generating the template via generator.sh, the example just says to load the JSON from a JSON object, so I am loading it from disk. If I manually replace $ref with the actual structure, the validator complains about a missing id. I believe proper JSON Schema requires $id. If I manually add that in, there is another error.
My code was working fine up to 0.37.1-alpha. I think you introduced the JSON Schema validator after that.
Logs
comments.json is available in default source_github connector.