[Closed] lolrenx closed this issue 2 years ago
Update : I've identified the 'commits' table as the troublemaker. If I omit it from the tables to sync, the sync job succeeds.
Attached: a log of the failing reset on the 'commits' table only.
Hello @lolrenx
After looking at the first log file I can tell that you don't have permissions for the team stream, because the error says requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.github.com/orgs/tricky-dev/teams?per_page=100.
Could you please tell me which value you put into the repository field for the github source?
I also don't have permissions for the tricky-dev organization, and I also got a 403 Forbidden error:
This is from the github documentation for the team stream:
The github source is set up as follows:
I've since elevated the permissions for the github access token, but the 'commits' table is still giving trouble.
I can sync the source if I omit the 'commits' table.
I've since then elevated the permission for the github access token
So just to clarify: after you published the first log file you updated your permissions for the team stream, and right now you don't have an issue with that stream, right? Right now you have issues with the commits stream? Could you please select only the commits stream, run a sync and send the logs here? I am asking this because I need to see the actual error in the logs so I can help you :)
Yes, that's correct :) Only the commits table fails to reset / sync now. Here are a couple of log files.
@sherifnada from the above logs it looks like there is an issue in the Java code:
destination - 2021-11-26 16:58:27 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-11-26 16:58:27 WARN i.a.i.d.s.a.JsonToAvroSchemaConverter(getAvroSchema):105 - {} - Schema name contains illegal character(s) and is standardized: parents.items -> parents_items
destination - 2021-11-26 16:58:27 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-11-26 16:58:27 WARN i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):58 - {} - Airbyte message consumer: failed.
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - Exception in thread "main" org.apache.avro.SchemaParseException: Can't redefine: author
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at org.apache.avro.Schema$Names.put(Schema.java:1542)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:805)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:967)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:1234)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:995)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:979)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:1234)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:995)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:979)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at org.apache.avro.Schema.toString(Schema.java:419)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at org.apache.avro.Schema.toString(Schema.java:391)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at io.airbyte.integrations.destination.s3.writer.ProductionWriterFactory.create(ProductionWriterFactory.java:42)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at io.airbyte.integrations.destination.s3.S3Consumer.startTracked(S3Consumer.java:53)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.start(FailureTrackingAirbyteMessageConsumer.java:34)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at io.airbyte.integrations.base.IntegrationRunner.consumeWriteStream(IntegrationRunner.java:142)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at io.airbyte.integrations.base.IntegrationRunner.run(IntegrationRunner.java:128)
destination - 2021-11-26 16:58:27 ERROR () LineGobbler(voidCall):82 - at io.airbyte.integrations.destination.s3.S3Destination.main(S3Destination.java:29)
Would that be related to my setup? The errors seem to reference avro, but the destination is set to the parquet format.
The commits table syncs successfully when using the CSV or JSONL destination formats, but fails with the parquet and avro formats.
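For context (my reading of the stack trace, not confirmed by the Airbyte team): Avro requires every named record in a schema to have a unique full name, and the commits schema appears to define an `author` record at more than one nesting level. CSV and JSONL skip the JSON-to-Avro schema conversion, which would explain why only parquet and avro fail. A stdlib-only sketch of that naming rule (`find_redefined_records` is a made-up helper, not Airbyte code):

```python
def find_redefined_records(schema, seen=None):
    """Collect record names defined more than once in an Avro schema
    (given as parsed JSON) -- Avro rejects these with
    'Can't redefine: <name>'."""
    if seen is None:
        seen = set()
    dupes = []
    if isinstance(schema, list):  # union type
        for branch in schema:
            dupes += find_redefined_records(branch, seen)
    elif isinstance(schema, dict) and schema.get("type") == "record":
        full_name = f"{schema.get('namespace', '')}.{schema['name']}"
        if full_name in seen:
            dupes.append(schema["name"])
        seen.add(full_name)
        for field in schema.get("fields", []):
            dupes += find_redefined_records(field["type"], seen)
    return dupes

# Shape loosely modelled on the GitHub commits stream: a top-level
# "author" record plus a nested commit.author record with the same name.
commit_schema = {
    "type": "record",
    "name": "commits",
    "fields": [
        {"name": "author", "type": {
            "type": "record", "name": "author",
            "fields": [{"name": "login", "type": "string"}]}},
        {"name": "commit", "type": {
            "type": "record", "name": "commit",
            "fields": [{"name": "author", "type": {
                "type": "record", "name": "author",
                "fields": [{"name": "name", "type": "string"}]}}]}},
    ],
}
print(find_redefined_records(commit_schema))  # ['author']
```

If this reading is right, the usual fix is for the converter to give nested records distinct namespaces (e.g. `commit.author`) so both definitions can coexist.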
@lolrenx could you please make a request to the commits stream manually, using Postman?
Here is the link, in which you need to replace <YOUR_REPO> with a repository from the tricky-dev organization:
https://api.github.com/repos/tricky-dev/<YOUR_REPO>/commits?per_page=100&page=1
After doing that, could you please confirm that you are getting some data for that stream? In my case I have 1 record in the response:
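If Postman isn't handy, the same check can be scripted. This is just the endpoint from the link above wrapped in stdlib calls; `REPO` and `GITHUB_TOKEN` are placeholder environment variables for your values:

```python
import json
import os
import urllib.request

def commits_url(org: str, repo: str, per_page: int = 100, page: int = 1) -> str:
    # Same endpoint as the link above, with <YOUR_REPO> as a parameter.
    return (f"https://api.github.com/repos/{org}/{repo}/commits"
            f"?per_page={per_page}&page={page}")

# Only fires when the placeholder env vars are set; use the same personal
# access token that the Airbyte github source is configured with.
if os.environ.get("GITHUB_TOKEN") and os.environ.get("REPO"):
    req = urllib.request.Request(
        commits_url("tricky-dev", os.environ["REPO"]),
        headers={"Authorization": f"token {os.environ['GITHUB_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        print(len(json.load(resp)), "commit records returned")
```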
The request looks alright; same API token as Airbyte.
@lolrenx in the logs there is the following line:
Syncing `commits` stream isn't available for repository `tricky-dev/gameOfLifeVaccination`, it seems like this repository is empty.
Strange, this repo does not appear in my organisation. Is there a way to manually ignore it?
Could airbyte handle gracefully the absence of that table in a repo instead of failing the sync?
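For what it's worth, "handle gracefully" could look something like the sketch below: catch the per-stream unavailability, log it, and keep syncing instead of aborting the whole job. This is a hypothetical illustration, not Airbyte's actual connector code (`StreamUnavailable`, `sync_streams`, and `fake_reader` are made-up names):

```python
import logging

class StreamUnavailable(Exception):
    """Raised when a stream can't be synced for a repo (e.g. empty repo)."""

def sync_streams(stream_names, read_stream):
    """Sync each stream, skipping (and logging) unavailable ones
    instead of failing the whole job."""
    synced, skipped = [], []
    for name in stream_names:
        try:
            read_stream(name)
            synced.append(name)
        except StreamUnavailable as exc:
            logging.warning("Skipping stream %s: %s", name, exc)
            skipped.append(name)
    return synced, skipped

def fake_reader(name):
    # Stand-in for a real reader; 'commits' is unavailable here.
    if name == "commits":
        raise StreamUnavailable("repository is empty")

print(sync_streams(["branches", "commits", "tags"], fake_reader))
# (['branches', 'tags'], ['commits'])
```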
Strange, this repo does not appear in my organisation. Is there a way to manually ignore it?
Right now the only way to ignore a specific repository is to omit it from the repository field. In your case you need to write the following in the repository field:
tricky-dev/repo1 tricky-dev/repo2 tricky-dev/repo3 tricky-dev/repo4
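Since the repository field is just a space-separated list, dropping the problematic repo can also be scripted rather than edited by hand. A tiny hypothetical helper (`drop_repo` is a made-up name; the repo names below are illustrative):

```python
def drop_repo(repository_field: str, unwanted: str) -> str:
    """Remove one org/repo entry from a space-separated repository field."""
    return " ".join(r for r in repository_field.split() if r != unwanted)

# Illustrative only -- substitute your real repository field value.
field = "tricky-dev/repo1 tricky-dev/gameOfLifeVaccination tricky-dev/repo2"
print(drop_repo(field, "tricky-dev/gameOfLifeVaccination"))
# tricky-dev/repo1 tricky-dev/repo2
```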
Repos are set up as per the above, destination format parquet; trying to reset and sync the commits table is still failing.
So with the csv and jsonl formats the commits stream is working, and with the parquet and avro formats it isn't, right?
So with the csv and jsonl formats the commits stream is working, and with the parquet and avro formats it isn't, right?
that's exactly what is happening, correct.
@lolrenx is there a way for you to use the csv and jsonl formats while we are fixing this issue?
@sherifnada could you please take a look at this comment, and especially at the first line in the error example? It's strange behaviour and I'm not sure why we are getting it.
I believe that there is no error in the GitHub source, because it works with the csv and jsonl formats.
I also believe this ticket requires Java expertise.
Here are logs for the parquet format that have the error:
logs-87-2.txt
logs-88-2.txt
logs-100-0.txt
Here are logs for the avro format that have the error:
logs-94-0.txt
Here are logs with the csv format that works:
logs-93-0.txt
No worries, thanks for taking care of it! I'll be watching this space !
thanks for looking into it @Zirochkaa , moving to java team
Environment
Current Behavior
I've tried to write to CSV / Parquet / JSONL; writing fails. Source and destination pass the connection test.
Expected Behavior
Sync should run successfully
Logs
2021-11-24 16:15:39 INFO () WorkerRun(call):49 - Executing worker wrapper. Airbyte version: 0.32.6-alpha
2021-11-24 16:15:39 INFO () TemporalAttemptExecution(get):116 - Executing worker wrapper. Airbyte version: 0.32.6-alpha
2021-11-24 16:15:39 WARN () Databases(createPostgresDatabaseWithRetry):41 - Waiting for database to become available...
2021-11-24 16:15:39 INFO () JobsDatabaseInstance(lambda$static$2):25 - Testing if jobs database is ready...
2021-11-24 16:15:39 INFO () Databases(createPostgresDatabaseWithRetry):58 - Database available!
2021-11-24 16:15:39 INFO () DefaultReplicationWorker(run):99 - start sync worker. job id: 15 attempt id: 1
2021-11-24 16:15:39 INFO () DefaultReplicationWorker(run):108 - configured sync modes: {null.commit_comments=full_refresh - append, null.assignees=full_refresh - append, null.issue_events=full_refresh - append, null.issues=full_refresh - append, null.teams=full_refresh - append, null.issue_milestones=full_refresh - append, null.reviews=full_refresh - append, null.issue_comment_reactions=full_refresh - append, null.issue_reactions=full_refresh - append, null.commit_comment_reactions=full_refresh - append, null.tags=full_refresh - append, null.pull_request_comment_reactions=full_refresh - append, null.pull_requests=full_refresh - append, null.comments=full_refresh - append, null.commits=full_refresh - append, null.issue_labels=full_refresh - append, null.organizations=full_refresh - append, null.branches=full_refresh - append, null.review_comments=full_refresh - append, null.pull_request_stats=full_refresh - append, null.releases=full_refresh - append, null.projects=full_refresh - append, null.events=full_refresh - append, null.users=full_refresh - append, null.collaborators=full_refresh - append, null.repositories=full_refresh - append, null.stargazers=full_refresh - append}
2021-11-24 16:15:39 INFO () DefaultAirbyteDestination(start):64 - Running destination...
2021-11-24 16:15:39 INFO () LineGobbler(voidCall):82 - Checking if airbyte/destination-s3:0.1.13 exists...
2021-11-24 16:15:39 INFO () LineGobbler(voidCall):82 - airbyte/destination-s3:0.1.13 was found locally.
2021-11-24 16:15:39 INFO () DockerProcessFactory(create):127 - Preparing command: docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/15/1 --network host --log-driver none airbyte/destination-s3:0.1.13 write --config destination_config.json --catalog destination_catalog.json
2021-11-24 16:15:39 INFO () LineGobbler(voidCall):82 - Checking if airbyte/source-github:0.2.6 exists...
2021-11-24 16:15:39 INFO () LineGobbler(voidCall):82 - airbyte/source-github:0.2.6 was found locally.
2021-11-24 16:15:39 INFO () DockerProcessFactory(create):127 - Preparing command: docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/15/1 --network host --log-driver none airbyte/source-github:0.2.6 read --config source_config.json --catalog source_catalog.json --state input_state.json
2021-11-24 16:15:39 INFO () DefaultReplicationWorker(lambda$getDestinationOutputRunnable$3):243 - Destination output thread started.
2021-11-24 16:15:39 INFO () DefaultReplicationWorker(run):136 - Waiting for source thread to join.
2021-11-24 16:15:39 INFO () DefaultReplicationWorker(lambda$getReplicationRunnable$2):207 - Replication thread started.
source - 2021-11-24 16:15:40 INFO () DefaultAirbyteStreamFactory(internalLog):97 - Starting syncing SourceGithub
destination - 2021-11-24 16:15:41 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-11-24 16:15:41 INFO i.a.i.b.IntegrationRunner(run):76 - {} - Running integration: io.airbyte.integrations.destination.s3.S3Destination
destination - 2021-11-24 16:15:41 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-11-24 16:15:41 INFO i.a.i.b.IntegrationCliParser(parseOptions):118 - {} - integration args: {catalog=destination_catalog.json, write=null, config=destination_config.json}
destination - 2021-11-24 16:15:41 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-11-24 16:15:41 INFO i.a.i.b.IntegrationRunner(run):80 - {} - Command: WRITE
destination - 2021-11-24 16:15:41 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-11-24 16:15:41 INFO i.a.i.b.IntegrationRunner(run):81 - {} - Integration config: IntegrationConfig{command=WRITE, configPath='destination_config.json', catalogPath='destination_catalog.json', statePath='null'}
destination - 2021-11-24 16:15:41 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-11-24 16:15:41 WARN c.n.s.JsonMetaSchema(newValidator):338 - {} - Unknown keyword examples - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
destination - 2021-11-24 16:15:41 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-11-24 16:15:41 WARN c.n.s.JsonMetaSchema(newValidator):338 - {} - Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
destination - 2021-11-24 16:15:42 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-11-24 16:15:42 INFO i.a.i.d.s.S3FormatConfigs(getS3FormatConfig):22 - {} - S3 format config: {"flattening":"Root level flattening","format_type":"CSV","part_size_mb":5}
destination - 2021-11-24 16:15:42 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-11-24 16:15:42 INFO i.a.i.d.s.c.S3CsvWriter(
Steps to Reproduce
Are you willing to submit a PR?
probably underqualified, sorry