airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[destination-bigquery] JSON String value length exceeds the maximum allowed #44093

Open evandro-morini opened 4 weeks ago

evandro-morini commented 4 weeks ago

Connector Name

destination-bigquery

Connector Version

2.8.6

At what step did the error happen?

During the sync

Relevant information

I'm receiving this error during a sync from a Postgres source to a BigQuery destination (the table has JSON columns):

java.lang.RuntimeException: com.fasterxml.jackson.databind.JsonMappingException: String value length (52494336) exceeds the maximum allowed (52428800, from StreamReadConstraints.getMaxStringLength()) (through reference chain: io.airbyte.cdk.integrations.destination.async.model.PartialAirbyteMessage["serialized"])

Is there any chance of raising the default limit, or of adding an environment variable to configure it?

Thanks!

Relevant log output

at io.airbyte.cdk.integrations.base.IntegrationRunner.runInternal(IntegrationRunner.kt:209) [airbyte-cdk-core-0.41.4.jar:?]
        at io.airbyte.cdk.integrations.base.IntegrationRunner.run(IntegrationRunner.kt:116) [airbyte-cdk-core-0.41.4.jar:?]
        at io.airbyte.integrations.destination.bigquery.BigQueryDestinationKt.main(BigQueryDestination.kt:564) [io.airbyte.airbyte-integrations.connectors-destination-bigquery.jar:?]
        Suppressed: io.airbyte.commons.exceptions.TransientErrorException: Some streams were unsuccessful due to a source error. See logs for details.
                at io.airbyte.cdk.integrations.destination.async.AsyncStreamConsumer.close(AsyncStreamConsumer.kt:210) ~[airbyte-cdk-core-0.41.4.jar:?]
                at kotlin.jdk7.AutoCloseableKt.closeFinally(AutoCloseableJVM.kt:49) ~[kotlin-stdlib-1.9.23.jar:1.9.23-release-779]
                at io.airbyte.cdk.integrations.base.IntegrationRunner.runInternal(IntegrationRunner.kt:209) [airbyte-cdk-core-0.41.4.jar:?]
                at io.airbyte.cdk.integrations.base.IntegrationRunner.run(IntegrationRunner.kt:116) [airbyte-cdk-core-0.41.4.jar:?]
                at io.airbyte.integrations.destination.bigquery.BigQueryDestinationKt.main(BigQueryDestination.kt:564) [io.airbyte.airbyte-integrations.connectors-destination-bigquery.jar:?]
Caused by: com.fasterxml.jackson.databind.JsonMappingException: String value length (52494336) exceeds the maximum allowed (52428800, from `StreamReadConstraints.getMaxStringLength()`) (through reference chain: io.airbyte.cdk.integrations.destination.async.model.PartialAirbyteMessage["serialized"])
        at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:402) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:361) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.wrapAndThrow(BeanDeserializerBase.java:1937) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:312) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:342) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4899) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3846) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3814) ~[jackson-databind-2.16.1.jar:2.16.1]
        at io.airbyte.commons.json.Jsons.deserialize(Jsons.kt:68) ~[airbyte-cdk-dependencies-0.41.4.jar:?]
        ... 15 more
Caused by: com.fasterxml.jackson.core.exc.StreamConstraintsException: String value length (52494336) exceeds the maximum allowed (52428800, from `StreamReadConstraints.getMaxStringLength()`)
        at com.fasterxml.jackson.core.StreamReadConstraints._constructException(StreamReadConstraints.java:549) ~[jackson-core-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.core.StreamReadConstraints.validateStringLength(StreamReadConstraints.java:484) ~[jackson-core-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.core.util.ReadConstrainedTextBuffer.validateStringLength(ReadConstrainedTextBuffer.java:27) ~[jackson-core-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.core.util.TextBuffer.finishCurrentSegment(TextBuffer.java:939) ~[jackson-core-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2241) ~[jackson-core-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:2207) ~[jackson-core-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:327) ~[jackson-core-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:42) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:11) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:129) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:310) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:342) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4899) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3846) ~[jackson-databind-2.16.1.jar:2.16.1]
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3814) ~[jackson-databind-2.16.1.jar:2.16.1]
        at io.airbyte.commons.json.Jsons.deserialize(Jsons.kt:68) ~[airbyte-cdk-dependencies-0.41.4.jar:?]
        ... 15 more


evandro-morini commented 4 weeks ago

After digging a bit, I think the source of the error is here: https://github.com/airbytehq/airbyte/blob/731ae133694628376090089710ec230ae25d6351/airbyte-cdk/java/airbyte-cdk/dependencies/src/main/kotlin/io/airbyte/commons/json/Jsons.kt#L34
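The limit in the error is enforced by Jackson's `StreamReadConstraints`, which the CDK appears to configure in the linked `Jsons.kt`. As a minimal sketch of the environment-variable idea from the original request, an override could be resolved with a fallback to the current 50 MiB value. The variable name `AIRBYTE_JSON_MAX_STRING_LENGTH` and the class `MaxStringLengthConfig` are hypothetical, not existing Airbyte settings or code; the Jackson call that would apply the value is shown only as a comment.

```java
// Hypothetical sketch: resolve an override for Jackson's max string length
// from an environment variable, falling back to the 50 MiB value in the error.
// AIRBYTE_JSON_MAX_STRING_LENGTH is a made-up name, not an Airbyte setting.
public class MaxStringLengthConfig {
    static final String ENV_VAR = "AIRBYTE_JSON_MAX_STRING_LENGTH"; // hypothetical
    static final int DEFAULT_LIMIT = 50 * 1024 * 1024; // 52,428,800, as reported

    /** Returns the configured limit, or DEFAULT_LIMIT if unset, invalid or non-positive. */
    static int resolve(String raw) {
        if (raw == null || raw.isBlank()) {
            return DEFAULT_LIMIT;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_LIMIT;
        } catch (NumberFormatException e) {
            return DEFAULT_LIMIT;
        }
    }

    public static void main(String[] args) {
        int limit = resolve(System.getenv(ENV_VAR));
        // With jackson-core 2.15+ on the classpath, the value could be applied
        // to an ObjectMapper's factory along these lines:
        //   mapper.getFactory().setStreamReadConstraints(
        //       StreamReadConstraints.builder().maxStringLength(limit).build());
        System.out.println("maxStringLength = " + limit);
    }
}
```

Falling back to the current value on bad input keeps existing behavior unchanged unless the variable is set deliberately.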

marcosmarxm commented 5 days ago

@airbytehq/destinations can someone take a look into this issue?

evantahler commented 5 days ago

Updated the title to reflect the JSON limit.

@evandro-morini - can you share more about what your records look like? This means you have a single row that is larger than 50 MB.
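A quick check of the numbers from the stack trace: the maximum is exactly 50 MiB (50 × 1024 × 1024 = 52,428,800), and the rejected `serialized` value was 65,536 (64 KiB) over it. Since the trace shows `ReaderBasedJsonParser`, the limit is counted in characters, which `String.length()` mirrors. The sketch below assumes that reading; `RecordSizeCheck` and `exceedsLimit` are illustrative names, not Airbyte code.

```java
// Sketch: reproduce the numbers from the error and check a serialized record
// against the same character limit before it reaches the parser.
public class RecordSizeCheck {
    // 50 * 1024 * 1024 = 52,428,800: the maximum reported in the error.
    static final int MAX_STRING_LENGTH = 50 * 1024 * 1024;
    // Length Jackson reported for the rejected "serialized" value.
    static final int REPORTED_LENGTH = 52_494_336;

    /** True if the string would be rejected (Jackson rejects lengths strictly above the max). */
    static boolean exceedsLimit(String serialized) {
        return serialized.length() > MAX_STRING_LENGTH;
    }

    public static void main(String[] args) {
        int overBy = REPORTED_LENGTH - MAX_STRING_LENGTH;
        System.out.println("limit   = " + MAX_STRING_LENGTH);
        System.out.println("over by = " + overBy);
        System.out.println(exceedsLimit("{\"id\":1}"));
    }
}
```

Because the check fires from `TextBuffer.finishCurrentSegment` in the trace, 52,494,336 may be the length buffered when the check ran rather than the full value, so the actual record could be somewhat larger.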

evandro-morini commented 4 days ago

Hello @evantahler, we're dealing with ML document extraction stored in a Postgres table (around 130 GiB) with multiple JSON fields. We also noticed that Airbyte saves these JSON fields as STRING in BigQuery (but that's another story, and I don't know whether it affects the row size).