Apicurio / apicurio-registry

An API/Schema registry - stores APIs and Schemas.
https://www.apicur.io/registry/
Apache License 2.0
598 stars 265 forks source link

Error Timeout trying to lock table with apicurio-registry-mem and Confluent serializer #2699

Closed loicmathieu closed 1 year ago

loicmathieu commented 2 years ago

Hi, I'm using the docker container quay.io/apicurio/apicurio-registry-mem:2.2.5.Final in my integration tests with Confluent serializers. It sometimes, but not always, fails with a Timeout trying to lock table exception.

Complete stacktrace :

 org.apache.kafka.common.errors.SerializationException: Error registering Avro schema{"type":"record","name":"Trace","namespace":"com.decathlon.vct.vcstream.avro.trace","doc":"All terminal components (sender, listener, anomaly) will send this trace object to the tracing system with the exchange information","fields":[{"name":"component","type":[{"type":"string","avro.java.string":"String"}],"doc":"Component that initiated the Trace","default":"missing-component"},{"name":"type","type":{"type":"string","avro.java.string":"String"},"doc":"Data type of the message","default":"missing-trace-type"},{"name":"direction","type":{"type":"enum","name":"Direction","symbols":["INBOUND","OUTBOUND"]},"doc":"Inbound or Outbound"},{"name":"startTime","type":["null","long"],"default":null},{"name":"uuid","type":{"type":"string","avro.java.string":"String"},"doc":"UUID of the message","default":"missing-trace-uuid"},{"name":"dispatchCriteria","type":["null",{"type":"string","avro.java.string":"String"}],"doc":"Dispatch criteria of the message","default":null},{"name":"reason","type":["null",{"type":"string","avro.java.string":"String"}],"doc":"Deprecated - Remove it break FORWARD Compatibility","default":null},{"name":"duration","type":["null","long"],"default":null},{"name":"endTime","type":["null","long"],"default":null},{"name":"batchId","type":["null",{"type":"string","avro.java.string":"String"}],"doc":"Batch id of the message","default":null},{"name":"status","type":{"type":"enum","name":"TraceStatus","symbols":["OK","ERROR"]},"default":"ERROR"}],"version":5}
        at io.confluent.kafka.serializers.AbstractKafkaSchemaSerDe.toKafkaException(AbstractKafkaSchemaSerDe.java:259)
        at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:168)
        at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:61)
        at org.apache.kafka.common.serialization.Serializer.serialize(Serializer.java:62)
        at io.smallrye.reactive.messaging.kafka.fault.SerializerWrapper.lambda$serialize$1(SerializerWrapper.java:56)
        at io.smallrye.reactive.messaging.kafka.fault.SerializerWrapper.wrapSerialize(SerializerWrapper.java:81)
        at io.smallrye.reactive.messaging.kafka.fault.SerializerWrapper.serialize(SerializerWrapper.java:56)
        at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:945)
        at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:905)
        at io.smallrye.reactive.messaging.kafka.impl.ReactiveKafkaProducer.lambda$send$5(ReactiveKafkaProducer.java:127)
        at io.smallrye.context.impl.wrappers.SlowContextualConsumer.accept(SlowContextualConsumer.java:21)
        at io.smallrye.mutiny.operators.uni.builders.UniCreateWithEmitter.subscribe(UniCreateWithEmitter.java:22)
        at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
        at io.smallrye.mutiny.operators.uni.UniRunSubscribeOn.lambda$subscribe$0(UniRunSubscribeOn.java:27)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: io.apicurio.registry.storage.impl.sql.jdb.RuntimeSqlException: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table ; SQL statement:
INSERT INTO sequences (tenantId, name, value) VALUES (?, ?, ?) [50200-197]; error code: 0
        at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:301)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:371)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:548)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:536)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:494)
        at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:274)
        at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:381)
        at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:354)
        at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:125)
        ... 15 more
carlesarnal commented 2 years ago

To be able to answer I would like to know more information, specifically about the lock mode you're using, the lock timeout etc, especially if you changed them to non-default values.

Thanks!

loicmathieu commented 2 years ago

@carlesarnal I just use the schema registry docker container with only confluent serializer/deserializer and the error pops up from time to time inside my test cases (which use testcontainers to start the container). I didn't specify anything about any lock.

carlesarnal commented 2 years ago

As you probably know, the default lock time is relatively low (1s). Do you mind trying to increase it and see what happens?

Thanks!

loicmathieu commented 2 years ago

@carlesarnal I have no idea there was such settings and also have no idea how to increase it via the docker image.

carlesarnal commented 2 years ago

By default Apicurio Registry sets the jdbc url using the env var REGISTRY_DATASOURCE_URL. The default value when the in-memory version is used is jdbc:h2:mem:registry_db, you can try setting that value to jdbc:h2:mem:registry_db;DEFAULT_LOCK_TIMEOUT=5000.

loicmathieu commented 2 years ago

Thanks @carlesarnal I'll try it tomorrow and let you know

andreaTP commented 2 years ago

Hi @loicmathieu , do you mind sharing how to reproduce the issue on our side? Meanwhile, I find this comment possibly relevant: https://stackoverflow.com/a/4162685 We are using H2 version 1.4.199 so you can try to append also ;MVCC=true to the DATASOURCE_URL as @carlesarnal mentioned.

loicmathieu commented 2 years ago

@carlesarnal @andreaTP with both setting I was able to reproduce the issue. I'll try to provide a reproducer but as it's via a complex Quarkus test using automatic container (devservices) this may not be easy to reproduce.

donare commented 2 years ago

I have the same issue as you, @loicmathieu. I am using Quarkus with Confluent Avro serializers and the apicurio dev-services container. I only had this problem when using Quarkus' built-in Kafka integration that relies on Smallrye Reactive Messaging. When directly using a KafkaProducer which uses the Apicurio container to write to Kafka, I don't have any issues.

Are you using Smallrye to write to Kafka?

loicmathieu commented 2 years ago

@donare yes, the issue arise when using Quarkus Smallrye Reactive Messaging Kafka extension.

EricWittmann commented 1 year ago

One bit of news that is possibly unrelated. We have improved the in-memory version of registry that is used as part of dev mode. We are now creating a native version. This should improve startup time drastically, but I'm not sure if it will have any impact on this issue.

carlesarnal commented 1 year ago

Agreed, we're not only creating a native version now, but also the h2 version has been significantly updated (from 1.4.x to 2.x) so worth trying. @loicmathieu do you mind trying on your side?

Thanks!

loicmathieu commented 1 year ago

@carlesarnal we didn't have this issue since some weeks now, we upgrade Quarkus and Testcontainers so it may fix some issue. I'm closing this one now, I'll repoen it if it stills occurs. Thanks for your feedbacks