COPRS / rs-issues

This repository contains all the issues of the COPRS project (Scrum tickets, IVV bugs, epics, ...)

[BUG] INGESTION - org.apache.kafka.common.config.ConfigException: Invalid value 3600000 #853

Closed Woljtek closed 1 year ago

Woljtek commented 1 year ago

Environment:

Current Behavior: During the S2 production day, we observed that the SGS ingestion was blocked with the following errors:

{"header":{"type":"LOG","timestamp":"2023-03-07T07:59:40.963058Z","level":"ERROR","line":200,"file":"BindingService.java","thread":"scheduling-1"},"message":{"content":"Failed to create consumer binding; retrying in 30 seconds org.springframework.cloud.stream.binder.BinderException: Exception thrown while starting consumer: \n\tat org.springframework.cloud.stream.binder.AbstractMessageChannelBinder.doBindConsumer(AbstractMessageChannelBinder.java:480) ~[spring-cloud-stream-3.2.2.jar!/:3.2.2]\n\tat org.springframework.cloud.stream.binder.AbstractMessageChannelBinder.doBindConsumer(AbstractMessageChannelBinder.java:92) ~[spring-cloud-stream-3.2.2.jar!/:3.2.2]\n\tat org.springframework.cloud.stream.binder.AbstractBinder.bindConsumer(AbstractBinder.java:143) ~[spring-cloud-stream-3.2.2.jar!/:3.2.2]\n\tat org.springframework.cloud.stream.binding.BindingService.lambda$rescheduleConsumerBinding$1(BindingService.java:205) ~[spring-cloud-stream-3.2.2.jar!/:3.2.2]\n\tat org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) [spring-context-5.3.18.jar!/:5.3.18]\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]\n\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]\n\tat java.lang.Thread.run(Thread.java:829) [?:?]\nCaused by: org.apache.kafka.common.config.ConfigException: Invalid value 3600000,spring.cloud.stream.default.producer.configuration.max.request.size=2000000 for configuration max.poll.interval.ms: Not a number of type INT\n\tat org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:726) ~[kafka-clients-2.5.0.jar!/:?]\n\tat 
org.apache.kafka.common.config.ConfigDef.parseValue(ConfigDef.java:474) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:467) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:108) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:129) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.clients.consumer.ConsumerConfig.<init>(ConsumerConfig.java:542) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:631) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.springframework.kafka.core.DefaultKafkaConsumerFactory.createRawConsumer(DefaultKafkaConsumerFactory.java:416) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.core.DefaultKafkaConsumerFactory.createKafkaConsumer(DefaultKafkaConsumerFactory.java:384) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.core.DefaultKafkaConsumerFactory.createConsumerWithAdjustedProperties(DefaultKafkaConsumerFactory.java:360) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.core.DefaultKafkaConsumerFactory.createKafkaConsumer(DefaultKafkaConsumerFactory.java:327) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.core.DefaultKafkaConsumerFactory.createConsumer(DefaultKafkaConsumerFactory.java:304) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.<init>(KafkaMessageListenerContainer.java:767) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.listener.KafkaMessageListenerContainer.doStart(KafkaMessageListenerContainer.java:351) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.listener.AbstractMessageListenerContainer.start(AbstractMessageListenerContainer.java:466) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat 
org.springframework.kafka.listener.ConcurrentMessageListenerContainer.doStart(ConcurrentMessageListenerContainer.java:209) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.listener.AbstractMessageListenerContainer.start(AbstractMessageListenerContainer.java:466) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.integration.kafka.inbound.KafkaMessageDrivenChannelAdapter.doStart(KafkaMessageDrivenChannelAdapter.java:326) ~[spring-integration-kafka-5.5.10.jar!/:5.5.10]\n\tat org.springframework.integration.endpoint.AbstractEndpoint.start(AbstractEndpoint.java:157) ~[spring-integration-core-5.5.10.jar!/:5.5.10]\n\tat org.springframework.cloud.stream.binder.AbstractMessageChannelBinder.doBindConsumer(AbstractMessageChannelBinder.java:433) ~[spring-cloud-stream-3.2.2.jar!/:3.2.2]\n\t... 10 more\n"},"custom":{"logger_string":"org.springframework.cloud.stream.binding.BindingService"}}
{"header":{"type":"LOG","timestamp":"2023-03-07T08:00:11.007713Z","level":"ERROR","line":545,"file":"KafkaTopicProvisioner.java","thread":"scheduling-1"},"message":{"content":"Failed to obtain partition information org.apache.kafka.common.config.ConfigException: Invalid value 3600000,spring.cloud.stream.default.producer.configuration.max.request.size=2000000 for configuration max.poll.interval.ms: Not a number of type INT\n\tat org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:726) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.common.config.ConfigDef.parseValue(ConfigDef.java:474) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:467) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:108) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:129) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.clients.consumer.ConsumerConfig.<init>(ConsumerConfig.java:542) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:631) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.springframework.kafka.core.DefaultKafkaConsumerFactory.createRawConsumer(DefaultKafkaConsumerFactory.java:416) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.core.DefaultKafkaConsumerFactory.createKafkaConsumer(DefaultKafkaConsumerFactory.java:384) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.core.DefaultKafkaConsumerFactory.createKafkaConsumer(DefaultKafkaConsumerFactory.java:324) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.core.DefaultKafkaConsumerFactory.createConsumer(DefaultKafkaConsumerFactory.java:297) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.core.ConsumerFactory.createConsumer(ConsumerFactory.java:69) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat 
org.springframework.kafka.core.ConsumerFactory.createConsumer(ConsumerFactory.java:56) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.core.ConsumerFactory.createConsumer(ConsumerFactory.java:45) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.cloud.stream.binder.kafka.KafkaMessageChannelBinder.lambda$getPartitionInfo$5(KafkaMessageChannelBinder.java:1162) ~[spring-cloud-stream-binder-kafka-3.2.2.jar!/:3.2.2]\n\tat org.springframework.cloud.stream.binder.kafka.provisioning.KafkaTopicProvisioner.lambda$getPartitionsForTopic$6(KafkaTopicProvisioner.java:535) ~[spring-cloud-stream-binder-kafka-core-3.2.2.jar!/:3.2.2]\n\tat org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:329) ~[spring-retry-1.3.2.jar!/:?]\n\tat org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:209) ~[spring-retry-1.3.2.jar!/:?]\n\tat org.springframework.cloud.stream.binder.kafka.provisioning.KafkaTopicProvisioner.getPartitionsForTopic(KafkaTopicProvisioner.java:530) ~[spring-cloud-stream-binder-kafka-core-3.2.2.jar!/:3.2.2]\n\tat org.springframework.cloud.stream.binder.kafka.KafkaMessageChannelBinder.getPartitionInfo(KafkaMessageChannelBinder.java:1159) ~[spring-cloud-stream-binder-kafka-3.2.2.jar!/:3.2.2]\n\tat org.springframework.cloud.stream.binder.kafka.KafkaMessageChannelBinder.processTopic(KafkaMessageChannelBinder.java:908) ~[spring-cloud-stream-binder-kafka-3.2.2.jar!/:3.2.2]\n\tat org.springframework.cloud.stream.binder.kafka.KafkaMessageChannelBinder.createConsumerEndpoint(KafkaMessageChannelBinder.java:631) ~[spring-cloud-stream-binder-kafka-3.2.2.jar!/:3.2.2]\n\tat org.springframework.cloud.stream.binder.kafka.KafkaMessageChannelBinder.createConsumerEndpoint(KafkaMessageChannelBinder.java:163) ~[spring-cloud-stream-binder-kafka-3.2.2.jar!/:3.2.2]\n\tat org.springframework.cloud.stream.binder.AbstractMessageChannelBinder.doBindConsumer(AbstractMessageChannelBinder.java:426) 
~[spring-cloud-stream-3.2.2.jar!/:3.2.2]\n\tat org.springframework.cloud.stream.binder.AbstractMessageChannelBinder.doBindConsumer(AbstractMessageChannelBinder.java:92) ~[spring-cloud-stream-3.2.2.jar!/:3.2.2]\n\tat org.springframework.cloud.stream.binder.AbstractBinder.bindConsumer(AbstractBinder.java:143) ~[spring-cloud-stream-3.2.2.jar!/:3.2.2]\n\tat org.springframework.cloud.stream.binding.BindingService.lambda$rescheduleConsumerBinding$1(BindingService.java:205) ~[spring-cloud-stream-3.2.2.jar!/:3.2.2]\n\tat org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) [spring-context-5.3.18.jar!/:5.3.18]\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]\n\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]\n\tat java.lang.Thread.run(Thread.java:829) [?:?]\n"},"custom":{"logger_string":"org.springframework.cloud.stream.binder.kafka.provisioning.KafkaTopicProvisioner"}}

This pair of errors is confirmed by the SCDF configuration (see attached image).

We restarted the stream and the errors disappeared, but the ',' error in stream-parameters.properties still exists.

Expected Behavior: The ingestion shall work

Steps To Reproduce: We cannot reproduce these errors. It seems there is an unidentified factor.

Whenever possible, first analysis of the root cause: The pair of errors on stream-parameters.properties may come from a non-ASCII character:

$ grep --color='auto' -rIPn "[^\x00-\x7F]" rs-core-ingestion-s2-sgs_Executables/stream-parameters.properties 
34:# Allow heavy message from ingestion-trigger
$ cat -A rs-core-ingestion-s2-sgs_Executables/stream-parameters.properties | grep "Allow heavy message"
#M-BM- Allow heavy message from ingestion-trigger$

Note that there are multiple non-ASCII characters in rs-config. Below is a list of suspected issues:
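The `grep` check above can also be done programmatically. Below is a minimal sketch (not part of the project tooling) that scans all `.properties` files under a directory for non-ASCII characters, mirroring the `grep -rIPn "[^\x00-\x7F]"` command; the directory layout is an assumption for illustration:

```python
import re
from pathlib import Path

def find_non_ascii(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, line) for every line containing a non-ASCII character."""
    hits = []
    for path in Path(root).rglob("*.properties"):
        for lineno, line in enumerate(path.read_text(encoding="utf-8").splitlines(), 1):
            # Same character class as the grep command: anything outside 0x00-0x7F
            if re.search(r"[^\x00-\x7F]", line):
                hits.append((str(path), lineno, line))
    return hits
```

Running such a check in CI on rs-config would catch characters like the non-breaking space (`M-BM-` in `cat -A` output) before deployment.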

My gut feeling is that this behavior comes from copy/pasting from Confluence.

Test execution artefacts (i.e. logs, screenshots…)

Bug Generic Definition of Ready (DoR)

Bug Generic Definition of Done (DoD)

LAQU156 commented 1 year ago

IVV_CCB_2023_w10 : Moved into "Under Analysis", Priority minor, @Woljtek add check on ASCII on the procedure and check yaml files on rs-config

Woljtek commented 1 year ago

Actions done:

Woljtek commented 1 year ago

New occurrence this weekend on the Ingestion S2-SGS chain: Invalid value 3600000,spring.cloud.stream.default.producer.configuration.max.request.size=2000000 for configuration max.poll.interval.ms: Not a number of type INT on the ingestion-worker:

Logs

A log:

{"header":{"type":"LOG","timestamp":"2023-03-13T08:01:48.371184Z","level":"ERROR","line":200,"file":"BindingService.java","thread":"scheduling-1"},"message":{"content":"Failed to create consumer binding; retrying in 30 seconds org.springframework.cloud.stream.binder.BinderException: Exception thrown while starting consumer: \n\tat org.springframework.cloud.stream.binder.AbstractMessageChannelBinder.doBindConsumer(AbstractMessageChannelBinder.java:480) ~[spring-cloud-stream-3.2.2.jar!/:3.2.2]\n\tat org.springframework.cloud.stream.binder.AbstractMessageChannelBinder.doBindConsumer(AbstractMessageChannelBinder.java:92) ~[spring-cloud-stream-3.2.2.jar!/:3.2.2]\n\tat org.springframework.cloud.stream.binder.AbstractBinder.bindConsumer(AbstractBinder.java:143) ~[spring-cloud-stream-3.2.2.jar!/:3.2.2]\n\tat org.springframework.cloud.stream.binding.BindingService.lambda$rescheduleConsumerBinding$1(BindingService.java:205) ~[spring-cloud-stream-3.2.2.jar!/:3.2.2]\n\tat org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) [spring-context-5.3.18.jar!/:5.3.18]\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]\n\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]\n\tat java.lang.Thread.run(Thread.java:829) [?:?]\nCaused by: org.apache.kafka.common.config.ConfigException: Invalid value 3600000,spring.cloud.stream.default.producer.configuration.max.request.size=2000000 for configuration max.poll.interval.ms: Not a number of type INT\n\tat org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:726) ~[kafka-clients-2.5.0.jar!/:?]\n\tat 
org.apache.kafka.common.config.ConfigDef.parseValue(ConfigDef.java:474) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:467) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:108) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:129) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.clients.consumer.ConsumerConfig.<init>(ConsumerConfig.java:542) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:631) ~[kafka-clients-2.5.0.jar!/:?]\n\tat org.springframework.kafka.core.DefaultKafkaConsumerFactory.createRawConsumer(DefaultKafkaConsumerFactory.java:416) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.core.DefaultKafkaConsumerFactory.createKafkaConsumer(DefaultKafkaConsumerFactory.java:384) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.core.DefaultKafkaConsumerFactory.createConsumerWithAdjustedProperties(DefaultKafkaConsumerFactory.java:360) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.core.DefaultKafkaConsumerFactory.createKafkaConsumer(DefaultKafkaConsumerFactory.java:327) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.core.DefaultKafkaConsumerFactory.createConsumer(DefaultKafkaConsumerFactory.java:304) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.<init>(KafkaMessageListenerContainer.java:767) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.listener.KafkaMessageListenerContainer.doStart(KafkaMessageListenerContainer.java:351) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.listener.AbstractMessageListenerContainer.start(AbstractMessageListenerContainer.java:466) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat 
org.springframework.kafka.listener.ConcurrentMessageListenerContainer.doStart(ConcurrentMessageListenerContainer.java:209) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.kafka.listener.AbstractMessageListenerContainer.start(AbstractMessageListenerContainer.java:466) ~[spring-kafka-2.8.4.jar!/:2.8.4]\n\tat org.springframework.integration.kafka.inbound.KafkaMessageDrivenChannelAdapter.doStart(KafkaMessageDrivenChannelAdapter.java:326) ~[spring-integration-kafka-5.5.10.jar!/:5.5.10]\n\tat org.springframework.integration.endpoint.AbstractEndpoint.start(AbstractEndpoint.java:157) ~[spring-integration-core-5.5.10.jar!/:5.5.10]\n\tat org.springframework.cloud.stream.binder.AbstractMessageChannelBinder.doBindConsumer(AbstractMessageChannelBinder.java:433) ~[spring-cloud-stream-3.2.2.jar!/:3.2.2]\n\t... 10 more\n"},"custom":{"logger_string":"org.springframework.cloud.stream.binding.BindingService"}}

Full Logs: ingestion-s2-sgs-part1-ingestion-s2-sgs-worker-v12.txt

SCDF configuration (see attached image).

Impacts on production

It seems this error blocks the S2 production (see attached image). Moreover, we observed a lag before the worker (see attached image).

Root cause

Set Long Processing parameters

app.ingestion-s2-sgs-worker.spring.cloud.stream.kafka.bindings.input.consumer.configuration.max.poll.records=1
app.ingestion-s2-sgs-worker.spring.cloud.stream.kafka.bindings.input.consumer.configuration.max.poll.interval.ms=3600000

Limit and request

Check performed; see lines 149 to 165.



Maybe the deployment was done before the merge of the PR [Rs-853 remove non ascii char](https://github.com/COPRS/rs-config/pull/72). @omiazek-ads Could you give us the hash of the commit you used to deploy?
LAQU156 commented 1 year ago

IVV_CCB_2023_w11 : Action @suberti-ads : Add new information on this issue, will be placed into "Accepted Werum" then, stays in "Under Analysis"

suberti-ads commented 1 year ago

Hereafter the last change pushed (workaround):

New:

app.ingestion-s2-sgs-worker.spring.cloud.stream.kafka.binding.input.consumer.configuration.max.poll.interval.ms=3600000

Old:

app.ingestion-s2-sgs-worker.spring.cloud.stream.kafka.bindings.input.consumer.configuration.max.poll.interval.ms=3600000

So with our last change, the "long-running application" option should not be well configured: https://github.com/COPRS/production-common/blob/release/1.6.0/processing-common/doc/long-running.md

After checking, all ingestion chains were deployed with this typo.
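A typo like `binding` instead of `bindings` is easy to miss by eye but easy to flag mechanically. Here is a minimal sketch (a hypothetical lint step, not part of the project) that scans property lines for the singular `binding` segment where Spring Cloud Stream Kafka expects `bindings`:

```python
import re

# Spring Cloud Stream Kafka consumer/producer properties use the plural
# "bindings" segment (…stream.kafka.bindings.<channel>.consumer…).
# Flag any key using the singular ".binding." instead.
TYPO = re.compile(r"\.stream\.kafka\.binding\.")

def find_binding_typos(lines: list[str]) -> list[str]:
    """Return property lines containing the singular '.binding.' segment."""
    return [line for line in lines if TYPO.search(line)]
```

With such a check, the mistyped `…kafka.binding.input.consumer…` line above would have been flagged before deployment, while the correct `…kafka.bindings.input.consumer…` form passes.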

LAQU156 commented 1 year ago

IVV_CCB_2023_w12 : Need to see if IVV reproduces this issue, Priority minor (workaround), Stays under analysis, action on @nleconte-csgroup side

nleconte-csgroup commented 1 year ago

The problem was reproduced on IVV. It comes from the next property, which is simply incorrectly configured: https://github.com/COPRS/rs-config/blob/3f6ed6acbc0c744cab0aeec056e89ba206ae1e16/apps/rs-core/rs-core-ingestion-s2-sgs_Executables/stream-parameters.properties#L37

Please replace this:

spring.cloud.stream.default.producer.configuration.max.request.size=2000000

with:

app.ingestion-s2-sgs-trigger.spring.cloud.stream.default.producer.configuration.max.request.size=2000000
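The error message makes sense once you see how the unscoped default property got concatenated onto the previous value: Kafka then tried to parse the whole merged string as an INT. A minimal sketch mimicking Kafka's `ConfigDef` INT validation (assumed behavior, simplified from the real parser) shows why the merged value is rejected:

```python
def validate_int(name: str, value: str) -> int:
    """Parse a Kafka config value as INT, raising like ConfigDef.parseType on failure."""
    try:
        return int(value.strip())
    except ValueError:
        raise ValueError(
            f"Invalid value {value} for configuration {name}: Not a number of type INT"
        )
```

`validate_int("max.poll.interval.ms", "3600000")` succeeds, while the merged value `"3600000,spring.cloud.stream.default.producer.configuration.max.request.size=2000000"` raises exactly the kind of error seen in the logs.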

LAQU156 commented 1 year ago

IVV_CCB_2023_w13 : Ok on IVV side, Moved into "Accepted OPS"

pcuq-ads commented 1 year ago

SYS_CCB_w17 : Configuration issue. It is fixed now.

suberti-ads commented 1 year ago

Following @nleconte-csgroup's last comment, we had a mistake in our configuration. The issue has been fixed.