COPRS / rs-issues

This repository contains all the issues of the COPRS project (Scrum tickets, ivv bugs, epics ...)

[BUG] [OPS] Metadata Catalog Extraction 'Error on downloading from OBS' for S3 AUX #610

Closed: suberti-ads closed this issue 1 year ago

suberti-ads commented 1 year ago

Environment:

Current Behavior: For each S3 OPER_AUX_GNSSRD_POD AUX product, we observed the following in the metadata extraction-worker:

To conclude, we observed multiple download retries and errors even though the product had been successfully retrieved to the pod.

Expected Behavior: Metadata shall be extracted from AUX_GNSSRD_POD products. A clear error should be raised in this failure case.

Steps To Reproduce: Ingest S3 AUX_GNSSRD_POD auxiliary files using the RS Core Ingestion (AGS-AUXIP configuration) on the S3 workflow.

Whenever possible, first analysis of the root cause: It seems that DBL files are considered as folders.

Note: This is not a "no space left on device" issue on the pod.

Bug Generic Definition of Ready (DoR)

Bug Generic Definition of Done (DoD)

Woljtek commented 1 year ago

Hi @w-fsi, could you have a look at this issue?

w-fsi commented 1 year ago

Can you please provide us the full log of the metadata extraction? From the snippets, the previous decisions made by the extraction are not visible. We normally need the full log to have a chance of seeing what the incoming event looks like.

The product type is currently not in use and not known to us. A workaround would be to modify the regexp to exclude it.

The trace message references the bucket "rs-aux", which does not exist in the configuration. This is strange and suggests that the config and the logs do not match.

w-jka commented 1 year ago

The provided configuration for the AUXIP contains an inbox (inbox4) that uses a regexp which does not match Sentinel-3 auxiliary products. The configured inbox5 is enough to catch all Sentinel-3 auxiliary products; we therefore propose to delete inbox4 from the configuration (lines 45-51 in the link provided on the issue).
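
For illustration, a minimal sketch of what this change could look like in the ingestion-trigger deployment properties. The property keys and values below are assumptions following the usual inboxN pattern, not the actual content of the referenced configuration file:

# Illustrative only: key names and values are placeholders, not the real configuration.
# inbox4 (lines 45-51 of the referenced file) would simply be removed:
#app.ingestion-trigger.polling.inbox4.directory=<AUXIP query for S3 auxiliary products>
#app.ingestion-trigger.polling.inbox4.matchRegex=<regexp that does not match S3 auxiliary products>
# inbox5 is kept as-is and already catches all Sentinel-3 auxiliary products:
app.ingestion-trigger.polling.inbox5.directory=<AUXIP query for S3 auxiliary products>
app.ingestion-trigger.polling.inbox5.matchRegex=<regexp matching all S3 auxiliary products>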

wruffine-csgroup commented 1 year ago

@w-jka Thank you for this information. The configuration shall be modified.

@w-fsi To my understanding, the current exclude regex for the AUXIP is set to $a as a workaround for another issue. Can it be changed to exclude these files and still work as a workaround for issue #549?

I may be completely wrong, but could the error "Directory creation fails [...]" be due to the fact that the files are already present?

w-fsi commented 1 year ago

Yes "$a" is a workaround as hopefully there is no string that contains after its end an "a". Basically this could be used for it, but as long as its not understood why this is not working correctly, I would recommend to edit the match pattern instead.

The "Directory creation fails" error is likely caused by something else. We have not analysed it yet, but I think the product is not detected as one directory product (containing two files) but as two separate products, and thus runs into an issue. The recommended workaround is to exclude these products, as they are not used (and maybe not even valid) S3 products. So the recommendation of @w-jka should remove the blocking aspect of this issue.

pcuq-ads commented 1 year ago

Hello, this morning (10/10 at 10:14) the problem remains. Here is one of the 2897 errors from the last 2 days:

Metadata extraction failed: java.lang.RuntimeException: java.lang.RuntimeException: Error: Number of retries has exceeded while performing Download of metadata file S3A_OPER_AUX_GNSSRD_POD__20220912T033337_V20220904T235942_202209

The lag continues to increase (see attached screenshot image.png).

Woljtek commented 1 year ago

Dear @wruffine-csgroup & @vvernet-csgroup,

In the SCDF configuration, it seems that inbox4 is still enabled (see attached screenshot image.png). Source: https://processing.platform.ops-csc.com/dashboard/#/streams/list/ingestion-aux-part1

As proposed by @w-jka, could you redeploy the stream without inbox4 in order to check whether the root cause of the bug is the ingestion of OPER_AUX_GNSSRD_POD aux files?

w-fsi commented 1 year ago

Hi @pcuq-ads, if you apply the proposed workaround, the invalid product should no longer be ingested into the system. However, did you also clean the topics on Friday? If the requests are still in there, the MDC will keep consuming them, as they are never processed successfully, and this might explain your observation.

Woljtek commented 1 year ago

The workaround proposed by @w-jka is not sufficient. Indeed, there are still blocking messages in the lag.

Below, we can see that the extraction is blocked at a current offset of 117 (see attached screenshot image.png).

I restarted (undeployed/redeployed) the stream yesterday.

I observed that the metadata-catalog-part1.metadata-filter topic is not connected to its expected consumer, the extraction-worker. During the restart, this error occurred:

{"header":{"type":"LOG","timestamp":"2022-10-10T16:24:30.279075Z","level":"ERROR","line":250,"file":"LogAccessor.java","thread":"KafkaConsumerDestination{consumerDestinationName='metadata-catalog-part1.metadata-filter', partitions=1, dlqName='error-warning'}.container-0-C-1"},"message":{"content":"org.springframework.messaging.MessageHandlingException: error occurred in message handler [org.springframework.cloud.stream.function.FunctionConfiguration$FunctionToDestinationBinder$1@1cdcda59]; nested exception is java.lang.NullPointerException: Name is null, failedMessage=GenericMessage [payload=byte[951], headers={deliveryAttempt=3, kafka_timestampType=CREATE_TIME, kafka_receivedTopic=metadata-catalog-part1.metadata-filter, target-protocol=kafka, b3=616d82dc121dabf3-3b38edc97a2ad8a4-0, nativeHeaders={b3=[616d82dc121dabf3-3b38edc97a2ad8a4-0]}, kafka_offset=117, scst_nativeHeadersPresent=true, kafka_consumer=org.apache.kafka.clients.consumer.KafkaConsumer@398080d8, kafka_receivedPartitionId=0, contentType=application/json, kafka_receivedTimestamp=1664961748834, kafka_groupId=metadata-catalog-part1}]\n\tat org.springframework.integration.support.utils.IntegrationUtils.wrapInHandlingExceptionIfNecessary(IntegrationUtils.java:191)\n\tat org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:65)\n\tat org.springframework.integration.dispatcher.AbstractDispatcher.tryOptimizedDispatch(AbstractDispatcher.java:115)\n\tat org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:133)\n\tat org.springframework.integration.dispatcher.UnicastingDispatcher.dispatch(UnicastingDispatcher.java:106)\n\tat org.springframework.integration.channel.AbstractSubscribableChannel.doSend(AbstractSubscribableChannel.java:72)\n\tat org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:317)\n\tat org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:272)
[...]

@praynaud-ads:

w-fsi commented 1 year ago

Can you please provide us the names of the products that are causing the ongoing issues? The original GPOD product should no longer be ingested.

praynaud-ads commented 1 year ago

Logs of the metadata extraction: logs_metadata_extraction.txt

w-fsi commented 1 year ago

@praynaud-ads: please verify whether the error-warning and DLQ logs contain any occurrences of affected products like "S3A_OPER_AUX_GNSSRD_POD__20220914T033340_V20220906T235942_20220907T235941". We assume that the DLQ is restarting them, as it is configured to restart this kind of issue:

app.dlq-manager.dlq-manager.routing.obs4.errorTitle=Generic OBS issue
app.dlq-manager.dlq-manager.routing.obs4.errorID=.*ObsServiceException.*
app.dlq-manager.dlq-manager.routing.obs4.actionType=Restart
#app.dlq-manager.dlq-manager.routing.obs4.targetTopic=
app.dlq-manager.dlq-manager.routing.obs4.maxRetry=2
app.dlq-manager.dlq-manager.routing.obs4.priority=100

So you will likely also find this product multiple times on the Kafka topic?

praynaud-ads commented 1 year ago

Indeed, the DLQ Manager logs contain some products like "S3A_OPER_AUX_GNSSRD_POD...".

You can see the logs below: logs_dlq_manager.txt

pcuq-ads commented 1 year ago

Following a discussion with CS & WERUML, we need to update the configuration with the following changes:

w-jka commented 1 year ago

Please note that the Kafka timeout is set to 5 minutes by default. The more important configuration parameter is max.poll.records. The batch size of 500 is too big and results in the consumer getting kicked out of the consumer group, especially when an error happens during the download of the manifest file (which is the case for the erroneous messages of the S3A_OPER... product).

As it was observed that a failing download takes roughly 1 minute per message, the batch size should be at most 4. This should ensure that the new Kafka offset is communicated to the Kafka broker in time and that the consumer is not kicked out of the consumer group.
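
A minimal sketch of how such a limit could be expressed as Spring Cloud Stream Kafka consumer properties for the affected application; the app name and binding name below are assumptions, while max.poll.records (default 500) and max.poll.interval.ms (default 300000 ms, i.e. the 5 minutes mentioned above) are standard Kafka consumer properties:

# Illustrative only: "extraction-worker" and the "input" binding name are placeholders.
# Limit each poll so the batch is processed and the offset committed well within max.poll.interval.ms.
app.extraction-worker.spring.cloud.stream.kafka.bindings.input.consumer.configuration.max.poll.records=4
# The poll interval itself can stay at its default of 5 minutes:
#app.extraction-worker.spring.cloud.stream.kafka.bindings.input.consumer.configuration.max.poll.interval.ms=300000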

pcuq-ads commented 1 year ago

OK, so the first configuration change to be applied is:

We will handle the DLQ in a second step.

LAQU156 commented 1 year ago

IVV_CCB_2022_w41: Closed; the remaining work is covered by #557.