dashbitco / broadway_kafka

A Broadway connector for Kafka

fetch_messages_from_kafka/3 Error: :unknown_server_error #83

Closed amacciola closed 2 years ago

amacciola commented 2 years ago

In one of my production clusters, I just switched our auditing application over to also use BroadwayKafka for its ingest pipeline.

However, I am seeing errors from this function: https://github.com/dashbitco/broadway_kafka/blob/34d34b80ed17a23f7d1364d6bcf6cb4dfe30d032/lib/broadway_kafka/producer.ex#L530

03:25:25.668 [error] GenServer :"AuditLogPipeline.Broadway.Producer_2" terminating
** (RuntimeError) cannot fetch records from Kafka (topic=_cogynt_audit_log partition=6 offset=0). Reason: :unknown_server_error
    (broadway_kafka 0.3.4) lib/broadway_kafka/producer.ex:537: BroadwayKafka.Producer.fetch_messages_from_kafka/3
    (broadway_kafka 0.3.4) lib/broadway_kafka/producer.ex:303: BroadwayKafka.Producer.handle_info/2
    (broadway 1.0.3) lib/broadway/topology/producer_stage.ex:229: Broadway.Topology.ProducerStage.handle_info/2
    (gen_stage 1.1.2) lib/gen_stage.ex:2117: GenStage.noreply_callback/3
    (stdlib 3.17) gen_server.erl:695: :gen_server.try_dispatch/4
    (stdlib 3.17) gen_server.erl:771: :gen_server.handle_msg/6
    (stdlib 3.17) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message: {:poll, {17108, "_cogynt_audit_log", 6}}
State: %{consumers: [{#PID<0.2859.0>, #Reference<0.1885855812.438304784.200151>}, {#PID<0.2858.0>, #Reference<0.1885855812.438304784.200140>}, {#PID<0.2857.0>, #Reference<0.1885855812.438304784.200126>}, {#PID<0.2856.0>, #Reference<0.1885855812.438304784.200107>}, {#PID<0.2855.0>, #Reference<0.1885855812.438304784.200096>}, {#PID<0.2854.0>, #Reference<0.1885855812.438304784.200081>}, {#PID<0.2853.0>, #Reference<0.1885855812.438304784.200066>}, {#PID<0.2852.0>, #Reference<0.1885855812.438304776.200786>}, {#PID<0.2851.0>, #Reference<0.1885855812.438304784.200049>}, {#PID<0.2850.0>, #Reference<0.1885855812.438304769.200498>}], module: BroadwayKafka.Producer, module_state: %{acks: %{{17108, "_cogynt_audit_log", 6} => {[], 0, []}}, allocator_names: {2, [AuditLogPipeline.Allocator_processor_default], [AuditLogPipeline.Allocator_batcher_consumer_default]}, buffer: {[], []}, client: BroadwayKafka.BrodClient, client_id: AuditLogPipeline.Broadway.Producer_2.Client, config: %{client_config: [connect_timeout: 30000], fetch_config: %{}, group_config: [offset_commit_policy: :commit_to_kafka_v2, session_timeout_seconds: 30], group_id: "AuditLog-df7078f6-7693-4725-8b88-2a138fc847ce", hosts: [{"kafka", 9071}], offset_commit_on_ack: true, offset_reset_policy: :earliest, receive_interval: 2000, reconnect_timeout: 1000, topics: ["_cogynt_audit_log"]}, demand: 100, group_coordinator: #PID<0.2811.0>, receive_interval: 2000, receive_timer: #Reference<0.1885855812.438304783.200132>, reconnect_timeout: 1000, revoke_caller: nil, shutting_down?: false}, rate_limiting: nil, transformer: {CogyntAudit.Broadway.AuditLogPipeline, :transform, []}}
03:25:25.668 [error] GenServer :"AuditLogPipeline.Broadway.Producer_4" terminating
** (RuntimeError) cannot fetch records from Kafka (topic=_cogynt_audit_log partition=5 offset=0). Reason: :unknown_server_error
    (broadway_kafka 0.3.4) lib/broadway_kafka/producer.ex:537: BroadwayKafka.Producer.fetch_messages_from_kafka/3
    (broadway_kafka 0.3.4) lib/broadway_kafka/producer.ex:303: BroadwayKafka.Producer.handle_info/2
    (broadway 1.0.3) lib/broadway/topology/producer_stage.ex:229: Broadway.Topology.ProducerStage.handle_info/2
    (gen_stage 1.1.2) lib/gen_stage.ex:2117: GenStage.noreply_callback/3
    (stdlib 3.17) gen_server.erl:695: :gen_server.try_dispatch/4
    (stdlib 3.17) gen_server.erl:771: :gen_server.handle_msg/6
    (stdlib 3.17) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message: {:poll, {17108, "_cogynt_audit_log", 5}}

This is causing my producers to restart over and over again. They are at a very high restart count now.

Group member (AuditLog-df7078f6-7693-4725-8b88-2a138fc847ce,coor=#PID<0.2843.0>,cb=#PID<0.2839.0>,generation=17115):

Any help with this error (which looks like it is returned from the brod client) would be appreciated. Thanks!
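
For readers trying to reproduce the setup, the producer options visible in the State dump above correspond roughly to a pipeline definition like the sketch below. This is a reconstruction from the logged `config` map, not the reporter's actual code: the module name `AuditLogPipelineSketch`, the concurrency values, and the pass-through `handle_message/3` are assumptions, and the batcher and transformer that appear in the dump are left out.

```elixir
defmodule AuditLogPipelineSketch do
  # Hypothetical module name; the reporter's real pipeline module is not shown in the issue.
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      # Matches the process names in the logs (AuditLogPipeline.Broadway.Producer_N).
      name: AuditLogPipeline,
      producer: [
        module:
          {BroadwayKafka.Producer,
           [
             # Values below are copied from the `config` map in the State dump.
             hosts: [{"kafka", 9071}],
             group_id: "AuditLog-df7078f6-7693-4725-8b88-2a138fc847ce",
             topics: ["_cogynt_audit_log"],
             offset_commit_on_ack: true,
             offset_reset_policy: :earliest,
             receive_interval: 2000,
             group_config: [session_timeout_seconds: 30],
             client_config: [connect_timeout: 30_000]
           ]},
        # Assumption: the logs only show that Producer_2 and Producer_4 exist.
        concurrency: 5
      ],
      # Assumption: ten consumer PIDs appear in the State dump.
      processors: [default: [concurrency: 10]]
    )
  end

  @impl true
  # Pass-through handler, for illustration only.
  def handle_message(_processor, message, _context), do: message
end
```

Nothing in these options stands out as the cause of `:unknown_server_error`, which the logs show is the reason the fetch failed with.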

amacciola commented 2 years ago

Ignore this; our Kafka tiered storage got disabled 👎