getsentry / snuba

Search the seas for your lost treasure.

Consumer crashes #5509

Open CMDMichalKoval opened 5 months ago

CMDMichalKoval commented 5 months ago

Environment

What version are you running? 24.1.1

Steps to Reproduce

Expected Result

The consumers should not crash.

Actual Result

The snuba-consumer consumer crashes with the following error:

2024-02-07 23:48:37,206 librdkafka log level: 6                                                                                                                                                                  
2024-02-07 23:48:38,289 New partitions assigned: {Partition(topic=Topic(name='events'), index=0): 10090059}                                                                                                      
2024-02-07 23:50:04,176 Caught exception, shutting down...                                                                                                                                                       
Traceback (most recent call last):                                                                                                                                                                               
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/processor.py", line 319, in run                                                                                                                
    self._run_once()                                                                                                                                                                                             
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/processor.py", line 393, in _run_once                                                                                                          
    self.__processing_strategy.poll()                                                                                                                                                                            
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/guard.py", line 101, in poll                                                                                                        
    self.__inner_strategy.poll()                                                                                                                                                                                 
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/run_task.py", line 55, in poll                                                                                                      
    self.__next_step.poll()                                                                                                                                                                                      
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/guard.py", line 37, in poll                                                                                                         
    self.__next_step.poll()                                                                                                                                                                                      
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/reduce.py", line 168, in poll                                                                                                       
    self.__next_step.poll()                                                                                                                                                                                      
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/run_task_in_threads.py", line 107, in poll                                                                                          
    result = future.result()                                                                                                                                                                                     
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result                                                                                                                              
    return self.__get_result()                                                                                                                                                                                   
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result                                                                                                                        
    raise self._exception                                                                                                                                                                                        
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run                                                                                                                                 
    result = self.fn(*self.args, **self.kwargs)                                                                                                                                                                  
  File "/usr/src/snuba/snuba/consumers/strategy_factory.py", line 137, in flush_batch                                                                                                                            
    message.payload.close()                                                                                                                                                                                      
  File "/usr/src/snuba/snuba/consumers/consumer.py", line 330, in close                                                                                                                                          
    self.__insert_batch_writer.close()                                                                                                                                                                           
  File "/usr/src/snuba/snuba/consumers/consumer.py", line 166, in close                                                                                                                                          
    self.__writer.write(                                                                                                                                                                                         
  File "/usr/src/snuba/snuba/clickhouse/http.py", line 359, in write                                                                                                                                             
    batch.join(timeout=batch_join_timeout)                                                                                                                                                                       
  File "/usr/src/snuba/snuba/clickhouse/http.py", line 282, in join                                                                                                                                              
    raise ClickhouseWriterError(message, code=code, row=row)                                                                                                                                                     
snuba.clickhouse.errors.ClickhouseWriterError: Too large value for FixedString(32): (while reading the value of key primary_hash): (at row 1)                                                                    
2024-02-07 23:50:04,186 Closing <arroyo.backends.kafka.consumer.KafkaConsumer object at 0x7f622bfdeb30>...                                                                                                       
2024-02-07 23:50:04,186 Partitions to revoke: [Partition(topic=Topic(name='events'), index=0)]                                    

snuba-metrics-consumer crashes with a different error:

2024-02-07 23:59:14,276 librdkafka log level: 6                                                                                                                                                                  
2024-02-07 23:59:14,304 New partitions assigned: {Partition(topic=Topic(name='snuba-metrics'), index=0): 0}                                                                                                      
2024-02-07 23:59:17,326 Caught exception, shutting down...                                                                                                                                                       
Traceback (most recent call last):                                                                                                                                                                               
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/processor.py", line 319, in run                                                                                                                
    self._run_once()                                                                                                                                                                                             
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/processor.py", line 393, in _run_once                                                                                                          
    self.__processing_strategy.poll()                                                                                                                                                                            
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/guard.py", line 101, in poll                                                                                                        
    self.__inner_strategy.poll()                                                                                                                                                                                 
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/run_task.py", line 55, in poll                                                                                                      
    self.__next_step.poll()                                                                                                                                                                                      
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/guard.py", line 37, in poll                                                                                                         
    self.__next_step.poll()                                                                                                                                                                                      
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/reduce.py", line 168, in poll                                                                                                       
    self.__next_step.poll()                                                                                                                                                                                      
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/run_task_in_threads.py", line 107, in poll                                                                                          
    result = future.result()                                                                                                                                                                                     
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result                                                                                                                              
    return self.__get_result()                                                                                                                                                                                   
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result                                                                                                                        
    raise self._exception                                                                                                                                                                                        
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run                                                                                                                                 
    result = self.fn(*self.args, **self.kwargs)                                                                                                                                                                  
  File "/usr/src/snuba/snuba/consumers/strategy_factory.py", line 137, in flush_batch                                                                                                                            
    message.payload.close()                                                                                                                                                                                      
  File "/usr/src/snuba/snuba/consumers/consumer.py", line 330, in close                                                                                                                                          
    self.__insert_batch_writer.close()                                                                                                                                                                           
  File "/usr/src/snuba/snuba/consumers/consumer.py", line 166, in close                                                                                                                                          
    self.__writer.write(                                                                                                                                                                                         
  File "/usr/src/snuba/snuba/clickhouse/http.py", line 359, in write                                                                                                                                             
    batch.join(timeout=batch_join_timeout)                                                                                                                                                                       
  File "/usr/src/snuba/snuba/clickhouse/http.py", line 282, in join                                                                                                                                              
    raise ClickhouseWriterError(message, code=code, row=row)                                                                                                                                                     
snuba.clickhouse.errors.ClickhouseWriterError: Method write is not supported by storage Distributed with more than one shard and no sharding key provided (version 21.8.13.6 (official build))                   
2024-02-07 23:59:17,335 Closing <arroyo.backends.kafka.consumer.KafkaConsumer object at 0x7fd177741d50>...

untitaker commented 5 months ago

2024-02-07 23:48:38,289 New partitions assigned: {Partition(topic=Topic(name='events'), index=0): 10090059}

This tells you which message is bad: the consumer resumed at offset 10090059 on partition 0 of the events topic. Can you dump the message at that offset? It should be possible with kafkactl or kcat.
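
A rough sketch of that check in Python, in case it helps. It builds a kcat invocation for the offset from the log and then scans a dumped message for `primary_hash` values that would not fit into `FixedString(32)`. The broker address is a placeholder, the helper names are made up for illustration, and the search is recursive because I am not assuming a particular layout of the events payload.

```python
import json
from typing import Any, Iterator

# Width taken from the error message: FixedString(32).
FIXED_STRING_WIDTH = 32


def kcat_dump_command(broker: str, topic: str, partition: int, offset: int) -> list[str]:
    """Build a kcat command that consumes exactly one message at the given offset."""
    return [
        "kcat", "-C",          # consumer mode
        "-b", broker,          # bootstrap broker (placeholder, adjust to your setup)
        "-t", topic,
        "-p", str(partition),
        "-o", str(offset),     # start at this absolute offset
        "-c", "1",             # exit after one message
    ]


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in a parsed JSON document."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def oversized_values(raw_message: str, key: str = "primary_hash",
                     width: int = FIXED_STRING_WIDTH) -> list[str]:
    """Return string values under `key` whose UTF-8 encoding exceeds `width` bytes."""
    payload = json.loads(raw_message)
    return [
        value for value in find_key(payload, key)
        if isinstance(value, str) and len(value.encode("utf-8")) > width
    ]
```

Run the command from `kcat_dump_command("kafka:9092", "events", 0, 10090059)`, then pass the dumped JSON to `oversized_values`. A non-empty result would point at the value ClickHouse is rejecting.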

untitaker commented 5 months ago

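As a hotfix, you can also delete the consumer group using kafkactl and run the consumer with --auto-offset-reset latest. This will basically "flush the queue": the consumer skips everything still pending in the topic and resumes from the newest messages.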
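
A minimal sketch of those two steps as command lines, assembled in Python so they can be reviewed before anything runs. The group name `snuba-consumers` and `--storage errors` are assumptions about a typical setup, not something taken from this report, and the `kafkactl delete consumer-group` subcommand should be checked against your installed kafkactl version. Only `--auto-offset-reset latest` comes from the suggestion above.

```python
import shlex


def hotfix_commands(consumer_group: str, storage: str) -> list[list[str]]:
    """Return the two commands for the hotfix, in the order they should run."""
    # Step 1: drop the committed offsets by deleting the consumer group.
    # Assumed kafkactl subcommand; verify with `kafkactl delete --help`.
    delete_group = ["kafkactl", "delete", "consumer-group", consumer_group]
    # Step 2: restart the consumer so it starts from the newest offset,
    # skipping the bad message and everything else still pending.
    restart = [
        "snuba", "consumer",
        "--storage", storage,                # assumed storage name
        "--consumer-group", consumer_group,  # assumed group name
        "--auto-offset-reset", "latest",
    ]
    return [delete_group, restart]


def render(commands: list[list[str]]) -> str:
    """Render the commands as shell lines for copy and paste."""
    return "\n".join(shlex.join(command) for command in commands)
```

`print(render(hotfix_commands("snuba-consumers", "errors")))` prints both lines. Stop the running consumer before deleting its group, and keep in mind that the skipped events are not ingested.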