fedora-infra / fedora-messaging

A library for sending AMQP messages with JSON schema in Fedora infrastructure
GNU General Public License v2.0
53 stars 52 forks source link

ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488) #175

Open bowlofeggs opened 5 years ago

bowlofeggs commented 5 years ago

In Fedora's staging environment, I have observed this problem in both OpenShift and bodhi-backend01.stg today:

May 16 19:27:54 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: [INFO fedora_messaging.twisted.protocol] Successfully consumed message from topic org.fedoraproject.stg.bodhi.composer.start (me
ssage id 0c33f076-5f4c-45a6-a157-07960474fce2)                                                                                                                                                                    
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: [INFO fedora_messaging.twisted.protocol] Consuming message from topic org.fedoraproject.stg.bodhi.composer.start (message id 653
63292-a613-470a-89c5-b165904e6247)                                                                                                                                                                                
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: [INFO bodhi] Received message from fedora-messaging with topic: org.fedoraproject.stg.bodhi.composer.start                     
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: [ERROR pika.adapters.utils.io_services_utils] _AsyncBaseTransport._consume() failed, aborting connection: error=SSLEOFError(8, '
EOF occurred in violation of protocol (_ssl.c:2488)'); sock=<ssl.SSLSocket fd=12, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('10.5.128.79', 44822)>; Caller's stack:              
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: Traceback (most recent call last):                                                                                             
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume                     
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     super(_AsyncSSLTransport, self)._consume()                                                                                 
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume                      
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES)                                                            
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap              
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return func(*args, **kwargs)                                                                                               
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv             
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return sock.recv(max_bytes)                                                                                                
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib64/python3.7/ssl.py", line 1037, in recv                                                                       
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return self.read(buflen)                                                                                                    May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib64/python3.7/ssl.py", line 913, in read
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return self._sslobj.read(len)                                                                                               May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488)
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: Traceback (most recent call last):                                                                                              May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     super(_AsyncSSLTransport, self)._consume()                                                                                 
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume                      
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES)                                                            
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap              
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return func(*args, **kwargs)                                                                                               
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv             
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return sock.recv(max_bytes)                                                                                                
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib64/python3.7/ssl.py", line 1037, in recv                                                                       
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return self.read(buflen)                                                                                                   
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib64/python3.7/ssl.py", line 913, in read                                                                        
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return self._sslobj.read(len)                                                                                              
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488)
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: [ERROR pika.adapters.utils.io_services_utils] Wrapped func exited with exception. Caller's stack:                      [13/1908]
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: Traceback (most recent call last):                                                                                             
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/diagnostic_utils.py", line 53, in log_exception_func_wrap                        
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return func(*args, **kwargs)                                                                                               
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume                     
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     super(_AsyncSSLTransport, self)._consume()                                                                                  May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume                      
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES)                                                            
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap              
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return func(*args, **kwargs)                                                                                                May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return sock.recv(max_bytes)                                                                                                
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib64/python3.7/ssl.py", line 1037, in recv                                                                       
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return self.read(buflen)                                                                                                   
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib64/python3.7/ssl.py", line 913, in read           
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return self._sslobj.read(len)
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488)
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: Traceback (most recent call last):
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/diagnostic_utils.py", line 53, in log_exception_func_wrap
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return func(*args, **kwargs)
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     super(_AsyncSSLTransport, self)._consume()
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES)
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return func(*args, **kwargs)
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return sock.recv(max_bytes)
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib64/python3.7/ssl.py", line 1037, in recv
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return self.read(buflen)
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib64/python3.7/ssl.py", line 913, in read
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     return self._sslobj.read(len)
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488)
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: [ERROR pika.adapters.base_connection] connection_lost: StreamLostError: ("Stream connection lost: SSLEOFError(8, 'EOF occurred i
n violation of protocol (_ssl.c:2488)')",)
May 16 19:39:39 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: [ERROR pika.adapters.blocking_connection] Unexpected connection close detected: StreamLostError: ("Stream connection lost: SSLEO
FError(8, 'EOF occurred in violation of protocol (_ssl.c:2488)')",)
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: [ERROR fedora_messaging.twisted.protocol] Received unexpected exception from consumer Consumer(queue=bodhi.stg_composer, callbac
k=<bodhi.server.consumers.Consumer object at 0x7f287e553c18>)
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: Traceback (most recent call last):
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/fedora_messaging/_session.py", line 152, in publish
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     self._connect_and_publish(exchange, message)
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/fedora_messaging/_session.py", line 191, in _connect_and_publish
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     properties=message._properties,
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 2203, in basic_publish
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     self._flush_output(self._message_confirmation_result.is_ready)
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 1327, in _flush_output
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     self._connection._flush_output(lambda: self.is_closed, *waiters)
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 523, in _flush_output
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     raise self._closed_result.value.error
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: pika.exceptions.StreamLostError: Stream connection lost: SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2488)')
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: During handling of the above exception, another exception occurred:
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: Traceback (most recent call last):
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:   File "/usr/lib/python3.7/site-packages/fedora_messaging/twisted/protocol.py", line 239, in _read
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]:     yield threads.deferToThread(consumer.callback, message)
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: fedora_messaging.exceptions.ConnectionException
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: [ERROR fedora_messaging.cli] Unexpected error occurred in consumer Consumer(queue=bodhi.stg_composer, callback=<bodhi.server.con
sumers.Consumer object at 0x7f287e553c18>): <twisted.python.failure.Failure fedora_messaging.exceptions.ConnectionException: >
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: [INFO fedora_messaging.twisted.protocol] Waiting for 0 consumer(s) to finish processing before halting
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: [INFO fedora_messaging.twisted.protocol] Finished canceling 0 consumers
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: [INFO twisted] Stopping factory FedoraMessagingFactoryV2(parameters=<URLParameters host=rabbitmq.stg.fedoraproject.org port=5671
 virtual_host=/pubsub ssl=True>, confirms=True)
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: [INFO fedora_messaging.twisted.protocol] Disconnect requested, but AMQP connection already gone
May 16 19:39:40 bodhi-backend01.stg.phx2.fedoraproject.org fedora-messaging[5584]: [INFO twisted] Main loop terminated.
May 16 19:39:41 bodhi-backend01.stg.phx2.fedoraproject.org systemd[1]: fm-consumer@config.service: Main process exited, code=exited, status=13/n/a

In the case of Bodhi's OpenShift consumer, it ends up crashing the process, systemd restarts it, the message is tried again, and it was successful.

However, in Bodhi's composer, the composer itself caught the exception (it has a catch all so we can log and mark the compose as failed), which means that the message was ultimately ACK'd and the process did not halt and the compose was left in the failed state.

Should it be the responsibility of the application (i.e., Bodhi) to catch this kind of error, or should fedora-messaging catch and handle it for me? I'm not sure fedora-messaging could catch it in the case of Bodhi's composer…

jeremycline commented 5 years ago

So after a lot of staring at the backtrace and the code, I believe what's happening is publish is being called in the composer consumer, but no exceptions are being handled so when the connection error occurs this is bubbled up into the consumer loop and crashes the consumer.

The publish API is raising the proper "ConnectionException" in response to "pika.exceptions.StreamLostError" so this isn't exactly a bug in fedora-messaing, but what it's not doing is automatically retrying before giving up because it's entering the generic AMQPError handler. We should fix this, especially because the blocking connection times out regularly as it cannot heartbeat. This will cause users to hit it less frequently, but they still need to handle exceptions where the automatic retry doesn't work.

bowlofeggs commented 5 years ago

Hmm, I was able to reproduce this again with fedora-messaging-1.7.0:

-- Logs begin at Wed 2018-10-31 23:49:56 UTC, end at Wed 2019-05-22 21:23:39 UTC. --
2019-05-22 21:17:49,912 ERROR [bodhi][f29-updates] Pungi exited with exit code 1
2019-05-22 21:17:50,020 ERROR [bodhi][f29-updates] Stderr: Traceback (most recent call last):
  File "/usr/bin/pungi-koji", line 482, in <module>
    main()
  File "/usr/bin/pungi-koji", line 261, in main
    run_compose(compose, create_latest_link=create_latest_link, latest_link_status=latest_link_status)
  File "/usr/bin/pungi-koji", line 381, in run_compose
    essentials_phase.stop()
  File "/usr/lib/python3.7/site-packages/pungi/phases/weaver.py", line 53, in stop
    self.pool.stop()
  File "/usr/lib/python3.7/site-packages/kobo/threads.py", line 120, in stop
    six.reraise(exc_info[0], exc_info[1], exc_info[2])
  File "/usr/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3.7/site-packages/kobo/threads.py", line 67, in run
    self.process(item, num)
  File "/usr/lib/python3.7/site-packages/pungi/phases/weaver.py", line 69, in process
    phase.start()
  File "/usr/lib/python3.7/site-packages/pungi/phases/base.py", line 64, in start
    self.run()
  File "/usr/lib/python3.7/site-packages/pungi/phases/gather/__init__.py", line 111, in run
    manifest=self.manifest)
  File "/usr/lib/python3.7/site-packages/pungi/phases/gather/link.py", line 122, in link_files
    pool.stop()
  File "/usr/lib/python3.7/site-packages/kobo/threads.py", line 120, in stop
    six.reraise(exc_info[0], exc_info[1], exc_info[2])
  File "/usr/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3.7/site-packages/kobo/threads.py", line 67, in run
    self.process(item, num)
  File "/usr/lib/python3.7/site-packages/pungi/linker.py", line 63, in process
    self.pool.linker.link(src, dst, link_type=self.pool.link_type)
  File "/usr/lib/python3.7/site-packages/pungi/linker.py", line 214, in link
    self._link_file(src, dst, link_type)
  File "/usr/lib/python3.7/site-packages/pungi/linker.py", line 207, in _link_file
    self.copy(src, dst)
  File "/usr/lib/python3.7/site-packages/pungi/linker.py", line 182, in copy
    shutil.copy2(src, dst)
  File "/usr/lib64/python3.7/shutil.py", line 263, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib64/python3.7/shutil.py", line 121, in copyfile
    with open(dst, 'wb') as fdst:
OSError: [Errno 28] No space left on device: '/mnt/koji/compose/updates/Fedora-29-updates-20190522.1/compose/Everything/armhfp/debug/tree/Packages/n/NetworkManager-team-debuginfo-1.12.6-5.fc29.armv7hl.rpm'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/bin/pungi-koji", line 489, in <module>
    COMPOSE.write_status("DOOMED")
  File "/usr/lib/python3.7/site-packages/pungi/compose.py", line 326, in write_status
    self.notifier.send('status-change', status=stat_msg)
  File "/usr/lib/python3.7/site-packages/pungi/notifier.py", line 85, in send
    self._run_script(cmd, msg, workdir, kwargs)
  File "/usr/lib/python3.7/site-packages/pungi/notifier.py", line 107, in _run_script
    logfile=logfile)
  File "/usr/lib/python3.7/site-packages/kobo/shortcuts.py", line 288, in run
    log = open(logfile, mode)
OSError: [Errno 28] No space left on device: '/mnt/koji/compose/updates/Fedora-29-updates-20190522.1/logs/global/notifications/notification-2019-05-22_21-17-48.log'
2019-05-22 21:17:50,272 ERROR [bodhi][f29-updates] Exception in ComposerThread(f29-updates)
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 386, in work
    self._compose_updates()
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 925, in _compose_updates
    self._wait_for_pungi(pungi_process)
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 1211, in _wait_for_pungi
    raise Exception('Pungi exited with status %d' % pungi_process.returncode)
Exception: Pungi exited with status 1
2019-05-22 21:17:50,495 INFO  [bodhi][f29-updates] Compose object updated.
2019-05-22 21:17:50,495 INFO  [bodhi][f29-updates] Thread(f29-updates) finished.  Success: False
2019-05-22 21:17:50,498 ERROR [pika.adapters.utils.io_services_utils][f29-updates] _AsyncBaseTransport._consume() failed, aborting connection: error=SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2488)'); sock=<ssl.SSLSocket fd=24, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('10.5.128.79', 33970)>; Caller's stack:
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 386, in work
    self._compose_updates()
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 925, in _compose_updates
    self._wait_for_pungi(pungi_process)
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 1211, in _wait_for_pungi
    raise Exception('Pungi exited with status %d' % pungi_process.returncode)
Exception: Pungi exited with status 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume
    super(_AsyncSSLTransport, self)._consume()
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume
    data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES)
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap
    return func(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv
    return sock.recv(max_bytes)
  File "/usr/lib64/python3.7/ssl.py", line 1037, in recv
    return self.read(buflen)
  File "/usr/lib64/python3.7/ssl.py", line 913, in read
    return self._sslobj.read(len)
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488)
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 386, in work
    self._compose_updates()
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 925, in _compose_updates
    self._wait_for_pungi(pungi_process)
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 1211, in _wait_for_pungi
    raise Exception('Pungi exited with status %d' % pungi_process.returncode)
Exception: Pungi exited with status 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume
    super(_AsyncSSLTransport, self)._consume()
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume
    data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES)
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap
    return func(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv
    return sock.recv(max_bytes)
  File "/usr/lib64/python3.7/ssl.py", line 1037, in recv
    return self.read(buflen)
  File "/usr/lib64/python3.7/ssl.py", line 913, in read
    return self._sslobj.read(len)
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488)
2019-05-22 21:17:50,499 ERROR [pika.adapters.utils.io_services_utils][f29-updates] Wrapped func exited with exception. Caller's stack:
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 386, in work
    self._compose_updates()
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 925, in _compose_updates
    self._wait_for_pungi(pungi_process)
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 1211, in _wait_for_pungi
    raise Exception('Pungi exited with status %d' % pungi_process.returncode)
Exception: Pungi exited with status 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/pika/diagnostic_utils.py", line 53, in log_exception_func_wrap
    return func(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume
    super(_AsyncSSLTransport, self)._consume()
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume
    data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES)
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap
    return func(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv
    return sock.recv(max_bytes)
  File "/usr/lib64/python3.7/ssl.py", line 1037, in recv
    return self.read(buflen)
  File "/usr/lib64/python3.7/ssl.py", line 913, in read
    return self._sslobj.read(len)
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488)
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 386, in work
    self._compose_updates()
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 925, in _compose_updates
    self._wait_for_pungi(pungi_process)
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 1211, in _wait_for_pungi
    raise Exception('Pungi exited with status %d' % pungi_process.returncode)
Exception: Pungi exited with status 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/pika/diagnostic_utils.py", line 53, in log_exception_func_wrap
    return func(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume
    super(_AsyncSSLTransport, self)._consume()
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume
    data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES)
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap
    return func(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv
    return sock.recv(max_bytes)
  File "/usr/lib64/python3.7/ssl.py", line 1037, in recv
    return self.read(buflen)
  File "/usr/lib64/python3.7/ssl.py", line 913, in read
    return self._sslobj.read(len)
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488)
2019-05-22 21:17:50,499 ERROR [pika.adapters.base_connection][f29-updates] connection_lost: StreamLostError: ("Stream connection lost: SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2488)')",)
2019-05-22 21:17:50,500 ERROR [pika.adapters.blocking_connection][f29-updates] Unexpected connection close detected: StreamLostError: ("Stream connection lost: SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2488)')",)
2019-05-22 21:17:50,500 INFO  [fedora_messaging._session][f29-updates] Resetting connection to rabbitmq.stg.fedoraproject.org
2019-05-22 21:17:50,610 INFO  [bodhi][f29-updates] Compose object updated.
2019-05-22 21:17:50,610 ERROR [bodhi][f29-updates] ComposerThread failed. Transaction rolled back.
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 316, in run
    self.work()
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 386, in work
    self._compose_updates()
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 925, in _compose_updates
    self._wait_for_pungi(pungi_process)
  File "/usr/lib/python3.7/site-packages/bodhi/server/consumers/composer.py", line 1211, in _wait_for_pungi
    raise Exception('Pungi exited with status %d' % pungi_process.returncode)
Exception: Pungi exited with status 1
2019-05-22 21:17:50,710 INFO  [bodhi][f29-updates] Released semaphore
2019-05-22 21:17:50,710 INFO  [bodhi][PoolThread-twisted.internet.reactor-0] Push complete!  Summary follows:
2019-05-22 21:17:50,710 INFO  [bodhi][PoolThread-twisted.internet.reactor-0]   name:  f29-updates           success:  False
2019-05-22 21:17:50,710 INFO  [fedora_messaging.twisted.protocol][MainThread] Successfully consumed message from topic org.fedoraproject.stg.bodhi.composer.start (message id c1f3c409-2050-4524-aa95-7e0f4247bf6a)
bowlofeggs commented 5 years ago

It's not immediately obvious to me by looking at the tracebacks whether this is due to my publishing code, or whether this is due to my consuming code. Since this is the composer, we are both publishing and consuming in the same process. And it is likely that I attempted to publish a message about the failure due to being out of disk space, so perhaps that is what triggered it. However, I do see calls to a _consume() function in the traceback too, so…? Might need to get in there with a debugger to figure out more, but that's also tricky since I don't yet know a reliable reproducer.

bowlofeggs commented 5 years ago

I just did a test with Bodhi:

0) I added a log statement in Bodhi's publish function to log every message before it tries to send. 1) I ran a compose. 2) The compose did generate the SSLEOFError. 3) I verified that every message logged made it to datagrepper.

Thus, I am satisfied that this error is actually solved. I think it's just getting logged by pika which is why it shows up in my logs, but fedora-messaging does seem to respond to it favorably and makes sure the message is still sent. On my end, I will probably just stop logging pika so I don't see the tracebacks and call it a day.

We can close this if you want, unless there's something you'd like to use it to track.

jeremycline commented 5 years ago

I'd like to keep this open to track not leaving stale connections open in the publish code.

The reason I originally kept the connection around is making a new TLS connection is expensive and if you publish rapidly you can re-use the connection. The problem is most things don't publish rapidly and this leaves open connections that the broker eventually has to clean up along with nasty tracebacks.

We either need to find a way to heartbeat with the synchronous API (e.g. run a thread somewhere) or open and close a connection per publish.