JetBrains / teamcity-messages

Python Unit Test Reporting to TeamCity
https://pypi.python.org/pypi/teamcity-messages
Apache License 2.0
137 stars 81 forks source link

Intermittent EAGAIN IOError when teamcity-messages writes to its output #212

Closed morganwahl closed 5 years ago

morganwahl commented 5 years ago

Our tests occasionally fail when run on TeamCity with a traceback like this:

[Step 1/2] Traceback (most recent call last):
[11:58:30]
[Step 1/2]   File "bin/manage_addgene.py", line 14, in <module>
[11:58:30]
[Step 1/2]     manage.main()
[11:58:30]
[Step 1/2]   File "/opt/jetbrains/buildAgent/work/5d9b0320cc9573e4/src/django/manage.py", line 23, in main
[11:58:30]
[Step 1/2]     execute_from_command_line(sys.argv)
[11:58:30]
[Step 1/2]   File "/srv/addgene/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 364, in execute_from_command_line
[11:58:30]
[Step 1/2]     utility.execute()
[11:58:30]
[Step 1/2]   File "/srv/addgene/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 356, in execute
[11:58:30]
[Step 1/2]     self.fetch_command(subcommand).run_from_argv(self.argv)
[11:58:30]
[Step 1/2]   File "/srv/addgene/local/lib/python2.7/site-packages/django/core/management/commands/test.py", line 29, in run_from_argv
[11:58:30]
[Step 1/2]     super(Command, self).run_from_argv(argv)
[11:58:30]
[Step 1/2]   File "/srv/addgene/local/lib/python2.7/site-packages/django/core/management/base.py", line 283, in run_from_argv
[11:58:30]
[Step 1/2]     self.execute(*args, **cmd_options)
[11:58:30]
[Step 1/2]   File "/opt/jetbrains/buildAgent/work/5d9b0320cc9573e4/src/django/addgene/common/management/commands/atest.py", line 16, in execute
[11:58:30]
[Step 1/2]     super(TestCommand, self).execute(*args, **options)
[11:58:30]
[Step 1/2]   File "/srv/addgene/local/lib/python2.7/site-packages/django/core/management/base.py", line 330, in execute
[11:58:30]
[Step 1/2]     output = self.handle(*args, **options)
[11:58:30]
[Step 1/2]   File "/srv/addgene/local/lib/python2.7/site-packages/django/core/management/commands/test.py", line 62, in handle
[11:58:30]
[Step 1/2]     failures = test_runner.run_tests(test_labels)
[11:58:30]
[Step 1/2]   File "/srv/addgene/local/lib/python2.7/site-packages/django/test/runner.py", line 603, in run_tests
[11:58:30]
[Step 1/2]     result = self.run_suite(suite)
[11:58:30]
[Step 1/2]   File "/srv/addgene/local/lib/python2.7/site-packages/django/test/runner.py", line 567, in run_suite
[11:58:30]
[Step 1/2]     return runner.run(suite)
[11:58:30]
[Step 1/2]   File "/srv/addgene/local/lib/python2.7/site-packages/teamcity/unittestpy.py", line 308, in run
[11:58:30]
[Step 1/2]     return super(TeamcityTestRunner, self).run(test)
[11:58:30]
[Step 1/2]   File "/srv/addgene/local/lib/python2.7/site-packages/unittest2/runner.py", line 172, in run
[11:58:30]
[Step 1/2]     test(result)
[11:58:30]
[Step 1/2]   File "/srv/addgene/local/lib/python2.7/site-packages/unittest2/suite.py", line 87, in __call__
[11:58:30]
[Step 1/2]     return self.run(*args, **kwds)
[11:58:30]
[Step 1/2]   File "/srv/addgene/local/lib/python2.7/site-packages/django/test/runner.py", line 385, in run
[11:58:30]
[Step 1/2]     handler(test, *args)
[11:58:30]
[Step 1/2]   File "/srv/addgene/local/lib/python2.7/site-packages/teamcity/unittestpy.py", line 112, in addSkip
[11:58:30]
[Step 1/2]     self.messages.testIgnored(test_id, message="Skipped" + reason_str, flowId=test_id)
[11:58:30]
[Step 1/2]   File "/srv/addgene/local/lib/python2.7/site-packages/teamcity/messages.py", line 150, in testIgnored
[11:58:30]
[Step 1/2]     self.message('testIgnored', name=testName, message=message, flowId=flowId)
[11:58:30]
[Step 1/2]   File "/srv/addgene/local/lib/python2.7/site-packages/teamcity/messages.py", line 82, in message
[11:58:30]
[Step 1/2]     self.output.flush()
[11:58:30]
[Step 1/2] IOError: [Errno 11] Resource temporarily unavailable

As you can see, we're running regular python unittests, using django's test-running system.

From what I've read, I wouldn't expect an IO operation like flush() to result in this error unless the file was in non-blocking mode. Is it possible stdout is non-blocking when being run by teamcity?

morganwahl commented 5 years ago

I've confirmed the stream being written to is open in non-blocking mode.

I defined this function:

def print_flags(filehandle):
    print("file:", filehandle, filehandle.fileno())
    flags = fcntl.fcntl(filehandle, fcntl.F_GETFL)
    print("\tflags:", hex(flags))
    for attr in dir(os):
        if attr.startswith('O_'):
            flag = getattr(os, attr)
            if flags & flag:
                print("\tflag {} {:x}: {:x}".format(attr, flag, flags & flag))

and then added this __init__ to TeamcityTestRunner:

    def __init__(self, *args, **kwargs):
        super(TeamcityTestRunner, self).__init__(*args, **kwargs)
        print("testrunner init")
        print_flags(self.stream)

I then this this output in the build log:

[14:49:52]
[Step 1/1] testrunner init
[14:49:52]
[Step 1/1] file: <unittest2.runner._WritelnDecorator object at 0x7f85d52c1710> 2
[14:49:52]
[Step 1/1]  flags: 0x801
[14:49:52]
[Step 1/1]  flag O_NDELAY 800: 800
[14:49:52]
[Step 1/1]  flag O_NONBLOCK 800: 800
[14:49:52]
[Step 1/1]  flag O_WRONLY 1: 1
morganwahl commented 5 years ago

I think the correct fix for this is to catch the exception and retry. I'll see if I can PR that change, and/or a regression test.

shalupov commented 5 years ago

Not sure. Do you know why it's in NONBLOCK mode? If it is, write call maybe also throw an exception. I don't think the right way is to decorate all of them with a retry.

morganwahl commented 5 years ago

I have no clue why it's non-blocking. Maybe because a parent process made it that way? Or maybe it's something in unittest itself?

Another approach would be to change it to blocking, somehow.

morganwahl commented 5 years ago

If it helps, this is run in a build config step that executes a single command. That command is a shell script we've written, which in turn invokes the Django management command to run the tests.

morganwahl commented 5 years ago

[closed on accident]