HypothesisWorks / hypothesis

Hypothesis is a powerful, flexible, and easy to use library for property-based testing.
https://hypothesis.works

slowness in strings.charmap() causes "extremely slow" failure #1153

Closed heyitsvic closed 5 years ago

heyitsvic commented 6 years ago

I don't yet understand what causes this slowness, but shouldn't slowness in the initialization of hypothesis itself be excluded from the timing of my strategy? When it happens, I get this failure:

  File "hypothesis/internal/conjecture/engine.py", line 479, in run
    self._run()
  File "hypothesis/internal/conjecture/engine.py", line 875, in _run
    self.generate_new_examples()
  File "hypothesis/internal/conjecture/engine.py", line 788, in generate_new_examples
    self.test_function(self.last_data)
  File "hypothesis/internal/conjecture/engine.py", line 288, in test_function
    self.record_for_health_check(data)
  File "hypothesis/internal/conjecture/engine.py", line 354, in record_for_health_check
    state.overrun_examples), HealthCheck.too_slow,)
  File "hypothesis/internal/healthcheck.py", line 38, in fail_health_check
    raise FailedHealthCheck(message, label)
FailedHealthCheck: Data generation is extremely slow: Only produced 3 valid examples in 1.10 seconds (0 invalid ones and 0 exceeded maximum size). Try decreasing size of the data you're generating (with e.g.max_size or max_leaves parameters).
See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.too_slow to the suppress_health_check settings for this test.
----------------------------------------------------------------------
Ran 3 tests in 2.448s

FAILED (errors=1)
Trying example: testFieldFilters(self=<MyTest testMethod=testFieldFilters>, field_filter=field {
  field_path: "a"
}
op: EQUAL
value {
  boolean_value: false
})
Trying example: testFieldFilters(self=<MyTest testMethod=testFieldFilters>, field_filter=field {
  field_path: "h0u"
}
...
Trying example: testFieldFilters(self=<MyTest testMethod=testFieldFilters>, field_filter=field {
  field_path: "G.dx"
}
op: EQUAL
value {
  string_value: "\n\362\276\204\261\031\"\002\362\236\217\266\364\205\265\200)-\361\233\227\202"
})
You can add @seed(63033071861305401262629909982402355138) to this test to reproduce this failure.

See the profile here:

https://gist.github.com/heyitsvic/8baf5410b23b16f4679766c8b1657470
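
(For reference, the suppression the error message points at would look roughly like this; it only silences the check rather than fixing the slowness, so it's not what I'm after:)

# Silences the too_slow health check for one (hypothetical) test -- a stopgap, not a fix.
from hypothesis import HealthCheck, given, settings, strategies as st

@settings(suppress_health_check=[HealthCheck.too_slow])
@given(st.text())
def test_tolerates_slow_generation(s):
    pass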

Zac-HD commented 6 years ago

Hmm. Slow initialisation is excluded from that measurement, so I think the problem really is that drawing an example is very slow.

Looking at the profile (very very helpful, thanks!), it's spending a lot of time in proto_strategies in my_strategies.py. Can you share this file and a test that raises this error?

heyitsvic commented 6 years ago

@Zac-HD The initialization is done lazily, so it shows up below my code in the stack, but it's in fact initialization of hypothesis. Note that the slow code is contained in hypothesis.internal.charmap.charmap, which does lazy, memoized initialization from (what appears to be) a gzipped pickle of unicodedata.

I'm unable to reproduce the slowdown except on some corporate hardware, but if you put a time.sleep(10) in that function (above) you'll see the same behavior.
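
For anyone who hasn't read that code, here is a simplified sketch of the pattern it uses; this is only an illustration of the lazy, memoized, disk-cached approach, not the actual Hypothesis source:

import gzip
import pickle
import sys
import unicodedata

_charmap = None

def charmap_sketch(cache_path='charmap.pickle.gz'):
    """Illustrative only: lazily build a {category: codepoints} table,
    caching it both in memory and as a gzipped pickle on disk."""
    global _charmap
    if _charmap is None:
        try:
            with gzip.open(cache_path, 'rb') as f:
                _charmap = pickle.load(f)
        except Exception:
            # The expensive part: classify every code point by category.
            table = {}
            for cp in range(sys.maxunicode + 1):
                table.setdefault(unicodedata.category(chr(cp)), []).append(cp)
            _charmap = table
            with gzip.open(cache_path, 'wb') as f:
                pickle.dump(_charmap, f)
    return _charmap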

Zac-HD commented 6 years ago

I still can't reproduce this problem, but I suspect that forcing wrapped strategies to be initialised before we start the timer might help.

diff --git a/src/hypothesis/internal/conjecture/data.py b/src/hypothesis/internal/conjecture/data.py
index 361a73d4..1b6a8827 100644
--- a/src/hypothesis/internal/conjecture/data.py
+++ b/src/hypothesis/internal/conjecture/data.py
@@ -169,6 +169,10 @@ class ConjectureData(object):
         at_top_level = self.depth == 0
         if label is None:
             label = strategy.label
+        else:
+            # initialise wrapped strategy *before* timing draws.  See:
+            # https://github.com/HypothesisWorks/hypothesis-python/issues/1153
+            strategy.label
         self.start_example(label=label)
         try:
             if not at_top_level:

Can you try this and let me know if it helps?

Also: it's not a fix, but you can work around this problem by calling hypothesis.internal.charmap.charmap() in your conftest.py or other setup code, or by calling st.text().example() if you don't want to touch internals. That would let you keep working while we sort this out.
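
In conftest.py that can be as small as this (using the public-API variant, since the internal import path may move around):

# conftest.py -- pay the charmap cost once at collection time, before any
# health-checked draw starts its timer.
from hypothesis import strategies as st

st.text().example()  # forces the Unicode table to be built via the text strategy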

heyitsvic commented 6 years ago

Unfortunately, initializing wrapped strategies before starting the timer did not help.

I ended up implementing a strategy for generating text fields using primitive hypothesis strategies as a temporary workaround.

bukzor commented 6 years ago

Is strategy.label meant to cause charmap to be called? How would that be?

Zac-HD commented 6 years ago

@bukzor - accessing the attribute forces reification of a LazyStrategy, which in turn means that the __init__ of the underlying strategy (containing the calls to charmap()) actually gets called. It's a complicated scheme, but it allows errors in the use of Hypothesis to show up as localised test failures rather than as an error while collecting the module.
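
A stripped-down illustration of that reify-on-attribute-access idea (not the real LazyStrategy, just the shape of it):

class LazySketch(object):
    """Defer building the wrapped strategy until something needs it."""

    def __init__(self, factory, *args, **kwargs):
        self._factory = factory
        self._args = args
        self._kwargs = kwargs
        self._wrapped = None

    @property
    def label(self):
        # First access builds the underlying strategy, so any expensive
        # __init__ work (e.g. a call to charmap()) happens here rather
        # than when the module defining the test is imported.
        if self._wrapped is None:
            self._wrapped = self._factory(*self._args, **self._kwargs)
        return getattr(self._wrapped, 'label', repr(self._wrapped))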

@heyitsvic - I really want to get to the bottom of this, but I just can't reproduce it. Can you share an example test which fails if you add a slowdown to charmap()? Here's a simple one that should have the same problem, but doesn't:

from hypothesis import given
from hypothesis.strategies import characters

@given(characters(blacklist_categories=['Cs', 'Cc']))
def test_charmap_slowdown(t):
    pass

I can make this take as long as I like by inserting time.sleep calls in charmap, but I can't get any of the time-related health checks to fail :confused:

As-is, all I know is that it's happening with a test you won't show me, using custom strategies you also won't show me, and I can't reproduce the problem. That suggests the problem is in your strategy, not Hypothesis internals (ref: #434), so unless we can make some progress I'm inclined to close the issue.

(note that if NDAs etc are required, @DRMacIver and I both offer consulting services for corporate users)

heyitsvic commented 6 years ago

The st.text().example() you suggested is a fine workaround. I'm working on getting a repro back to you.

heyitsvic commented 6 years ago

Sorry for the delay!

I managed to reproduce the error by adding a time.sleep(2) to charmap's initialization. In the error message below, we can see that the single generated example has a total draw time of about 2.1 seconds, suggesting that charmap's startup time is being bundled into the strategy's draw time.

The interesting thing is that the 'too slow' health check seems to only be active after a certain number of draws.

import hypothesis
from hypothesis import strategies as st

draw_count = 0

@st.composite
def MyStrategy(draw):
  """The 'extremely slow' health check is only active on the 10th through 18th draws."""

  global draw_count
  print('DRAW!')
  draw_count += 1
  if draw_count == 12:
    result = draw(st.text())
  else:
    result = draw(st.integers())
  print('drew %r' % result)
  return result

@hypothesis.given(MyStrategy())
def slowdown(text):
  print('TEST!')
  del text

def main():
  slowdown()

if __name__ == '__main__':
  main()  # entry point, as in the traceback below

diff --git a/hypothesis/internal/charmap.py b/hypothesis/internal/charmap.py
--- a/hypothesis/internal/charmap.py
+++ b/hypothesis/internal/charmap.py
@@ -51,6 +51,9 @@ def charmap():
     # filesystems is fairly simple: check if loaded, else try loading,
     # else calculate and try writing the cache.
     if _charmap is None:
+        print('SLEEPING...')
+        import time
+        time.sleep(2)
         f = charmap_file()
         try:
             with gzip.GzipFile(f, 'rb') as i:
diff --git a/hypothesis/internal/conjecture/engine.py b/hypothesis/internal/conjecture/engine.py
--- a/hypothesis/internal/conjecture/engine.py
+++ b/hypothesis/internal/conjecture/engine.py
@@ -341,6 +341,7 @@ class ConjectureRunner(object):
             ), HealthCheck.filter_too_much)

         draw_time = sum(state.draw_times)
+        print("DRAW_TIME: ", draw_time)

         if draw_time > 1.0:
             fail_health_check(self.settings, (

Traceback (most recent call last):  
  File "slowdown.py", line 41, in main
    slowdown()
  File "slowdown.py", line 33, in slowdown
    def slowdown(text):
  File "hypothesis/core.py", line 863, in wrapped_test
    state.run()
  File "hypothesis/core.py", line 655, in run
    runner.run()
  File "hypothesis/internal/conjecture/engine.py", line 480, in run
    self._run()
  File "hypothesis/internal/conjecture/engine.py", line 876, in _run
    self.generate_new_examples()
  File "hypothesis/internal/conjecture/engine.py", line 789, in generate_new_examples
    self.test_function(self.last_data)
  File "hypothesis/internal/conjecture/engine.py", line 288, in test_function
    self.record_for_health_check(data)
  File "hypothesis/internal/conjecture/engine.py", line 355, in record_for_health_check
    state.overrun_examples), HealthCheck.too_slow,)
  File "hypothesis/internal/healthcheck.py", line 38, in fail_health_check
    raise FailedHealthCheck(message, label)
FailedHealthCheck: Data generation is extremely slow: Only produced 1 valid examples in 2.10 seconds (0 invalid ones and 0 exceeded maximum size). Try decreasing size of the data you're generating (with e.g.max_size or max_leaves parameters).
See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.too_slow to the suppress_health_check settings for this test.

Zac-HD commented 6 years ago

@DRMacIver - having investigated again, I think the problem here is somewhere in the interaction between cachable strategies and the recording of draw_times. I'm stuck, but maybe you'll have better luck?

@heyitsvic - I still can't reproduce your exact issue, but there's clearly something weird going on here. If you run with the --hypothesis-show-statistics option, what comes out?

DRMacIver commented 6 years ago

I think the problem here is somewhere in the interaction between cachable strategies and the recording of draw_times. I'm stuck, but maybe you'll have better luck?

I'm not sure there's anything mysterious going on. I think this is everything working as intended, it's just that the way it's currently intended to work is bad. 😞

I think it's become easier to hit this recently since my changes that integrated health checks into the core engine, but I'm pretty sure this has always been a problem that we were just able to mostly ignore before.

I think we might want/need to put in a hack (a hack? Us? No! We don't do that do we?) somewhere that globally forces the charmap to be loaded before the first @given is run.
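
Concretely, I mean something with roughly this shape (purely illustrative; where exactly it should hook in is the open question):

# Force the Unicode tables to load once, up front, so their one-off cost
# can never be charged to the first strategy's draw time.
from hypothesis.internal.charmap import charmap

def prewarm_unicode_tables():
    charmap()  # lazy and memoized, so calling it again later is cheap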

I still can't reproduce your exact issue

Is it possible you're trying to reproduce the issue inside the Hypothesis test suite where we both mess with time and also pre-warm the cache?

That being said, I also can't reproduce the exact issue. The following variant reliably fails for me if I put a time.sleep(2) in charmap as @heyitsvic suggested and delete the .hypothesis directory before running it:


import hypothesis
from hypothesis import strategies as st

@st.composite
def MyStrategy(draw):
    if draw(st.booleans()):
        draw(st.text())

@hypothesis.given(MyStrategy())
def slowdown(text):
    pass

if __name__ == '__main__':
    slowdown()

The reason for the draw(st.booleans()) is that we initially draw the minimal example outside of the health-checked region. If that minimal example were to draw from text, it would cause us to avoid the problem, but in this case it doesn't, because we only draw from text when the initial boolean is non-zero.

Zac-HD commented 6 years ago

I was definitely running outside our test suite, in a new project directory. (The same thought occurred to me :smile:)

I also stumbled across the minimal example not being health-checked. Is there a principled reason for this? Because it seems to me that if health checks would fail for the minimal example, we ought to fail right then - in fact I thought that this was part of the point (e.g. for large buffers)!

DRMacIver commented 6 years ago

Is there a principled reason for this? Because it seems to me that if health checks would fail for the minimal example, we ought to fail right then - in fact I thought that this was part of the point (e.g. for large buffers)!

No principled reason that I can recall. Moving the minimal example inside the scope of other health checks seems totally reasonable to me.

bukzor commented 6 years ago

@Zac-HD How come the deletion? I was trying to gather opinions on the maintainers' preferred solution, so that I can submit a patch that has a chance of being merged.

Zac-HD commented 6 years ago

Because we already pause the draw_times timer for this - the issue is a fairly subtle problem with health checks in the engine that I don't personally understand well and strongly recommend leaving to David.

If you're looking for a PR to work on, can I suggest issues with the bug label? :smile: In particular, I gave up on #966, #961, and #944, but you've shown some great analysis before and might have a better chance.