Consider changing how we set the number of threads to use to run tests. [LUCENE-3667]

asfimport commented 12 years ago

The current way we set the number of threads to use is not expressive enough for some systems. My quad core with hyper threading is recognized as 8 CPUs - since I can only override the number of threads to use per core, 8 is as low as I can go. 8 threads can be problematic for me - just the amount of RAM used sometimes can toss me into heavy paging because I only have 8 GB of RAM - the heavy paging can cause my whole system to come to a crawl. Without hacking the build, I don't think I have a lot of workarounds.

I'd like to propose that we switch from using threadsPerProcessor to threadCount. In some ways, it's not as nice, because it does not try to scale automatically per system. But that auto scaling is often not ideal (hyper threading, wanting to be able to do other work at the same time), so perhaps we just default to 1 or 2 threads and devs can override individually?

Migrated from LUCENE-3667 by Mark Miller (@markrmiller), updated Jan 02 2012 Attachments: LUCENE-3667.patch (versions: 5)

asfimport commented 12 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

-Dtests.sequential=true works well for me on restricted systems. But I agree that we should have a setting that sets an upper bound no matter how many CPUs are available.
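
For illustration only, here is a minimal sketch of how an explicit thread count and an upper bound could sit alongside the existing per-CPU scaling. `tests.threadspercpu` is the property quoted in this thread; `tests.threadcount` and `tests.maxthreads` are hypothetical names invented for the sketch, as is the class itself. Nothing here is taken from the actual build.

```java
import java.util.Properties;

/** Sketch only: resolves how many test threads to run. */
final class TestThreadCount {

  /**
   * An explicit tests.threadcount (hypothetical name) wins outright.
   * Otherwise scale by CPU count and clamp to tests.maxthreads (hypothetical name).
   */
  static int resolve(Properties props, int availableCpus) {
    String explicit = props.getProperty("tests.threadcount");
    if (explicit != null) {
      return Math.max(1, Integer.parseInt(explicit.trim()));
    }
    int perCpu = Integer.parseInt(props.getProperty("tests.threadspercpu", "1").trim());
    int max = Integer.parseInt(
        props.getProperty("tests.maxthreads", String.valueOf(Integer.MAX_VALUE)).trim());
    // Use long arithmetic so a large multiplier cannot overflow before clamping.
    long scaled = (long) availableCpus * perCpu;
    return (int) Math.max(1, Math.min(scaled, max));
  }

  public static void main(String[] args) {
    int cpus = Runtime.getRuntime().availableProcessors();
    System.out.println("threads = " + resolve(System.getProperties(), cpus));
  }
}
```

With something along these lines, a machine that reports 8 CPUs could be held to 5 threads either by passing the explicit count or by setting the cap, without giving up automatic scaling on machines where it works.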

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

As far as the RAM usage, maybe solr tests shouldn't override lucene's default of 1 then.

  <property name="tests.threadspercpu" value="2"/>

But IIRC, the problem is that many of the solr tests sleep a lot, so higher parallelism is better...

asfimport commented 12 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

> As far as the RAM usage, maybe solr tests shouldn't override lucene's default of 1 then.

Probably they should not - but I override that down to 1 in my build.properties - like I said, that still sticks me with 8 threads. Lucene was lowered by default at some point, but I guess no one changed Solr. I'm not too worried about what I can change without hacking the build though.

> -Dtests.sequential=true works well for me on restricted systems.

It doesn't work well for me - it takes many times longer than if I use 5 threads.

It's not a 'restricted' system, like a little underpowered laptop. It's a fast quad core system that doesn't like our tests because it's got 8 GB of RAM instead of 16 (and I run a lot of RAM hungry apps) and because hyper threading causes the processor count to basically be a lie. I just want fewer than 8 threads because I have a sweet spot closer to 5 (I do hack my change into the build now) - more threads are not causing faster runs and they are occasionally bringing my comp to its knees during brief periods in the test run. I don't want one thread though - that's torturously slow - I just want full control of the number of threads to get below the 8 or more jail I live in.

8 threads will often bring my system to a crawl if I am trying to do something else, but 5 threads will run the tests in just about the same time for me, and it won't swamp me - but I cannot choose 5.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

> It's not a 'restricted' system, like a little underpowered laptop. It's a fast quad core system that doesn't like our tests because it's got 8 GB of RAM instead of 16 (and I run a lot of RAM hungry apps) and because hyper threading causes the processor count to basically be a lie.

I have the same configuration on my desktop (quad-core HT 8GB ram), and it's always happy.

On my mac though (dual-core HT 4GB ram) running the tests 'owns' the machine completely.

I figured it's probably the i/o (my desktop has SSD)...

asfimport commented 12 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

You didn't get 16GB? That's my big regret these days. I've almost upgraded because of this, but I'm too lazy and I don't want to figure out what I need to get or install it.

Like I said though, I run some other pretty RAM hungry programs at the same time as eclipse (which I also give a decent amount of RAM). If I shut some down, I don't go into heavy paging - but I'd rather just be able to lower the number of threads because that gets me the best of all the worlds - the tests are not any slower but I can leave my apps open.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I only have 8GB, but on that machine, the tests don't even seem to put a dent into the machine... like they are fast, I'm not even aware they are running.

On the 4GB laptop (without SSD), turning on the tests makes it sound like an airplane and I can't even read gmail.

So I think we should tune the tests to not require a 'beast' computer, they should run reasonably well on 'normal' computers too.

A few problems besides # of threads:

- We have a few tests that take absurd times (TestIndexWriterDelete: 120 seconds on my laptop). We should fix these bad apples so we aren't relying upon multiple threads so much anyway. I think TestSort is absurd too.
- I think ideally our tests should work without -Xmx512m; maybe we can get them all passing with 256m and keep simpletext/memory under control better.
- We can always revisit tuning some of the more absurd params for mergefactor and stuff so this happens with less probability...

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

the start to a patch.

on my laptop lucene's "test-core" finishes consistently faster, in ~2:30 versus ~4:15.

additionally, solr tests actually pass (on this computer they would previously ALWAYS fail due to timeout issues).

finally, while tests are running I am able to still use my computer to read email and such, unlike before.

still some work to do on some super-slow tests that i haven't looked into.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

updated patch cleaning up some false fails and more bad apples: testing on my desktop I find that with fewer threads we have to be more cautious about bad apples that take like 30 seconds... still more work to do

asfimport commented 12 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Because fewer threads = more likely collision of two bad apples and total time = max(thread time)? With history of executions and greedy balancing this shouldn't be the case anymore.
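
To make the greedy-balancing idea concrete, here is a minimal sketch (not the test runner's actual code): sort suites by their recorded duration, longest first, and hand each one to whichever thread currently has the least work. The wall-clock time is then the maximum per-thread load. The class name, suite names, and timings below are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: longest-first greedy assignment of suites to threads. */
final class GreedyBalancer {

  /** Returns the total load (in millis) each thread ends up with. */
  static long[] assign(Map<String, Long> historicalMillis, int threads) {
    List<Long> durations = new ArrayList<>(historicalMillis.values());
    durations.sort(Comparator.reverseOrder()); // slowest suites first
    long[] load = new long[threads];
    for (long d : durations) {
      int target = 0;
      for (int i = 1; i < threads; i++) {
        if (load[i] < load[target]) {
          target = i; // pick the least-loaded thread so far
        }
      }
      load[target] += d;
    }
    return load;
  }

  /** Total time = max(thread time). */
  static long makespan(long[] load) {
    long max = 0;
    for (long l : load) {
      max = Math.max(max, l);
    }
    return max;
  }

  public static void main(String[] args) {
    Map<String, Long> history = new HashMap<>(); // hypothetical timings
    history.put("SlowSuiteA", 30_000L);
    history.put("SlowSuiteB", 30_000L);
    history.put("FastSuiteC", 5_000L);
    history.put("FastSuiteD", 5_000L);
    history.put("FastSuiteE", 5_000L);
    System.out.println("makespan ms = " + makespan(assign(history, 2)));
  }
}
```

Because the two slow suites are placed first, they land on different threads, which is exactly the collision that random assignment cannot rule out.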

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

here's my most recent patch... things are pretty good, but some bad apples in the solr tests (seems there is a lot of fsyncing going on that's slowing them down)

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

> Because fewer threads = more likely collision of two bad apples and total time = max(thread time)? With history of executions and greedy balancing this shouldn't be the case anymore.

Well not just that, in general the bad apples should be fixed, we just sorta dodged the problem in some cases by cranking up parallelism.

I agree that better balancing will help (but it's still dodging the root cause if a test sometimes takes like 30 seconds)

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

updated patch, with a timeout on the ContentStreamTest so it doesn't hang forever if svn.apache.org is down, and fixing test fsync slowness issues by using LuceneTestCase.newDirectory() in solr tests (on local runs this uses ramdirectory 95% of the time, fsdirectory other times; in hudson it uses fsdirectory more often).
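
As a rough illustration of the timeout part (this is not the code from the patch), a test that fetches a remote resource can bound both the connect and the read phase using the standard `URLConnection` setters. The class name, helper name, and the 5-second value are assumptions made for this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

/** Sketch only: open a URL so that a dead server fails fast instead of hanging. */
final class TimeoutSketch {

  static URLConnection openWithTimeout(String url, int timeoutMillis) {
    try {
      // openConnection() does not touch the network yet; it only builds the object.
      URLConnection conn = URI.create(url).toURL().openConnection();
      conn.setConnectTimeout(timeoutMillis); // give up if the host does not answer
      conn.setReadTimeout(timeoutMillis);    // give up if the response stalls
      return conn;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    URLConnection conn = openWithTimeout("http://svn.apache.org/", 5000);
    System.out.println("connect timeout ms = " + conn.getConnectTimeout());
    System.out.println("read timeout ms = " + conn.getReadTimeout());
  }
}
```

When the stream is later read, an unreachable or stalled host then surfaces as a `SocketTimeoutException` after the configured interval rather than blocking the test run indefinitely.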

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

minor tweak: now times are much less crazy across different runs and much faster in the worst case.

In MockDirectoryWrapper:

    if (LuceneTestCase.rarely(randomState) || delegate instanceof NRTCachingDirectory) {
      // don't wear out our hardware so much in tests.
      delegate.sync(names);
    }
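
Read on its own, the condition says: pass the sync through to the real directory only occasionally, unless the delegate is one that needs a real sync every time. The following standalone sketch mimics that gating with stand-in names; the percentage knob and the class are hypothetical, and the actual tuning of `LuceneTestCase.rarely` is not shown in this thread.

```java
import java.util.Random;

/** Sketch only: decide whether a sync call should reach the underlying storage. */
final class RarelySync {
  private final Random random;
  private final int syncPercent; // hypothetical knob: chance (0-100) of a real sync

  RarelySync(long seed, int syncPercent) {
    this.random = new Random(seed);
    this.syncPercent = syncPercent;
  }

  /**
   * @param delegateNeedsRealSync stands in for the instanceof check in the snippet above
   */
  boolean shouldSync(boolean delegateNeedsRealSync) {
    return random.nextInt(100) < syncPercent || delegateNeedsRealSync;
  }

  public static void main(String[] args) {
    RarelySync gate = new RarelySync(42L, 5);
    int real = 0;
    for (int i = 0; i < 1000; i++) {
      if (gate.shouldSync(false)) {
        real++;
      }
    }
    System.out.println(real + " of 1000 sync calls were forwarded");
  }
}
```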

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I plan to commit this patch soon. If someone objects I'll just pull it back, but it at least allows tests to pass on my laptop without timing out, etc.

asfimport commented 12 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

Some anecdotal numbers on my ThinkPad T400 ("Intel(R) Core(TM)2 Duo CPU P8800 @ 2.66GHz" with 8GB RAM running "Linux bester 2.6.31-23-generic #75-Ubuntu SMP Fri Mar 18 18:16:06 UTC 2011 x86_64 GNU/Linux")

"time ant clean compile test" on my laptop this morning, using trunk r1225376...

BUILD SUCCESSFUL
Total time: 32 minutes 27 seconds

real    32m28.495s
user    33m12.050s
sys     1m47.760s

...during that run, my CPU temp monitor warned several times (4 i think? it's a notification bar thing, i don't believe i have a log of it) that my CPUs were spiking into the 70-75C range.

same command after svn updating to r1225945 ....

BUILD SUCCESSFUL
Total time: 18 minutes 49 seconds

real    18m50.329s
user    23m44.440s
sys     1m1.080s

...and my CPU only spiked up to the ~70C range once.

+1 ... thanks rmuir.

asfimport commented 12 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

Another anecdotal result, on a 4 core (8 "cores" w/ Hyperthreading) Intel Core i7-2600@3.4GHz, 8GB RAM, Windows 7:

time ant clean compile test on trunk @ r1225617:

BUILD SUCCESSFUL
Total time: 9 minutes 38 seconds

real    9m39.762s
user    0m0.091s
sys     0m0.090s

Same command after svn update'ing to r1225973:

BUILD SUCCESSFUL
Total time: 7 minutes 24 seconds

real    7m25.952s
user    0m0.060s
sys     0m0.138s

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Thanks for reporting back guys. I still don't like the timings hossman has (I think 19 minutes is crazy, would really love to know what's going on there).

but just for comparison here are my machines:

Linux (i7-2600k@3.4ghz, 8gb ram):

Before:

BUILD SUCCESSFUL
Total time: 7 minutes 2 seconds

real    7m3.099s
user    27m47.900s
sys     0m54.639s

After:

BUILD SUCCESSFUL
Total time: 4 minutes 51 seconds

real    4m52.310s
user    17m14.869s
sys     0m29.682s

Windows (Core2Quad-Q9650@3.0ghz, 8gb ram)

Before:

-Solr tests always timeout/fail-

After:

BUILD SUCCESSFUL
Total time: 8 minutes 37 seconds

real    8m39.302s
user    0m0.000s
sys     0m0.046s

Mac (Core i5@2.3ghz, 4gb ram)

Before:

-Solr tests always timeout/fail-

After:

BUILD SUCCESSFUL
Total time: 11 minutes 20 seconds

real    11m20.428s
user    28m0.921s
sys     1m38.629s
