LitKnd / littlekendracomments

1 stars 0 forks source link

Post: MAXDOP of Confusion (Dear SQL DBA Episode 8) #9

Open LitKnd opened 3 years ago

LitKnd commented 3 years ago

Comment history from Wordpress

Rob July 14, 2016 11:24 am Thanks Kendra, it is always good to hear this subject as frequently as possible and especially how it relates to the newer versions of Sql.

Loading... Reply T Alben July 14, 2016 1:57 pm Thank you for this post. Very nice. Reads like a fairy tail. Once upon a time there was a Numa node, and it had four logical nodes. Their names were …

Loading... Reply Kendra Little July 14, 2016 2:17 pm LOL! I love it 🙂

Loading... Reply James July 15, 2016 8:14 am Hello Kendra,

You did a great job explaining what the max dop should be. Would you recommend the the max dop calculator https://blogs.msdn.microsoft.com/sqlsakthi/p/maxdop-calculator-sqlserver/

I have 24 CPU’s 2 NUMA Nodes 6 number of processors per core

And it recommends 3 as my MAX DOP, I feel that is a low number

Loading... Reply Kendra Little July 15, 2016 10:32 am This is a fun one! I would personally tend to choose between 6, 4, and 2 based on the info you have– but to decide I’d look at what the workload type is.

Is it a data warehouse workload with mostly reports running? If so, I’d be biased to starting around 6.

Is it an OLTP workload or a mixed workload? If so, I’d look at the batch requests/sec, and start more at 4 or at 2.

Before changing the setting, baselining wait stats as well as batch requests /sec over a few days is helpful. If you go to a higher maxdop, you don’t want to see waits like SOS_SCHEDULER_YIELD get worse, indicating that you’ve got contention and that queries are spending a lot of time waiting to get access to a CPU.

As for why I wouldn’t consider 3, I’m a little old school, and have been biased against odd numbers for years. Linchi Shea wrote a post a while back showing some tests he ran where maxdop 5 was slower than maxdop 4 with an example workload, but at this point the hardware and SQL Server setup aren’t at all current (and I’m not finding a link to the post anyway). So I wonder if my old bias has any grounding in fact anymore at all. It would be an interesting thing to study, it’s on my long term list of things I’d like to experiment with given the right hardware, workload, and available time 🙂

Loading... Reply Parallelism Configuration Options – Curated SQL July 18, 2016 5:10 am […] Kendra Little discusses max degree of parallelism and cost threshold for parallelism: […]

Loading... Reply Why an Upgrade can Cause Performance Regressions (Dear SQL DBA Episode 9) - by Kendra Little July 21, 2016 8:00 am […] For more information on how to determine these two settings, check out my last episde, “Max Degree of Confusion.” […]

Loading... Reply Rudyx July 26, 2016 12:43 pm Been following the rules since SQL 2000 – as for cost 30 seems to work well too. This article seems a good dove-tail into tempdb file allocation as well too!

Loading... Reply Kiran August 27, 2016 8:07 pm Hello Kendra,

Really Good Post. Below post is really helpful too, http://sqlknowitall.com/determining-a-setting-for-cost-threshold-for-parallelism/

We have Datawarehouse server with below Configuration,

Intel Xeon 2.40Ghz 8 Processors 160 Logical Processors 8 NUMA node 20 Logical processors per NUMA node 2TB RAM

Current setting :- MAXDOP :- 8 and Cost Threshold of Parallelism = 20.

What is your suggestion?

Also, we have 80 instances of Mix workload with below Configuration,

Intel Xeon 2.13Ghz (2 Processors) 32 Logical Processors 2 NUMA Node 16 Logical processor per NUMA node 512GB RAM

Current setting :- MAXDOP = 0 and Cost thershold of parallelism = 50

What is your suggestion?

Thanks in Advance.

-Kiran

Loading... Reply Kendra Little August 30, 2016 10:50 am Hey there,

The 8 socket server is definitely a special beast that will require custom tuning and testing on that server itself. That one’s not guess-able– different hardware has different speeds at doing cross-socket memory access, so it’s really going to vary by your query workload AND your server hardware.

For your mixed workload servers, the basic recommendation would be to start at maxdop 8 and tune from there on each instance.

Loading... Reply kiran August 31, 2016 5:17 pm Thanks Kendra for your suggestion.

Really appreciated.

One more question, in which scenario MAXDOP set to 1?

Loading... Reply Kendra Little September 5, 2016 9:18 am Setting maxdop to 1 at the server level disables parallelism by default. Queries can only go parallel if they use a hint to set the degree of parallelism to multiple cores, and otherwise qualify (estimated cost is over the server cost threshold setting, etc).

Parallelism is typically helpful to expensive queries, so you generally only want to disable it when the application has been specifically designed as a single threaded application and is always tested single threaded. The most famous example of an application designed this way is SharePoint.

Loading... Reply Adrian September 10, 2016 8:57 am Hi Kendra,

Thank you for this post, very useful.

Would the same rules apply to a Virtual Machine?

I have a SQL Server 2005 VM running on Hyper-V with the following spec. 8 vCPUs 128GB RAM 3 NUMA nodes.

Each Hyper-V host has the following spec: 4 CPUs using Hyper threading with 8 cores in each. 256GB RAM.

What would you recommend setting MAXDOP to? It is a purely OLTP workload.

We are already spanning multiple NUMA nodes due to the amount of RAM. We can’t increase the number of vCPUs due to it being at the maximum for the OS. I did think about slightly reducing the amount of RAM used by the VM to see if it would use 2 NUMA nodes instead of 3.

Loading... Reply Kendra Little September 19, 2016 9:32 am In general, you want the SQL Server NUMA setup to “align” / fit within the host server NUMA nodes to avoid foreign memory access.

Hyper-v does have some counters that let you see how well your VM aligns with the host and how much foreign memory access is happening: https://technet.microsoft.com/en-us/library/dn282282(v=ws.11).aspx#BKMK_VM_Settings

I gotta be honest, though — a three NUMA node setup with 8 cores which would be split into groups of different sized cores (3, 3, and 2? Not sure) on an unsupported version of SQL Server isn’t something that I’d try to tune for performance.

If performance is important, I’d work on getting to the latest and greatest versions of SQL Server and the hyper-visor as priority 1, and then work on applying best practices from that point down. Both Hyper-V and SQL Server’s NUMA performance have evolved over time, and documentation and options for the most recent stuff is better now.

Loading... Reply Scott September 14, 2016 8:19 am Hi Kendra,

So how would you apply this towards virtual SQL servers where CPU sockets is rather abstract?

Loading... Reply Kendra Little September 19, 2016 8:41 am Hi Scott,

The thing I remember is that the CPU sockets are not abstract to the SQL Server. In a VM, it sees the virtual sockets presented to it and it goes on those. Hypervisor manufacturers are aware of this– for example, VMWare has a best practice guide for SQL Server and discusses how to configure the sockets and NUMA within it: http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/sql-server-on-vmware-best-practices-guide.pdf

It does require VERY careful reading though because of licensing concerns with SQL Server! One of the keys that it discusses is alignment- “If your workload requires more than one core per socket for licensing purposes, or for any other reason, make sure it is aligned with the underlying physical NUMA topology.”

In other words, the virtual NUMA nodes you define should fit within / align within the NUMA nodes of the host servers.

Loading... Reply Brian November 3, 2016 8:14 am Hi Kendra,

I’m trying to help out a friend that currently has a SQL 2008 R2 VM running on a VMware ESXi 5.0 host. The host is an older box that has 2 CPU’s with 4 cores each. No hyper threading available.

The SQL VM is currently configured with 3 vCPU’s (3 separate sockets, 1 core each). What would be the recommended MAXDOP?

Any insight would be appreciated, I think the odd number of vCPU’s is throwing me off.

Thanks!

Loading... Reply Kendra Little November 3, 2016 8:55 am It’s pretty uncommon to give SQL Server an odd number of CPUs, like 3. You’re allowed to do it, but you won’t find many rules of thumbs for configuring maxdop for that specifically. And you’ll find that a lot of us DBAs are a bit wary of it because of old issues like this one, where SQL Server wouldn’t even install if you had an odd number of CPUs: https://support.microsoft.com/en-us/kb/954835

So if it was me, I’d be happier with 4 vCPUs and would consider maxdop at 4, 2, or 1, depending on workload.

Loading... Reply Pedro Oliveira November 18, 2016 7:23 am Hi Kendra,

Great video… but I’d like to make it more complex, if I may, with hyper-threading…. I’ve a CPU with 8 physical cores and HT (16 logical cores).. SQL NUMA displays 16 logical cores and 16 scheduler count… I’ve this set to MAXDOP to 6 since I’ve lots of batches per second.. In this case there’s not much to say.. But I also have one with 4 cores, 8 logical… If I set MAXDOP to 8 or less, say 6, won’t SQL Server use HT cores that are much slower than physical ones?! If a query takes 60secs single threaded and SQL splits into 2 threads, one physical and one HT (being HT 30% of physical – I think), one core will take 30secs but the HT core will take 100secs… It’s slower… Shouldn’t MAXDOP be the maximum physical cores and not logical cores?!

Thanks, Pedro

Loading... Reply Kendra Little November 18, 2016 2:31 pm I think you’ve got some very old numbers from somewhere — hyperthreading on modern processors doesn’t make one thread 70% slower than the other. I have had a couple of clients who had read similar older information who had two-node failover clusters. When they set hyperthreading ‘on’ for one node and ‘off’ for the other, and their workloads performed as well or better on the nodes with hyperthreading. (But of course I haven’t tested every processor or every workload.)

If you don’t want to use hyperthreading, the way to do that is to turn it off at the processor level / BIOS settings. SQL Server will use the logical processors you give it.

Loading... Reply Udhayaganesh November 28, 2016 9:05 am Hi please guide me . I have configured maximum degree of. Parallelism is 8. For below configured.

Dual processor Virtual machine 12 CPU.

Thanks Udhayaganesh

Loading... Reply Kendra Little November 28, 2016 11:59 am You’re in the right place– read this post!

Loading... Reply Greg A. January 5, 2017 5:37 am Hi Kendra,

I really enjoyed this article and it, and the links you included, helped clear up a lot of things for me.

I would just like to add that Soft NUMA will not be engaged if Hyper Threading is enabled and brings the logical processor count over 8. They are not considered physical cores. An 8 core processor with HT will present 16 logical processors but the system knows it’s only 8.

If some out there have looked in their logs for the Auto Soft NUMA messages and can’t see them they may want to check if HT has bumped up their count.

Loading... Reply Kendra Little January 5, 2017 12:58 pm Great point, thank you! I updated the paragraph on auto soft NUMA to this: “When Automatic Soft NUMA is enabled, if you have more than 8 physical processors in a NUMA node, Soft NUMA will be configured when SQL Server starts up. If you’re running SQL Server in a VM, note that the hypervisor generally presents all virtual cores to the guest as physical cores– whether or not you have hyperthreading enabled on the host server– so this will kick in if you have more than 8 vCPUs.”

I can’t think of a hypervisor that currently exposes any information about hyperthreading to a guest. But I wouldn’t be surprised to be wrong, either, that seems like something that might change.

Loading... Reply BeGrateful January 5, 2017 2:33 pm Hi Kendra, Thank you for your clear and detail explanation. I really love your style of teaching. We currently have a DB server on 2016 and 1 cpu socket with 4 core + hyperthreading. This server is for a OLTP workload. MS recommended that we set our MaxDOP to 8, i was going toward setting MaxDOP to 4 instead. Also Cost Threshold for Parallelism set default to 5, i am planning to set it to 50. What would you recommend?

Also, in term of getting a baseline or benchmark, what would be your approach on this.

Loading... Reply Kendra Little January 6, 2017 7:34 am Thanks for the kind words! This is a great question, and I will write a whole post on it, but here are some notes:

I am pretty comfortable starting with cost threshold set to 50 and then tweaking from there, probably because that was the standard my team used way back in 2005 and it worked well for our queries then. The main thing is that I’d change that at a different time than changing the maxdop settings, in case you have some important queries that get slower from the cost threshold change.

For maxdop on an existing server, if I want to lower it, generally I’ll do it somewhat gradually– say go to 6, let it run for a week, then 4.

For a benchmark, you’ve got a few options. You can use replay, but it’s time consuming to set up and you want to run the replay on an identical hardware setup, which not everyone has. Great if you can dedicate the resources and get it to work.

If you don’t have replay, before starting I’d identify top queries over a week (ideally longer) in a few categories: by execution count (most frequent queries), by average duration (longest queries), by total CPU usage (most processor consuming), by total reads (biggest IO consumers). You can use these as a benchmark — if they are parameterized queries, though, and they usually are, you have to do extra legwork to figure out what parameters to execute them with, plus what harness to use. Silver lining is that this set of queries becomes incredibly useful for further changes, but it’s a good amount of work.

If you’re not benchmarking, you at minimum want to identify these queries so you can observe their performance after you make each change.

Options for identify/ observing top queries are execution plan cache (may miss things, but fast – free scripts out there on the internet), monitoring software that catches query duration, and/or Query Store (2016+).

Hope this helps!

Loading... Reply Michael Villegas January 27, 2017 11:29 am Hey Kencra, Great post, excellent advise, it is really helpful. I have one question though, the default value of the “max degree of parallelism” is 0 (zero), how bad is to leave it as is, I know that the KB recomends some nombers based on the CPU specs, but if it remains as 0, would it be harmful for the overall server performance?

Loading... Reply Kendra Little January 27, 2017 11:34 am 0 means “use all the cores”. I have seen quite a few cases where that setting was slowing things down, and queries measurably sped up quite a bit when the setting was changed.

That’s not true in all cases, though. In other cases, changing the setting doesn’t make a huge difference. It varies by the hardware and the workload.

Loading... Reply Rafael Placido February 27, 2017 10:25 pm Hello Kendra

I love your blog, I have a question about the Automatic Soft NUMA I read the article I found a little bit tricky about how they noticed the increase of performance in the sql server, how they got that metrics I want to the same with my SQL Server 2014 but I not sure how to make a metric about how is the performance in this momement before make the change, even If I have a bad configuration that the vendor recommended.

I give a little picture about my server I have 4 numa nodes with 16 cpu each one and the max degree of parelelism is 16 (big mistake but somebody recommeded like that and the client want ot that way, I think should be 8 like in the kb article told about it) with cost threoldhold with 80. I not sure how to make an initial metric with the queries or with another dmv?

Loading... Reply Kendra Little February 28, 2017 8:03 am The only way to know how a workload will perform under a new maxdop is to test the whole workload. You can make some guesses based on wait statistics ahead of time, but you won’t know.

You may be able to pick out the longest/ most important queries from dynamic management views and test those under a different maxdop using query hints, but it’s going to be tough with queries that make modifications. And you still won’t know how a global settings change will impact all the other queries in the environment.

When performance is important, teams have identical or near-identical hardware to test against outside of production, and load tests representative of their workload. One option is to use the replay features for this in SQL Server, just know that there is fairly significant time to get it set up and working and plan accordingly.

Loading... Reply Matt April 13, 2017 6:59 am Kendra, We are discussing MAXDOP settings and the differences between Standard and Enterprise Editions. Do you know what the limitations are for Standard Editions in 2014\2016 when it comes to configuration? We’ve heard different answers on this to date, even from MS.

Loading... Reply Kendra Little April 13, 2017 8:47 am There’s pretty straightforward documentation on the limits – just check out the documentation on “Compute Capacity Limits” for the version of SQL Server you’re looking for. Here’s the one for SQL Server 2016: https://docs.microsoft.com/en-us/sql/sql-server/compute-capacity-limits-by-edition-of-sql-server

If you’re getting different answers, the issue may be with terminology. The limits on compute power aren’t limits on maxdop per se– they’re limits on sockets/cores. Maxdop is a configuration setting applied to the SQL Server and isn’t the same as a “compute limit”, so if you’re asking about maxdop limits then I can see how you might get confusing answers.

Does that help?

Loading... Reply dan May 5, 2017 11:03 am what’s the maxdop should be if i have 2 numaNodes and online_scheduler_count 3 each (total 6 logical processors)? Thanks.

Loading... Reply Kendra Little May 7, 2017 2:57 pm The guidance in KB 2806535 is basically this:

Figure out how many logical processors you have in a NUMA node If you have 8 or more logical processors in a NUMA node, generally you’ll get the best performance at maxdop 8 or lower If you have less than 8 logical processors per NUMA node, generally you’ll get the best performance setting maxdop to the number of logical processors or lower

Loading... Reply Stupid Defaults – nate_the_dba June 17, 2017 11:58 am […] alike. But don’t ask me — there are plenty of people much smarter than I with blog posts on the topic. If you put a gun to my head, for CTFP, I’d say “pick your […]

Loading... Reply Shelly August 18, 2017 10:43 am Will there be changes in parameters if having multiple instances on server? We have 3 instances on 2 (-14 cores) physical CPUs – 56 logical – CPU . All instances are equally busy with multiple OLTP databases . Max DOP 8 and Cost Threshold for Parallelism of 50 can be set to all 3 instances? Or with multiple instances values should be lower? Thanks,

Loading... Reply Kendra Little August 18, 2017 10:50 am I don’t recommend using multiple instances per server (also known as “instance stacking”). One of the many reasons it’s not a good practice is that it makes configuration of the instances very difficult. There’s no way for me to answer your question — in practice it will vary too much depending on the workloads of the instances and their uses. If you run into performance difficulties with CPU usage, tracking down what each instance was doing and how they interacted is extremely difficult.

Using virtualization and separating the instances out is more flexible. Although troubleshooting still has complexity, it’s not nearly as difficult since you can also use tools provided by the hypervisor.

Loading... Reply Shelly August 18, 2017 11:23 am Thanks for the quick reply. Appreciate it. This solution was chosen over Virtualizing databases servers, but needed to isolate the databases for performance by allocating memory and for separate groups. The overall CPU usage of Server is avg 30% . So really wanted to know how these Parallelism settings should be set for multiple instances. Thanks.

Loading... Reply Steve Kirchner August 21, 2017 7:12 am What if you are utilizing SQL Server 2012 on a virtual machine with 2 virtual CPU – NUMA = 1? in this case do 0 and 2 have the same result (use all cores)? Not really left with any good options, either set to 1 and turn off or set to 0 or 2 and turn all on?

Loading... Reply Kendra Little August 21, 2017 8:05 am Yes, that’s correct.

Loading... Reply Tarek September 20, 2017 2:20 am hi i have one server with one physical cpu and 24 cores ..only one numa node it is a small erp database around 5 GB should the maxdop be 8 with 30 cost ? or this only related to multiple cpus (not cores )

Loading... Reply Kendra Little September 20, 2017 8:22 am The formula still applies with one NUMA node — Microsoft guidance is to start with maxdop 8 and test from there.

Loading... Reply Colin Smith November 6, 2017 4:50 am Hi, love your articles 🙂

I have a case where a server has 8 single core CPUs. I think given the guidelines that MAX DOP should in this case be set to 1 (number of cores on a single CPU). However, I have some doubts whether turning off parallelism is a good idea.

What would you advise in such a scenario? If MAX DOP is to be set to 1, I assume the cost threshold for parallelism is then irrelevant?

Thanks.

Loading... Reply Kendra Little November 6, 2017 12:55 pm Hey Colin,

The guidance for 8 cores is that your best number is likely 8 or lower. Definitely would not start testing at maxdop 1 unless there is specific advice from the app developer that it was designed and tested for single threaded SQL Server execution (aka sharepoint).

Edit: sorry, I misread. Is there a reason that you have 8 x 1 cores? I’m guessing this is virtual. If so, which hypervisor are you using and what version is it?

kl

Loading... Reply Alex Whyte January 7, 2018 2:22 am Kendra, thanks for your wonderful podcasts – I really enjoy them.

It’s worth mentioning that changing the MAXDOP or cost threshold will clear the plan cache.

Loading...

Reply Kenyon (@sqlrookie711) July 19, 2018 6:26 am Awesome way of explaining such a confusing yet important piece of SQL server! I’m a new DBA and love the way you break things down so that a newbie like myself could understand!

Loading... Reply Martin Bansey August 21, 2018 10:43 pm Hi Kendra,

If an instance has MAXDOP set at 1 and uses query hints to allow specific queries to go parallel, is the Cost Threshold For Parallelism value still used by SQL to decide whether or not to actually go parallel?

I haven’t been able to dig up this specific information although this link (https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-cost-threshold-for-parallelism-server-configuration-option?view=sql-server-2017) suggests that CTFP is ignored if MAXDOP is 1. This makes sense without query hints as no request, regardless of cost, will go parallel when MAXDOP is 1.

Thanks, Martin

Loading...

Reply Kendra Little August 22, 2018 12:49 pm Hi Martin,

If instance level maxdop is set to 1 and you use a query hint to set maxdop = 4 (just to pick a number), the maxdop hint will override the instance level setting, but you are not at all guaranteed parallelism.

The cost threshold for parallelism does come into play- that threshold is compared with the estimated cost of your query running serially (that is how the threshold works regardless of the instance level setting).

There are other parallelism inhibitors which may disqualify the query from going parallel, even if it passes the cost threshold.

Hope this helps! Kendra

Loading...

Reply Max Degree of Parallelism vs Cost Threshold for Parallelism – Dave Bland August 30, 2018 3:50 pm […] MAXDOP of Confusion (Dear SQL DBA Episode 8) […]

Loading... Reply Dicas de Tuning – Como o Paralelismo pode afetar o CPU Time? – Luiz Lima September 25, 2019 9:12 pm […] https://littlekendra.com/2016/07/14/max-degree-of-parallelism-cost-threshold-for-parallelism/ […]

Loading... Reply Nela October 21, 2019 6:16 am Hi Kendra,

Your post is great, thanks so much for it! I do have a specific question also, since I am a bit confused by this setup.

I have a physical server running Windows Server 2016 Core and SQL Server 2016 SP2 CU 10, Standard edition installed. This server has 2 sockets, 16 cores per socket, 36 logical processors, 72 total logical processors. Since this is a standard edition, my SQL Log also tells me 48 logical processors are being used based on SQL Server licensing (to keep the story short, I will skip WHY this server has that processor configuration, although it was not in the recommendation).

As I am using SQL 2016, I get soft-NUMA enabled by default and I can see in the SQL Log 8 NUMA nodes. This server will be running an OLTP workload. What would be the recommended value to start with for MAXDOP? I can see 72 schedulers when I query sys.dm_os_schedulers, but since SQL Server can only use 48 logical processors, I am unsure what value to set for MAXDOP. 6, as I have 8 soft-NUMA nodes and 48 logical processors available to SQL? Is it necessary that I do some extra configuration on the affinity masks in this specific setup?

Your input would be greatly appreciated.

Thanks a lot!