OrchardCMS / Orchard

Orchard is a free, open source, community-focused Content Management System built on the ASP.NET MVC platform.
https://orchardproject.net
BSD 3-Clause "New" or "Revised" License
2.38k stars 1.12k forks

ConcurrentDictionary instances could be optimised for hardware #2198

Open orchardbot opened 12 years ago

orchardbot commented 12 years ago

@petehurst created: https://orchard.codeplex.com/workitem/18371

For consideration for Orchard Future Versions:

ConcurrentDictionary is used in a number of performance-critical core features to handle multi-threading. Notably in Caching, Content Manager, Shapes, WorkContextAccessor, Localization, Logging, View Engines ... and also in my current implementation of Alias.

This is good, but it could be better on some hardware. The constructor for ConcurrentDictionary takes two optional integers: concurrencyLevel and initialCapacity. Both can affect performance depending on a) the server's hardware (number of cores) and b) the intended usage of the dictionary.

Quoting from Microsoft documentation: http://msdn.microsoft.com/en-us/library/dd287191.aspx

        // The higher the concurrencyLevel, the higher the theoretical number of operations
        // that could be performed concurrently on the ConcurrentDictionary.  However, global
        // operations like resizing the dictionary take longer as the concurrencyLevel rises. 
        // For the purposes of this example, we'll compromise at numCores * 2.

This means that on servers with many cores we can get better performance by setting this value appropriately. Additionally, setting a higher initial capacity requires fewer resizes, and available RAM could factor into that decision. Also from the Microsoft docs:

        // We know how many items we want to insert into the ConcurrentDictionary.
        // So set the initial capacity to some prime number above that, to ensure that
        // the ConcurrentDictionary does not need to be resized while initializing it.
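Putting the two quoted hints together, a tuned instantiation might look like the sketch below. The 100-entry estimate and the choice of 101 as the next prime above it are illustrative, not taken from Orchard's code:

```csharp
using System;
using System.Collections.Concurrent;

class TunedDictionaryExample
{
    static void Main()
    {
        // Concurrency level based on core count, as the MSDN sample suggests.
        int concurrencyLevel = Environment.ProcessorCount * 2;

        // We expect roughly 100 entries; 101 is a prime above that estimate,
        // so the dictionary should not need resizing while we fill it.
        int initialCapacity = 101;

        var cache = new ConcurrentDictionary<string, int>(concurrencyLevel, initialCapacity);

        for (int i = 0; i < 100; i++)
        {
            cache.TryAdd("key" + i, i);
        }

        Console.WriteLine(cache.Count); // 100
    }
}
```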

I previously found a page with more detailed recommendations about these two values, but I can't locate it just now; I'll have another look another time.

My proposal:

We could have a core service, IConcurrentDictionaryFactory, that can be used to acquire a new ConcurrentDictionary with sensible defaults (e.g. a concurrency level of numCores * 2), and it could take a parameter for an estimated size, which in some cases we'll know.

If for a specific hardware configuration the defaults didn't work out so well, the service could be overridden, or perhaps the defaults could be set from an advanced configuration admin menu.
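A minimal sketch of what that service could look like. The interface name comes from the proposal above; the default implementation, the method signature, and the prime-rounding helper are all assumptions for illustration:

```csharp
using System;
using System.Collections.Concurrent;

// Interface name taken from the proposal; the shape of Create is assumed.
public interface IConcurrentDictionaryFactory
{
    ConcurrentDictionary<TKey, TValue> Create<TKey, TValue>(int estimatedSize = 31);
}

public class DefaultConcurrentDictionaryFactory : IConcurrentDictionaryFactory
{
    public ConcurrentDictionary<TKey, TValue> Create<TKey, TValue>(int estimatedSize = 31)
    {
        // Default concurrency level: numCores * 2, per the MSDN guidance.
        int concurrencyLevel = Environment.ProcessorCount * 2;

        // Round the caller's estimate up to a prime so the dictionary
        // should not need resizing while it is initially populated.
        int initialCapacity = NextPrime(estimatedSize);

        return new ConcurrentDictionary<TKey, TValue>(concurrencyLevel, initialCapacity);
    }

    private static int NextPrime(int n)
    {
        for (int candidate = Math.Max(n, 2); ; candidate++)
        {
            if (IsPrime(candidate)) return candidate;
        }
    }

    private static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
        {
            if (n % i == 0) return false;
        }
        return true;
    }
}
```

A consumer that knows its approximate size could then call, for example, `new DefaultConcurrentDictionaryFactory().Create<string, object>(estimatedSize: 500)`, which would round the capacity up to the prime 503.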

This idea may only be relevant in extremely high-throughput situations and on specific server hardware, and of course it will need profiling; but it's quite easy to do, so I thought I'd get it out here while I'm thinking about it.

orchardbot commented 12 years ago

@pszmyd commented:

+2 ;) Very good point! That is a rather simple change that could greatly improve performance in applications that use a lot of caching.

Btw, it'd be good to use Process.ProcessorAffinity (http://msdn.microsoft.com/en-us/library/system.diagnostics.process.processoraffinity.aspx) instead of Environment.ProcessorCount when setting the initial concurrency level, since the affinity can cover fewer cores than the machine's total.
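A sketch of that suggestion: ProcessorAffinity is a bitmask, so counting its set bits gives the number of cores the process may actually use. The cast to long assumes the mask fits (true for up to 64 cores), and this API is Windows-centric:

```csharp
using System;
using System.Diagnostics;

class AffinityExample
{
    static void Main()
    {
        // ProcessorAffinity is a bitmask: bit n set means the process may
        // run on core n. The number of set bits can be smaller than
        // Environment.ProcessorCount.
        long mask = (long)Process.GetCurrentProcess().ProcessorAffinity;

        int usableCores = 0;
        while (mask != 0)
        {
            usableCores += (int)(mask & 1);
            mask >>= 1;
        }

        // Feed the affinity-aware count into the numCores * 2 heuristic.
        int concurrencyLevel = usableCores * 2;
        Console.WriteLine("Usable cores: " + usableCores +
                          ", concurrency level: " + concurrencyLevel);
    }
}
```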

orchardbot commented 12 years ago

@sebastienros commented:

"greatly improve performance in applications that use a lot of caching"

What do you mean? Any numbers?

orchardbot commented 12 years ago

@pszmyd commented:

Just a hunch. I'll try to implement this feature and provide you with some numbers. We have an internal Orchard-based application that does a lot of caching, so it's a good candidate for measurements.

orchardbot commented 12 years ago

@petehurst commented:

Why would Microsoft write recommendations into their own documentation if the setting weren't capable of having a significant impact on performance?

orchardbot commented 12 years ago

@bleroy commented:

I think Sébastien just means that like all perf fixes, you won't know it works until you measure it. Premature optimization, that sort of thing :)

orchardbot commented 12 years ago

@sebastienros commented:

Yes, "significant" is all relative. Even if it were 10 times faster, I don't think it would have much impact on the whole unit of work. Prove me wrong, please!

orchardbot commented 12 years ago

@petehurst commented:

I know ;) that's why I phrased it "capable of" not "definitely going to" ;)