Type header optimization

cbaxter / InfinityMQ

A .NET Message Transport on Steroids

GNU Lesser General Public License v3.0


Type header optimization #16

Open jgoz opened 12 years ago

jgoz commented 12 years ago

The current Type header format in MessageWriter should be profiled and improved if possible/necessary.

The performance impact of using a versionless AQN (AssemblyQualifiedName) may not matter with large messages, but it could become an issue for small messages, especially if a compact binary serializer is used (e.g., protobuf).

Required/desirable properties for Type encoding:

Deterministic and consistent, even across process and machine boundaries
Filterable: in a format that lets a subscriber cheaply ignore messages based on the type header alone
Small and fast
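
To make the "filterable" property concrete, here is a minimal sketch of a byte-prefix check a subscriber could run on the raw type header before deserializing anything. `TypeHeaderFilter` is a hypothetical helper, not part of InfinityMQ, and it assumes the header is a UTF-8 encoded type name so that a namespace prefix is also a byte prefix.

```csharp
using System;

// Hypothetical helper for illustration; not part of InfinityMQ.
public static class TypeHeaderFilter
{
    // Returns true when the raw header starts with the subscribed prefix bytes.
    // Cost is at most prefix.Length byte comparisons, with no string decoding.
    public static bool Matches(byte[] header, byte[] prefix)
    {
        if (header == null) throw new ArgumentNullException("header");
        if (prefix == null) throw new ArgumentNullException("prefix");
        if (prefix.Length > header.Length) return false;

        for (int i = 0; i < prefix.Length; i++)
        {
            if (header[i] != prefix[i]) return false;
        }

        return true;
    }
}
```

A subscriber interested in everything under a (made-up) `MyApp.Messages.` namespace would encode that prefix once and call `Matches` on each incoming header. This only works if the type name comes first in the header and is encoded the same way on every machine, which is where the "deterministic" requirement comes in.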

jgoz commented 12 years ago

After some thought, the current implementation is probably sufficient for a general-case baseline given that the encoded type names are being cached.

What we may want to do is make this functionality pluggable so that client applications can use their own Type header format. For maximum performance (and with considerable effort), clients could specify type IDs at compile time and potentially reduce the type header to a constant 4 bytes instead of 50-80 bytes.
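
As a rough sketch of what a pluggable format with fixed 4-byte IDs might look like: the `ITypeHeaderEncoder` interface and `FixedIdTypeHeaderEncoder` class below are hypothetical names for illustration, not existing InfinityMQ types, and the sketch assumes both ends register the same ID-to-type pairs up front.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical extension point; InfinityMQ does not necessarily define this.
public interface ITypeHeaderEncoder
{
    byte[] Encode(Type type);
    Type Decode(byte[] header);
}

// Maps each registered message type to a constant 4-byte header.
// Registration is not thread-safe; do it once at startup.
public sealed class FixedIdTypeHeaderEncoder : ITypeHeaderEncoder
{
    private readonly Dictionary<Type, byte[]> headersByType = new Dictionary<Type, byte[]>();
    private readonly Dictionary<int, Type> typesById = new Dictionary<int, Type>();

    // Sender and receiver must agree on the IDs, e.g. via shared constants.
    public void Register(int id, Type type)
    {
        if (type == null) throw new ArgumentNullException("type");
        if (typesById.ContainsKey(id)) throw new ArgumentException("Duplicate type ID: " + id);
        if (headersByType.ContainsKey(type)) throw new ArgumentException("Type already registered: " + type.FullName);

        // Big-endian, so the bytes are identical on every machine.
        var header = new[]
        {
            (byte)(id >> 24), (byte)(id >> 16), (byte)(id >> 8), (byte)id
        };

        typesById.Add(id, type);
        headersByType.Add(type, header);
    }

    // Returns the shared header array; callers must not modify it.
    public byte[] Encode(Type type)
    {
        byte[] header;
        if (!headersByType.TryGetValue(type, out header))
            throw new InvalidOperationException("Unregistered message type: " + type.FullName);
        return header;
    }

    public Type Decode(byte[] header)
    {
        if (header == null || header.Length != 4)
            throw new ArgumentException("Expected a 4-byte type header.");

        int id = (header[0] << 24) | (header[1] << 16) | (header[2] << 8) | header[3];

        Type type;
        if (!typesById.TryGetValue(id, out type))
            throw new InvalidOperationException("Unknown type ID: " + id);
        return type;
    }
}
```

The trade-off is the "considerable effort" part: every message type has to be assigned an ID by hand and kept in sync between sender and receiver, whereas the name-based header needs no configuration.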

CaptainCodeman commented 12 years ago

Yes, we were aiming for an easy out-of-the-box experience with no configuration required, while still allowing people to override things when optimizing.

BTW: I think the FullName would be a good compromise instead of the AssemblyQualifiedName. Shorter and avoids some potential assembly-versioning issues (or creates different ones!).

CaptainCodeman commented 12 years ago

Oops, didn't read the full code :)

I know I'm just benchmarking in my head but in my experience doing direct string manipulation (e.g. split, join) will be much faster than using a Regex (even if it were compiled).

I'll do some proper testing and update it tonight if it's significant

jgoz commented 12 years ago

You're probably right about Regex being slower, but this will only happen if the message type hasn't been seen before. Hopefully, client apps will not have millions of message types...

CaptainCodeman commented 12 years ago

Ok, I really, really need to read the code don't I?! :)

cbaxter commented 12 years ago

The AssemblyQualifiedName is being used to ensure that when Type.GetType(name) is called, it can correctly locate the type in whichever assembly it lives. The Regex is being used to strip out the version information. We could arguably drop the PublicKeyToken etc., but at the very least we need the assembly name, I think?
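
For anyone following along without the code open, the version stripping can be sketched roughly like this. The pattern and the `TypeNames` class are assumptions for illustration; the actual expression in MessageWriter may differ.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative only; the real pattern in MessageWriter may differ.
public static class TypeNames
{
    // Matches ", Version=x.y.z.w" up to the next comma or closing bracket,
    // so version info nested inside generic type arguments is removed as well.
    private static readonly Regex VersionPattern =
        new Regex(@",\s*Version=[^,\]]+", RegexOptions.Compiled);

    public static string StripVersion(string assemblyQualifiedName)
    {
        if (assemblyQualifiedName == null) throw new ArgumentNullException("assemblyQualifiedName");
        return VersionPattern.Replace(assemblyQualifiedName, string.Empty);
    }

    // Note: AssemblyQualifiedName is null for open generic type parameters.
    public static string GetVersionlessName(Type type)
    {
        if (type == null) throw new ArgumentNullException("type");
        return StripVersion(type.AssemblyQualifiedName);
    }
}
```

With a made-up name such as `MyApp.Messages.OrderPlaced, MyApp.Messages, Version=1.2.0.0, Culture=neutral, PublicKeyToken=null`, `StripVersion` returns `MyApp.Messages.OrderPlaced, MyApp.Messages, Culture=neutral, PublicKeyToken=null`. The assembly name is still there, which is what lets Type.GetType find the type outside the calling assembly.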

The 'Type' frame is cached for reuse, so a given type header will only ever exist once and should not need to be rebuilt. The receiving end also caches the Type once found so it never needs to be looked up again.
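
The caching on both ends amounts to something like the sketch below. `TypeFrameCache` and its members are hypothetical names, not the actual InfinityMQ classes, and the sketch assumes the frame is simply the UTF-8 bytes of the type name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text;

// Hypothetical sketch of the sender/receiver caches; not the actual InfinityMQ code.
public sealed class TypeFrameCache
{
    private readonly ConcurrentDictionary<Type, byte[]> framesByType =
        new ConcurrentDictionary<Type, byte[]>();
    private readonly ConcurrentDictionary<string, Type> typesByName =
        new ConcurrentDictionary<string, Type>();
    private readonly Func<Type, string> getName;

    // getName decides the header format, e.g. a versionless AssemblyQualifiedName.
    public TypeFrameCache(Func<Type, string> getName)
    {
        if (getName == null) throw new ArgumentNullException("getName");
        this.getName = getName;
    }

    // Sender side: the name is built and encoded only the first time a type is seen.
    public byte[] GetFrame(Type type)
    {
        return framesByType.GetOrAdd(type, t => Encoding.UTF8.GetBytes(getName(t)));
    }

    // Receiver side: Type.GetType runs only the first time a name is seen.
    public Type ResolveType(byte[] frame)
    {
        string name = Encoding.UTF8.GetString(frame);
        return typesByName.GetOrAdd(name, n => Type.GetType(n, true));
    }
}
```

Because of the cache, the cost of building the name (Regex or otherwise) is paid once per message type rather than once per message. The receiver in this sketch still decodes the frame to a string for every message; keying the lookup on the raw bytes would avoid that at the cost of a custom comparer.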

I would not expect to have millions of message types; heck, even hundreds of message types is highly unlikely.

cbaxter commented 12 years ago

As for making the Type header format pluggable, that is a good idea, and it would tie in with issue #9 (having 'keyed' headers to identify the key/value pair).