adam-dot-cohen / HyperSerializer

Dot Net Binary Serializer - 17x faster than Protobuf and Messagepack, 2x faster than MemoryPack
Apache License 2.0

Failing to serialise an array of structs containing primitive values #4

Closed gcaplan closed 9 months ago

gcaplan commented 10 months ago

Hi

Keen to give this a try, but have hit a roadblock. I have a simple struct representing a trading tick. HS serialises and deserialises a single tick struct successfully. But when I try to serialise an array of ticks, it barfs with

"The type initializer for 'Hyper.HyperSerializer`1' threw an exception."

If this is a bug, a fix would be much appreciated. If I'm misunderstanding something please accept my apology and point out my mistake!

    using System;
    using Hyper; // namespace taken from the exception message above

    public static class Scratchpad2
    {
        public struct SymbolTick
        {
            public long Timestamp;
            public double Bid;
            public double Ask;
            public int SymbolId;

            public SymbolTick(long timestamp, double bid, double ask, int symbolId)
            {
                Timestamp = timestamp;
                Bid = bid;
                Ask = ask;
                SymbolId = symbolId;
            }
        }

        internal static void Run()
        {
            int numRecords = 1000;

            var tick = new SymbolTick { Timestamp = 1694695915987, Bid = 1.23456, Ask = 1.12345, SymbolId = 12 };

            var singleSpan = HyperSerializer<SymbolTick>.Serialize(tick);
            var singleSpanOut = HyperSerializer<SymbolTick>.Deserialize(singleSpan);

            Console.WriteLine($"Single span: {singleSpanOut.Ask}");

            SymbolTick[] ticks = new SymbolTick[numRecords];
            for (int i = 0; i < numRecords; i++) { ticks[i] = tick; }

            // Crashing here
            var tickArray = HyperSerializer<SymbolTick[]>.Serialize(ticks).ToArray();
            var hsOut = HyperSerializer<SymbolTick[]>.Deserialize(tickArray);
        }
    }

adam-dot-cohen commented 10 months ago

Issue has been resolved. See the following test with your code above for reference...

Also, you can continue using HyperSerializer<T> as you were, or use the new non-generic convenience type that I added, HyperSerializer (you only have to specify the type parameter on Deserialize).

I released a NuGet as well.

I'll wait for you to confirm the fix before closing this out. LMK.

https://github.com/adam-dot-cohen/HyperSerializer/blob/01a7e9da7612b96fe6fbfa8569df604bcc8c8129/HyperSerializer.Test/SerializerTests.cs#L291

adam-dot-cohen commented 10 months ago

Note, the library only serializes fields on structs. If you happen to attempt to serialize a class, only properties with getters and setters will be serialized. Fields will be ignored.
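As a rough illustration of the rule described above, the sketch below lists which members of a type the rule would select. It is not HyperSerializer's implementation: `TickStruct`, `TickClass`, and `MemberRule.Describe` are hypothetical names, and restricting the lookup to public instance members is an assumption.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical struct: per the rule above, its fields are what gets serialized.
public struct TickStruct
{
    public long Timestamp;
    public double Bid;
}

// Hypothetical class: per the rule above, only read/write properties are serialized.
public sealed class TickClass
{
    public long Timestamp { get; set; }   // getter and setter: selected
    public double Bid { get; set; }       // getter and setter: selected
    public double Mid => Bid;             // getter only: not selected
    public int SymbolId;                  // field on a class: not selected
}

public static class MemberRule
{
    // Applies the stated rule with plain reflection (assumption: public instance members only).
    public static string[] Describe(Type type)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;

        if (type.IsValueType)
        {
            // Structs: fields.
            return type.GetFields(flags).Select(f => f.Name).ToArray();
        }

        // Classes: properties that have both a getter and a setter.
        return type.GetProperties(flags)
                   .Where(p => p.CanRead && p.CanWrite)
                   .Select(p => p.Name)
                   .ToArray();
    }
}
```

Calling `MemberRule.Describe(typeof(TickClass))` would leave out `Mid` and `SymbolId`, which is the kind of silent omission to watch for when converting a struct to a class.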

In related news, if you are planning to invest substantial time into algo trading, let me know. HyperSerializer is a piece of a larger hobby project that might save you some time.

gcaplan commented 10 months ago

Hi Adam

Sorry for the delay - I've been somewhat overcommitted the past few days...

I've just updated to the new release, but hit another issue. When I import the HyperSerializer namespace, I'm not seeing any of the Serialize or Deserialize methods. All I see is a namespace HyperSerializer.Dynamic. Have you changed any of the permissions? I'm new to .NET so may be doing something daft, but had no issues importing the old version.

Since I posted the issue I've found the package MemoryPack by Yoshifumi Kawai. This is a guy who has been working obsessively on serialisation for years now. MemoryPack is his latest offering and it has approaching 500 commits! I'm getting pretty good performance with it. Given that you're saying that HyperSerializer is more of a personal experiment, do you think there's a realistic chance that it will significantly outperform Yoshifumi's package? If not, I'll step aside and leave you in peace.

Please advise.

On the wider issue of building trading apps, can I ask what you are up to?

My own project is rather specialised, so it's a bit unlikely there will be much overlap. I'll outline where I'm at and you can judge for yourself.

We are a little group of siblings doing very nicely with manual trading. But we're retired and it's not a lifestyle we're enjoying, so I've dusted off my old coding chops (the first programme I wrote was on punchcards) and am hoping to move into algo trading as a less stressful option.

My problems are:

  1. I'm interested in active day-trading strategies. This means I need to backtest at tick-level resolution, and none of the consumer-level platforms or OS offerings can do that properly at reasonable speed.

  2. I'm using price-based bars rather than time-based aggregations, and again, backtesting these has proven beyond the scope of the consumer platforms.

  3. We're specialising in FX, and most of what's available is focused on crypto and equities.

My first attempt at a fairly general platform was becoming too complex, so I drew in my horns. I'm focusing on trading FX on the Spotware platform. This offers no less than 3 APIs and decent trading conditions, plus 9 years of good quality free tick data. Narrowing the scope has unblocked the project and it's now going fairly smoothly.

We are emphatically not quants - I trade simple price patterns. So I'm focusing on event-based backtesting with very careful attention to slippage and costs. I'm a few weeks in, and so far I've:

  1. Written a bot to download and serialise the tick history from IC Markets. I have some billions of ticks on my local disk.

  2. Written a loader to read in the ticks and push them into the strategy manager. It can handle portfolio testing with multiple strategies and instruments, each with their own runtime parameters. Once cached, I can run an 8-year tick-resolution backtest on the 5 most liquid FX pairs in 55 secs, with all the ticks sorted by time for accurate market replay. This is good enough for my needs. The app is processing 15,000,000 ticks per second, before any serious optimisation - literally orders of magnitude faster than my tests of other platforms. Quite impressed with C# performance.

  3. Written modules to generate the bars and update the indicators.

  4. Written a pipeline of reusable modules to handle trade management - eg filters, signal generators, trade sizers, risk managers etc. These can be configured at runtime and plugged into the strategy in any sane combination.

  5. Developed a chart to display the bars, indicators and trades using the excellent AnyChart JS library. Legitimately free for personal use!

Still to do are plugins to simulate the account and trading engine of the broker venue. We only use a few features for our simple strategies, so this isn't too daunting.

I've found it quite a challenging project - a couple of major refactors before I settled on a workable design. Github is littered with abandoned trading platforms, and I'm beginning to understand why. The key has been to take YAGNI seriously!

I haven't open-sourced it because it's too quirky and specialised and under-documented. Plus I'm new to C# so doubtless much of the code is a bit agricultural. But if anything is of interest let's discuss. Equally, if you have anything that you think would help with this rather niche venture I'd be more than grateful to hear from you.

As Pascal once wrote, I'm sorry this is so long but I don't have the time to make it shorter! Thanks for your tolerance.

adam-dot-cohen commented 10 months ago

Use HyperSerializer.Serialize(array) and HyperSerializer.Deserialize<SymbolTick[]>(bytes) to serialize and deserialize.

Regarding MemoryPack vs HyperSerializer, see the link below for benchmarks. This test includes types with more strings than one would find in trading data. As a result, you can expect a 2-3x increase in the performance difference between HyperSerializer and MemoryPack for your use case. I'll respond to the rest of your post when I have time later this week...

https://github.com/pairbit/IT.Serialization/tree/main/IT.Serialization.Benchmarks

adam-dot-cohen commented 10 months ago

Sorry, the namespacing got changed by accident from Hyper to HyperSerializer. The issue has been resolved with the namespace moved back to Hyper. Please update to v1.3 on NuGet.

gcaplan commented 10 months ago

Everything working now - many thanks for fixing the bug! Much appreciated.

You're right - I am seeing a speedup on deserialize, which is a pleasant surprise given the hype around MemoryPack.

It's not quite as big as you predicted - but it's not far short of double the throughput.

Interestingly the two serialised files are exactly the same size, but it seems you're doing something different deep in the weeds. I guess I should see if I can understand your code - though as a line-of-business sort of guy this system-level stuff is a bit above my pay-grade.

My tick struct contains a DateTime, two doubles and an int. It seems that HS is optimal for this kind of numerical work.

This is quite a big deal for me, because deserialising north of a billion ticks per backtest is the major bottleneck. So I'm a very happy user!

Given the exceptional results I'm seeing you might want to promote this a bit to the community? MemoryPack has 2k stars and you are faster for the types that you cover, so clearly people would find it useful. But it's quite hard to find...

adam-dot-cohen commented 10 months ago

Glad to hear it worked out!

The smaller performance gap you're seeing is due to the fact that you're using stack-based structs, which I suspect you're going to find problematic in the not-so-distant future for a variety of reasons (e.g. consider the size of the datasets you're working with and the number of copies of the data you're going to end up with as you work with it: subsets, aggregates, display, etc.). If and when you decide to transition to classes, the performance gap will grow substantially (I've included the benchmark stats below for your type, struct vs class).
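For readers following along, the copy concern can be illustrated with a minimal sketch. `TickStruct`, `TickClass`, and `CopySemantics` are hypothetical names and not part of HyperSerializer; the sketch only shows standard C# value-versus-reference semantics, not how either serializer treats the types.

```csharp
using System;

// Hypothetical value type: assigning it copies every field.
public struct TickStruct
{
    public long Timestamp;
    public double Bid;
    public double Ask;
    public int SymbolId;
}

// Hypothetical reference type: assigning it copies only the reference.
public sealed class TickClass
{
    public long Timestamp { get; set; }
    public double Bid { get; set; }
    public double Ask { get; set; }
    public int SymbolId { get; set; }
}

public static class CopySemantics
{
    // Reading a struct out of an array yields an independent copy of the data.
    public static double StructCopyBid()
    {
        var ticks = new TickStruct[1];
        ticks[0].Bid = 1.0;

        TickStruct copy = ticks[0]; // full value copy
        copy.Bid = 2.0;             // changes the copy only

        return ticks[0].Bid;        // the array element is unchanged
    }

    // Reading a class instance out of an array yields another reference to the same object.
    public static double ClassAliasBid()
    {
        var ticks = new[] { new TickClass { Bid = 1.0 } };

        TickClass alias = ticks[0]; // same object, no data copied
        alias.Bid = 2.0;            // visible through the array as well

        return ticks[0].Bid;
    }
}
```

Each subset or aggregate built from struct elements duplicates the tick data, whereas collections of class instances share the same objects. Whether that trade-off matters depends on how long the ticks stay alive and how many derived collections are built from them.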

I have a codebase that may suit your needs, but we would need to discuss it to figure that out. I'm a little further along than you in operationalizing storage, live data pipes, etc. I won't have time in the next couple of months to take it further, so I'd be open to exploring joining forces if it's mutually beneficial.

Shoot me an email if you're interested in discussing - adamtocohen@gmail.com

1 MILLION ROUND TRIPS (SERIALIZE/DESERIALIZE)

------STRUCT------

| Method | iterations | Mean | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|
| HyperSerializer | 1000000 | 13.64 ms | 4000.0000 | 56000552 B | 1.00 |
| MemoryPack | 1000000 | 20.43 ms | 4000.0000 | 56000600 B | 1.00 |

-----CLASS-------

| Method | iterations | Mean | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|
| HyperSerializer | 1000000 | 19.03 ms | 8000.0000 | 104000552 B | 1.00 |
| MemoryPack | 1000000 | 101.61 ms | 8000.0000 | 104000600 B | 1.00 |

gcaplan commented 10 months ago

Hmm - you may be right about the stack issue, but it's been OK to date.

I'm storing and loading month by month, and am very unlikely to trade more than the 10 most liquid FX pairs, so it's not as though I'm having to screen hundreds of equities.

The ticks are out of scope by the end of the month - I keep some stats but not the ticks themselves. It's only the bars that are stored for the duration and they are on the heap.

I did check that I had ample overhead. Even in a worst-case scenario I didn't hit an overflow.

If I do hit a problem it would only take a day or so to switch over, but I'll take your suggestion and test the performance implications of switching now.

I'll drop you an email about potential collaboration.

adam-dot-cohen commented 10 months ago

Sorry about the delay in responding to this. I've spent a lot of time on speed, both in evaluating data provider latency and on the code infrastructure side (streaming via Microsoft Trill, blazing-fast storage via Microsoft FASTER, etc.). I'm also an ML quant, so I can provide some guidance if needed. We'd have to speak to see if collaboration makes sense.

And thanks for the kind words re: HyperSerializer! Give the repo a star if it continues to meet your expectations!

Adam