AqlaSolutions / AqlaSerializer

Binary serializer with full .NET support!
http://www.aqla.net
Other
17 stars 3 forks source link

Very Long Deserialization Time on Few Systems #18

Closed ab-tools closed 7 years ago

ab-tools commented 7 years ago

Hello @AqlaSolutions again!

We are still using your great serialization library, just as a reminder, we are the ones have the bigger graph database as discussed some time ago here: https://github.com/AqlaSolutions/AqlaSerializer/issues/1

Everything really works well with a database file size of about 100 MB and a deserialization time of about 3-5 seconds on most modern computers which is fine.

But since with changed our product to use 64 bit if available (compiled with "Any CPU"), we do have few customers - all running Windows 7 64 bit - that complain about extremely long database loading times: Instead of 3-5 seconds it takes about 5-10 minutes for them!

For testing we then forced our product to start in 32 bit mode (via "CorFlags.exe") and the database loads "normally" fast again in just 3-5 seconds, but when switching back to 64 bit mode it gets immediately extremely slow again.

We cannot reproduce this problem ourselves, also not on a Windows 7 64 bit system. So we are running a bit out of ideas what could really cause this issue and how to track that down.

Do you have any idea what could cause such an extreme performance drop when deserialization in a 64 bit process instead of a 32 bit process?

Thanks in advance for any ideas on that Andreas

AqlaSolutions commented 7 years ago

Hi, the only thing I can suspect is BufferPool.cs related to byte[] allocation:

        // try growing, else match
        int newLength = (Is32Bit && toFitAtLeastBytes > 1024 * 1024 * 50)
            // in 32bit when > 50mb grow only linearly
            ? (toFitAtLeastBytes + 1024 * 1024 * 10)
            // in 64bit grow twice and add 10 kb extra if not enough
            : Math.Max(buffer.Length * 2, toFitAtLeastBytes + 1024 * 10);

Please use a profiler.

ab-tools commented 7 years ago

First thanks a lot for your extremely quick reply! :-)

Good point, for testing I could just change that line to always use the "32 bit" way and see if that helps, right?

That's easier as a first test than using a profiler on the customer machine. ;-)

ab-tools commented 7 years ago

I just send a changed version (Is32Bit forced to true) to one of the customers who sees this issue: I will let you know once I got a feedback if this change made any difference.

Really annoying if you cannot reproduce a problem yourself and need to wait for customer feedback...

Thanks again so far! Andreas

ab-tools commented 7 years ago

Just got feedback from the customer: Unfortunately the small code change made no difference. Still 10-15 min. database loading time while in 64 bit mode.

Do you have an idea how to profile that easily on the customer computer? Any tool you could recommend here?

Thanks again a lot for your support Andreas

AqlaSolutions commented 7 years ago

I prefer to use JetBrains dotTrace, they have a trial version so you should be able to run it there. Or may be even connect remotely. Good luck!

ab-tools commented 7 years ago

Thanks for the suggestion!

I will try to get on the client computer via TeamViewer or so to collect some data with dotTrace...

ab-tools commented 7 years ago

Just had the chance to access the customer PC via TeamViewer and recording some dotTrace samples here: https://www.fs-flightcontrol.com/download/temp/FS-FlightControl-dotTrace.zip

The "Sampling" and "Tracing" dumps contain only the first 10 seconds or so of the database loading process, for the "Tracing" dump I really waited the full time (I think is was 7-8 min. or so) till the database loading has been completed in 64 bit mode.

Please note that the AquaSerializer version used is not your very latest code, but the version after your commits from May 6, 2016: If I remember correctly there were some compatibility issues with some code changes after this date which prevented us from updating. But please correct me if I'm wrong and we should be able to update keeping read-compatibility to existent databases.

Thanks a lot in advance for any ideas on this strange issue! Andreas

AqlaSolutions commented 7 years ago

I checked the samples, it looks like GC takes very long time: 9.6 of 14.8 seconds. There may be some unnecessary allocations. https://www.screencast.com/t/cYXoG8wbe4

There is a feature in dotTrace called "Compare Snapshots" which could be usable here if you profile it again in 32bit mode (sampling, timeline, tracing). Also to track the memory allocation issue I suggest you to capture dotMemory samples too (both for 64bit and 32bit). I doubt that BasicList.Add method called by NetObjectCache.SetKeyedObject behaves differently on 64bit though - https://www.screencast.com/t/rouPIwU4IyYS

Note that you may improve performance by precompilling serialization dlls (RuntimeTypeModel.Compile) instead of using AutoCompile on the fly which takes 5.5 seconds https://www.screencast.com/t/4Wd1Boj9

ab-tools commented 7 years ago

First thanks a lot for your analyzing!

OK, I will then re-record the dumps on the client computer in 32 bit to be able to compare them. Additionally I will get some memory dumps as well.

ab-tools commented 7 years ago

So, now I captured the 3 dotTrace sampling variants both for 64 bit and 32 bit: https://www.fs-flightcontrol.com/download/temp/FS-FlightControl-Dumps.zip

Additionally also 1 memory snapshot while running in 64 bit is included.

I hope this sheds some light on this strange issue. :-)

Thanks again for your support! Andreas

AqlaSolutions commented 7 years ago

Please add also the memory profiling results for 32bit as I asked previously. I need to compare which objects are allocated more in 64bit. Snapshots look somewhat differently (due to optimizations) but I can't blame any exact thing. 64 just needs more time for GC. May be GC has some different settings or may be the app swaps to pagefile. Can you perform profiling with Model.AutoCompile = false? It will work slower but should disable some optimizations and exclude compilation stuff from results so they provide more useful data. Also I want to check whether the execution time is still so different when you disable compilation.

AqlaSolutions commented 7 years ago

Also I have a suspect that ResourceManager called from Control.OnPaint produces much garbage. May be in 32bit it works better. Consider caching it. https://www.screencast.com/t/pCVbQZse793

ab-tools commented 7 years ago

Interesting that the ResourceManager wastes so much resources here: Although I don't think that's the problem as the splash screen using this ResourceManager call frequently is shown also while other things are loaded on program start, but only the database load is delayed so much in 64 bit, I still cached the image which is loaded here now. So at least this "garbage" should be saved now. :-)

OK, I try to collect some more data according what you wrote: Is it enough to collect the data with Model.AutoCompile = false just for one of the dotTrace variants? I mean either "Sampling", "Tracing" or "Timeline" or do you really need all of them again? Just asking as it takes so long to collect just one of them in 64 bit due to the long loading time and I want to keep it as short as possible to keep the customer willing to help finding this issue. ;-)

Thanks again! Andreas

AqlaSolutions commented 7 years ago

You can skip only 1 - "Sampling" - because "Tracing" replaces it but I need "Timeline" anyway to find memory allocations.

ab-tools commented 7 years ago

I think we really found the problem - it's the auto compilation!

As you requested I've tested now with Model.AutoCompile = false and then it start more or less in the same time in 64 Bit as in 32 Bit. I verified that several times switching back and forth with the EXE files (with and without AutoCompile and I could reproduce it everything time).

Still I recorded all the snapshots you requested as it might still be interesting to find out why AutoCompile makes this problem.

I'm currently waiting for the snapshots to be transferred: Depending on the time it needs it might get tomorrow morning till I post the link here.

Still I wanted to let you know already that it obviously has something to do with the AutoCompile option. :-)

AqlaSolutions commented 7 years ago

@ab-tools, for AutoCompile = false I don't think that those snapshots may show anything useful if the issue was not reproduced with it. At least now we know that it's somehow related to the compilation. But I still need the 32bit memory snapshot without AutoCompile = false change. btw how long did it take to perform deserialization with AutoCompile = false?

The fast solution for you would be to pre-generate serialization dlls with Model.Compile(filename) on your developer machine and Reflection-load the generated model (to avoid circular dependency) - this ensures the same compiled serialization code on any architecture - but I still would like to fix this in the serializer project. So could you please run these tests on the customer machine:

  1. Add types with Model.Add and then manually call Model.Compile() without file name (before using the model) - will that model be still slower on 64bit? It should work the same as auto compilation but if it's not it may point the cause.
  2. Same but call Model.Compile with file name and, if 32/64bit models work with different speeds, send me the generated 64bit and 32bit dlls for comparison along with their dotTrace profiling snapshots.

In both cases you need to actually use the model returned by Compile method, not your original one.

ab-tools commented 7 years ago

Thanks for your quick reply again!

btw how long did it take to perform deserialization with AutoCompile = false ?

The customer computer is quite slow (only an old Core2Duo), so it took about 10 sec. or so, but still A LOT quicker then the about 10 min. with AutoCompile enabled. :-)

I just uploaded the new dumps here: https://www.fs-flightcontrol.com/download/temp/FS-FlightControl-Dumps2.zip This also includes the requested 32 Bit memory dump with AutoCompile (= without AutoCompile = false).

I will prepare the two tests you mentioned and ask the customer to run them. As soon as I have a result I will let you know!

ab-tools commented 7 years ago

Just trying to prepare the test versions for the customer.

Maybe stupid question, but I never used this pre-compiled assemblies yet: Do I really have to add ALL possible types (including inherited sub-types) to the model manually? I mean that are quite a lot classes to add in this case...

Till now I only registered the required surrogates on type model, nothing else.

And the parameter applyDefaultBehaviourIfNew needs to be true I guess to have the same result as currently with the AutoCompile option, right?

AqlaSolutions commented 7 years ago

Right, the default behavior. But you don't need to add them manually.

Just export once the state of your model after deserializing your data (which auto-adds types) - model.Serialize(model.ExportTypeRelations()) and embed it with your app as a byte[] resource or a file.

Then instead of adding types modify your code to use model.ImportTypeRelations(model.Deserialize(...), true) - all types should be added the same way as they were registered in the original model. The pseudo code:

var model = TypeModel.Create();
...register surrogates...
model.ImportTypeRelations(...);
var compiled = model.Compile();
var result = compiled.Deserialize(...);

Depending on your situation you may need to reorder the surrogates registration after ImportTypeRelations.

More https://github.com/AqlaSolutions/AqlaSerializer/wiki/Batch-types-registration

AqlaSolutions commented 7 years ago

Can you also check it with CompileInPlace? The same way using ImportTypeRelations.

ab-tools commented 7 years ago

OK, I see that's still a bit more to test again, so it's probably better I'll do that via TeamViewer on the customer computer again myself: Unfortunately the customer always has only time at the evening and I don't have time today and tomorrow evening. I therefore try to get a date with the customer somewhen this weekend.

Just wanted to let you know that you don't think I forgot it when I don't reply quickly again: I expect that I have a feedback somewhen on the weekend at the latest!

ab-tools commented 7 years ago

Just wanted to prepare the tests for the customer already, but when trying to so something like

using (var memoryStream = new MemoryStream())
{
    typeModel.Serialize(memoryStream, typeModel.ExportTypeRelations());
}

it seems that the TypeModel object does not have a function ExportTypeRelations().

How can I access this?

Best regards Andreas

AqlaSolutions commented 7 years ago

@ab-tools RuntimeTypeModel has

ab-tools commented 7 years ago

Ah, I see, thanks!

ab-tools commented 7 years ago

Unfortunately I have still a lot problems getting the model.Compile() to work: First all our database classes are internal which the compile method obviously does not like. So I changed all about 100 model classes and enum types to public.

Then the compile function worked fine, but it could no longer read the existent database.

So I thought I'm completely rebuilding the database with this "new" type model, but also the rebuilt database cannot be read afterwards again resulting in this exception upon deserialization:

AqlaSerializer.ProtoException: Group not read entirely or other group end problem
   at AqlaSerializer.ProtoReader.EndSubItem(SubItemToken token, Boolean skipToEnd, ProtoReader reader) in C:\Austausch\#FS\libs\AqlaSerializer-v2\AqlaSerializer\protobuf-net\ProtoReader.cs:line 790
   at AqlaSerializer.RootHelpers.ReadOwnFooter(Boolean seeking, Int32 formatVersion, ProtoReader source) in C:\Austausch\#FS\libs\AqlaSerializer-v2\AqlaSerializer\protobuf-net\RootHelpers.cs:line 104
   at b49e09a7-92e8-4267-9d84-008d9252533e.ReadRoot(Database , ProtoReader )
   at b49e09a7-92e8-4267-9d84-008d9252533e.Deserialize(Int32 , Object , ProtoReader , Boolean )
   at AqlaSerializer.Meta.TypeModel.DeserializeCore(ProtoReader reader, Type type, Object value, Boolean noAutoCreate, Boolean isRoot) in C:\Austausch\#FS\libs\AqlaSerializer-v2\AqlaSerializer\protobuf-net\Meta\TypeModel.cs:line 805
   at AqlaSerializer.Meta.TypeModel.Deserialize(Stream source, Object value, Type type, Int32 length, SerializationContext context) in C:\Austausch\#FS\libs\AqlaSerializer-v2\AqlaSerializer\protobuf-net\Meta\TypeModel.cs:line 764
   at AqlaSerializer.Meta.TypeModel.Deserialize[T](Stream source) in C:\Austausch\#FS\libs\AqlaSerializer-v2\AqlaSerializer\protobuf-net\Meta\TypeModel.cs:line 660
   at FlightControl.Helper.DatabaseHelper.loadDatabase(String databaseFileName, String& errorText, Boolean& navigationDataReloadRequired) in C:\Austausch\#FS\FS-Control\Helper\DatabaseHelper.cs:line 57

All types were added before as you suggested by:

using (var memoryStream = new MemoryStream(typeModelData))
{
    var modelData = typeModel.Deserialize<ModelTypeRelationsData>(memoryStream);
    foreach (var typeData in modelData.Types)
    {
        typeModel.ImportTypeRelations(typeData);
    }
}

This also seemed to work fine - at least there was no exception, but it just fails upon deserialization when I called model.Compile() before. On the other hand deserialization works just fine when I do not call this method and keep AutoCompile enabled.

Do you have any idea what I need to do to get the database deserialized again after model.Compile() is called? I'm running out of ideas here...

Tomorrow afternoon a would have another chance to test something on the customer computer.

Thanks Andreas

AqlaSolutions commented 7 years ago

It's hard to say anything from the code you provided.

  1. You don't need foreach for ImportTypeRelations, it accepts ModelTypeRelationsData directly as an argument and also you should pass true as a second parameter.
  2. It looks like the data you are loading is somehow broken (multiple serializations to the same stream without *LengthPrefix?)
  3. You might use a wrong model instance e.g. calling Compile() and then using the same instance for serialization (not the one returned by the Compile() call).
  4. Do you setup surrogates? I don't see it in your snippet.
  5. For check try to deserialize immediately after Serialize call using the same model instance (of course you need to rewind the stream).
  6. Another approach instead of using Import/ExportTypeRelations is using Add(Assembly) or Add(Type[]) method but this has a high chance to require rebuilding your database (because of adding subtypes in different order). These methods add all supported types e.g. marked with attributes.
ab-tools commented 7 years ago

Thanks for your quick reaction again!

Your point 1 was the problem: After I used ModelTypeRelationsData directly with the second argument being true it worked! Sorry, not sure why I missed this function overload before.

OK, as I already mentioned, I should have tomorrow afternoon the next chance to do tests on the customer computer: Then I will do all the test you wrote above.

ab-tools commented 7 years ago

So, now I did all the tests on the customer PC that you requested with following results:

// default, no type relations import
return typeModel;
=> very slow in 64 Bit (291 sec.), fast in 32 Bit (10 sec.).

// import type relations first
typeModel.CompileInPlace();
return typeModel;
=> very slow in 64 Bit (370 sec.), fast in 32 Bit (9 sec.).

// import type relations first
return typeModel.Compile();
=> very slow in 64 Bit (359 sec.), fast in 32 Bit (8 sec.).

// import type relations first
return typeModel.Compile("FlightControlModel", "FlightControlModel.dll");
=> very slow in 64 Bit (297 sec.), fast in 32 Bit (8 sec.).

// default, no type relations import
typeModel.AutoCompile = false;
return typeModel;
=> much faster in 64 Bit than the others (39 sec.), but 32 Bit seems still a bit faster (16 sec.).

As I had time differences +/- 60 sec. in the slow 64 Bit tests anyway the slightly faster database load of typeModel.Compile when used with parameters to write the assemble should not be considered. All 64 Bit tests were basically "the same extremely slow" except the one using typeModel.AutoCompile = false;.

Please also find attached the two type model DLLs as requested.

FlightControlModel.zip

ab-tools commented 7 years ago

I still have access to the customer computer right now (for the next hour or so): So if you would have another idea what to test, I could do so directly now...

AqlaSolutions commented 7 years ago

Thanks for testing it. I checked the dlls you sent me and they are 100% identical except GUID field. The same code just runs with different speed.

  1. May be the JIT compiler works somehow differently with these dlls on 64bit. I'm sorry for asking you to test again but I need to compare "cold-start" and "hot" deserialization speeds for compiled dlls version. And during checking it please use already generated pre-compiled dlls so compilation time is not included into deserialization timings. You can load a pre-compiled model like (TypeModel)Activator.CreateInstance( Assembly.Load("FlightControlModel.dll").GetType("FlightControlModel")).
  2. This test should also show whether the slowdown is caused by compilation (as I understand you have been including Compile() calls into timings). If only Compile() call is slow you can just distribute pre-compiled model dlls within your app.
  3. If it's the case please record profiler data with Compile() call only.
  4. It's also curious whether the write speed is affected by this issue.
AqlaSolutions commented 7 years ago

@ab-tools if the reason is JIT compiler this could be solved by updating .NET framework + latest patches. There is an issue on how x64 JIT handles long methods https://connect.microsoft.com/VisualStudio/feedback/details/508748/memory-consumption-alot-higher-on-x64-for-xslcompiledtransform-transform-then-on-x86 but since the issue is not reproducable on your machine it could be already fixed in .NET

First, the x64 JIT compiler has a few algorithms that are quadratically scaling. One of them is the debug info generator, unfortunately. So for very large methods, it really gets out of control.

ab-tools commented 7 years ago

OK, I will try to get access to the customer PC for another test round. ;-)

as I understand you have been including Compile() calls into timings

In fact not: I just checked and the time I have written in the protocol above is really only opening the database FileStream and calling the Deserialize method - nothing else. The compilation of the type model has been done before.

@ab-tools if the reason is JIT compiler this could be solved by updating .NET framework + latest patches. There is an issue on how x64 JIT handles long methods https://connect.microsoft.com/VisualStudio/feedback/details/508748/memory-consumption-alot-higher-on-x64-for-xslcompiledtransform-transform-then-on-x86 but since the issue is not reproducable on your machine it could be already fixed in .NET

What's the best (and reliable) way to check the .NET 4.0 version number? Then I could check on the customer PC which version he has and compare it to the version I have here.

AqlaSolutions commented 7 years ago

@ab-tools may be this https://docs.microsoft.com/en-us/dotnet/framework/migration-guide/how-to-determine-which-net-framework-updates-are-installed , compare the outputs on your and your customer machines. Or just ensure that windows update works properly.

ab-tools commented 7 years ago

OK, I will check that.

I just to be sure I got the "cold" and "hot" start right: You mean the difference if it loads in 64 bit the first time and the following times, right?

Here I think I can directly say that this does not have a considerable influence: As I mentioned in the test protocol above already, I had different loading times with exactly the same assembly of +/- 60 sec. directly one after the other anyway. Also the very first time I tried it yesterday did not to have a bigger difference than this time in one or the other direction.

aienabled commented 7 years ago

Hi guys, If the issue is not reproducible on another computer, it's definitely related to software configuration or hardware. I would recommend trying to use this wonderful tool https://www.microsoft.com/en-us/download/details.aspx?id=30135

AqlaSolutions commented 7 years ago

During the previous test It was not exactly the same assembly, those assemblies were produced by different compilation calls so JIT-ed each time.

Also the very first time I tried it yesterday did not to have a bigger difference than this time in one or the other direction.

Do you mean that writes are same slow?

Thanks, @aienabled

ab-tools commented 7 years ago

If the issue is not reproducible on another computer, it's definitely related to software configuration or hardware. I would recommend trying to use this wonderful tool https://www.microsoft.com/en-us/download/details.aspx?id=30135

Thanks, @aienabled, for helping. :-)

It is till now reproducible on about 3-4 customer computers I know of. But good point, I will try the .NET repair tool you mentioned on the customer PC.

Do you mean that writes are same slow?

@AqlaSolutions, Why writing? On the customer PC I only tried loading the database (which is the much more important case as done on every program start), no writing it.

aienabled commented 7 years ago

@ab-tools, if the issue is reproducible on other computers than it's more likely to be a software issue than hardware. Other than updating and repairing .NET Framework I would also recommend looking into the system monitor to check if there are any background applications which could affect the performance. It could be an antivirus software.

AqlaSolutions commented 7 years ago

@ab-tools in the

Also the very first time I tried it yesterday did not to have a bigger difference than this time in one or the other direction.

what do you mean by:

did not to have a bigger difference than this time in one or the other direction

ab-tools commented 7 years ago

@aienabled: Good point, but there was no anti-virus or firewall program running - that's always the first thing I check. :-) But I will create a SystemInfo dump the next time I'm on this customer PC to check the running processes in more detail.

@AqlaSolutions: Sorry, this sentence was really a bit confusing. I just wanted to say that between the very first time I ran a test on the client PC yesterday and running the same test again a second time directly afterwards there was no bigger difference than +/- 60 sec. And I saw differences around that also later again.

But anyway, I can do such a test again just to be sure.

AqlaSolutions commented 7 years ago

@ab-tools ok, ensure that you are not making new models with Compile() on each read, for best just use already pre-generated dlls.

ab-tools commented 7 years ago

I had the chance to test on the customer computer again and I can't believe it, but we really found the problem: In fact, .NET itself was the problem!

On the customer computer all Windows Updates were displayed (with the last update end of 2014) which is normal as it's a flight simulation computer - everybody that has a system that "just needs to run as it is" will at first disable all Windows Updates. :-)

But due to another application the customer uses he had to install .NET 4.6.0 on this computer manually.

Of course, he did not want to enable Windows Update, but I suggested to install the latest bug fix update .NET 4.6.2 manually again - and this did the trick!

Although our product is just based on .NET 4.0 after upgrading from .NET 4.6.0 to 4.6.2 the massive database loading delay on 64 Bit was immediately gone.

I really didn't think that .NET could be the problem here, but obviously the version 4.6.0 was a bit buggy. :-)

That was a hard way to go - thanks a lot for your support in finding this issue which at the end was not related to our both products at all!

aienabled commented 7 years ago

@ab-tools, we learned the same lesson at least two times... Since then the first thing we do is running .NET Framework Repair tool and installing latest patch version (i.e. 4.5->4.5.2, 4.6->4.6.2). With .NET Core 2.0 this problem could be finally resolved - just ship it together with the product. .NET Core 2.0 has almost all API from .NET Framework so the transition should be smooth. But WPF applications could not run under .NET Core, alas.

AqlaSolutions commented 7 years ago

@ab-tools cool, finally we resolved this

ab-tools commented 7 years ago

@aienabled: We didn't look into .NET Core yet to be honest. Our product is still WinForms based. Is that supported by .NET Core 2.0?

@AqlaSolutions: Two short other questions:

  1. Although we did not need it anymore, I tried to use (TypeModel)Activator.CreateInstance(Assembly.Load("FlightControlModel.dll").GetType("FlightControlModel")) to load the pre-compiled type model, but whatever I do it always tells me that there is a dependency (but not which one) missing. What additionally dependency is needed here?
  2. As mentioned we still use a quite old AqlaSerializer version (after your commits from May 6, 2016). I think there was a breaking format change after that. Is that correct or can we upgrade without problems? Also if we can't upgrade is there any change after the commit I mentioned which we should take over manually as it improves performance or anything else?

Thanks again! Andreas

aienabled commented 7 years ago

@ab-tools ,

Our product is still WinForms based. Is that supported by .NET Core 2.0?

Unfortunately, no, and it seems it will never be supported. AFAIK, only UWP (which is very similar to WPF) and Xamarin Forms have support now.

AqlaSolutions commented 7 years ago

@ab-tools,

  1. it references your original dll with types used in serialization and aqlaserializer.dll with its dependencies. Make sure you put the model dll beside your app dlls.
  2. see the comments starting from https://github.com/AqlaSolutions/AqlaSerializer/issues/13#issuecomment-216838290
ab-tools commented 7 years ago

Thank you both!

I now also got the DLL loading to work - you just need to use Assembly.Load(@"FlightControlModel"), so without ".dll"!

But as the runtime compilation works fast, I probably won't use that, makes everything only more complicated adding another source of errors. ;-)

Thanks again for your great support! Andreas