bongohrtech / lucenenet

Mirror of Apache Lucene.Net
Apache License 2.0
0 stars 0 forks source link

Sequential IndexWriter performance in concurrent environments. #110

Open bongohrtech opened 4 years ago

bongohrtech commented 4 years ago

When creating Lucene.Net indices in parallel, sequential-like performance is experienced. Profiling 8 concurrent IndexWriter instances writing in parallel shows that WeakIdentityMap::IdentityWeakReference::Equals spends most time garbage collecting (94.91%) and TokenStream::AssertFinal (87.09% garbage collecting) in my preliminary tests (see screenshots).

The WeakIdentityMap implementation uses an IdentityWeakReference as key, which is implemented as a class. By inspection of this class, it is merely a System.Runtime.InteropServices.GCHandle wrapper as can be seen in the mono project, manually wrapping of this struct in a struct rather than a class - will eliminate some of the immense amounts of garbage collection.

JIRA link - [LUCENENET-640] created by matt2843

bongohrtech commented 4 years ago

Hi Mathais,

Thanks for the report and the PR.

Correct me if I am wrong, but wouldn't a better fix for this to be to replace WeakIdentityMap with the thread-safe ConditionalWeakTable? We may still need to utilize IdentityWeakReference, but it would need to be a class since ConditionalWeakTable has a class constraint on TKey.

Do note that Microsoft didn't expose the enumerator or the AddOrUpdate method of ConditionalWeakTable until .NET Standard 2.1. However, Lucene requires one or the other in every (other) place where ConditionalWeakTable would be useful (specifically, as a replacement for Lucene.Net.Support.WeakDictionary). An effort to port ConditionalWeakTable from .NET Standard 2.1 back to .NET Standard 2.0 (GH-636) is currently underway, but stalled on this J2N branch. Unfortunately, it depends on unmanaged resources that somehow need to be re-mapped or embedded in order to make it functional. Perhaps there is also a way to cut through at a lower level and make a ConditionalWeakIdentityTable that could be used as a direct replacement for WeakIdentityMap instead of using IdentityWeakReference.

If you could take a look at using ConditionalWeakTable to solve this issue, it would be much appreciated. Since there appears to be one place where the enumerator is required here, the best approach would be to first check for compatibility on .NET Standard 2.1 and if that works, help us to complete the port of ConditionalWeakTable for .NET Framework 4.5 and .NET Standard 2.0 by submitting a PR to the J2N project so the same fix can also be applied to those platforms.

by nightowl888

bongohrtech commented 4 years ago

Hi Shad Storhaug,

I think Ayende Rahien faced same issue, 

take a look here

by eladmarg

bongohrtech commented 4 years ago

Elad,

Well, this particular issue has been addressed by swapping ConditionalWeakTable for WeakIdentityMap. However, we are still using WeakDictionary in several places, most notably in the FieldCacheImpl (see GH-610). This is due to the lack of enumerator support before .NET Standard 2.1 primarily.

That said, the article that describes the problem specifically mentions it is a problem with Lucene.NET. Thanks for the link. Hopefully, he is planning to submit the fix back to us after he thoroughly tests it.

by nightowl888

bongohrtech commented 4 years ago

We are using an old version of Lucene at the moment, so I don't think that a PR from my current codebase would be very helpful.

The PRs for the changes in Lucene 3.0.3 are here:

https://github.com/ravendb/lucenenet/pull/10

https://github.com/ravendb/lucenenet/pull/11

 

Feel free to make use of them as you see fit.

This is currently under testing, we saw GC time reduced from GEN 0 ~0.9 seconds to 0.125 seconds. GEN 2 from 0.9 to 0.5.

by ayende