Anthony-Nolan / Atlas

A free & open-source Donor Search Algorithm Service
GNU General Public License v3.0

Future HLA Import Parallelisation work #735

Open benbelow opened 2 years ago

benbelow commented 2 years ago

Context:

This mainly comes about because .NET Core 3 does not support Distributed Transactions: https://github.com/dotnet/runtime/issues/715

The DataRefresh code and the ongoing DonorUpdate code share the same underlying code for processing and updating HLAs in the Matching Database.

We would like that code to be both fast and fully transactional. That is to say, updates for any given donor should either be fully written or fail completely, and if they fail then nothing should be left behind in the Database from that write.

Unfortunately .NET Core 3 doesn’t support the tools necessary to get BOTH of these.

TransactionScope is specifically designed to provide transactionality over large blocks of code, and it works very well. But (as of .NET Core 3) it doesn’t support Distributed Transactions, and it needs DTs in order to support multiple parallel connections. It handles multiple sequential SQL connections just fine, but not in parallel.

Unfortunately, you need multiple parallel connections to be able to write the per-locus HLA data to the DB efficiently. Changing from Task.WhenAll(DoPerLocusWrite) to foreach () { await DoPerLocusWrite(); } dropped the performance by 25-35%. (Note: parallel inserts on a single connection were attempted, but MARS (Multiple Active Result Sets) doesn’t actually allow for parallel execution; it just interleaves the statements, so you don’t gain anything.)

So as it stands we can EITHER have it fast OR have it transactional. The code allows either option based on a boolean.
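
A minimal sketch of what such a boolean switch could look like, for illustration only. Everything here is an assumption rather than the Atlas implementation: the class name, the `runInParallel` parameter, the locus names, and the `DoPerLocusWrite` body, which simulates a database round-trip with `Task.Delay` instead of opening a SQL connection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class HlaWriteSketch
{
    // Hypothetical stand-in for the real per-locus write.
    // The delay simulates a database round-trip; no SQL is involved.
    public static async Task DoPerLocusWrite(string locus, List<string> log)
    {
        await Task.Delay(50);
        lock (log)
        {
            log.Add(locus);
        }
    }

    // runInParallel == true  : "Fast" option, all writes in flight at once.
    //                          In real code each write would need its own
    //                          connection, which is what rules out a
    //                          TransactionScope without Distributed Transactions.
    // runInParallel == false : "Transactional" option, one write at a time,
    //                          which is compatible with sequential connections
    //                          inside a single TransactionScope.
    public static async Task WriteAllLoci(
        IEnumerable<string> loci, bool runInParallel, List<string> log)
    {
        if (runInParallel)
        {
            await Task.WhenAll(loci.Select(locus => DoPerLocusWrite(locus, log)));
        }
        else
        {
            foreach (var locus in loci)
            {
                await DoPerLocusWrite(locus, log);
            }
        }
    }
}
```

In this sketch the sequential branch completes the writes in input order, while the parallel branch gives no ordering guarantee. Any timing difference here comes only from the simulated delay and says nothing about the real 25-35% figure.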

We’ve currently opted for:

The HLA Processing in DataRefresh opts to be “Fast”.

It needs the performance boost, and covers for the lack of transactionality with the batches and overlaps-on-continue functionality.

The ongoing DonorUpdates opt to be “Transactional”.

They aren’t yet known to need the performance, and they have bigger problems with transactionality due to less control over message replay and less orderliness.

The two choices are controlled independently via appSettings.

Task

If .NET Core starts to support Distributed Transactions (theoretically in .NET 5? See the thread linked above) then we should trial allowing the writes to be parallel AND transactional (just change the await code controlled by the boolean). You MUST perf test it in detail! See the HighVolume DonorUpdate tests. Hopefully this will be a very quick ~30% win, once .NET Core catches up.

Alternatively, if we conclude that the performance of the DonorUpdates is inadequate and the HLA writes are the only remaining bottleneck (AND running them in parallel would be enough to solve the performance!!) then we would need to look into alternative ways to manage the transactionality of the DonorUpdates. That’s going to be a LOT of extra work, and it will be worth doing a lot of serious perf analysis of other bottlenecks before you get to that!

benbelow commented 2 years ago

The original ticket description above was taken from AN JIRA, raised in Jul 2020.

The linked issue did not make it into .NET 5, nor into .NET 6.

It is currently scheduled for .NET 7, at which point we should be able to capitalise on the performance gains described above.