Closed AceHack closed 4 years ago
Basically what I'm looking for is the best Write Ahead Log (WAL) possible on a cross platform language.
Your request seems to be a reasonable building block for database-like systems. However, it does not seem to fit the scope of CoreFX/BCL and would IMO be better served as an independent library.
In general we consider extending CoreFX with APIs which:
See dotnet/runtime#22228 for examples of APIs we consider out of scope of CoreFX.
There are a few reasons I think this belongs in CoreFX/BCL instead of a library.
1) It was part of the BCL of the full .NET Framework on Windows (this is not a very good reason).
2) It requires low-level access below FileStream to things like overlapped unbuffered I/O, I/O completion ports, or maybe just the native Win32 APIs for CLFS (Common Log File System). On Linux it would also require access below FileStream for best performance and truly durable writes, with things like write-through and no caching, so you can be sure data is committed and durable and not sitting in buffers, either hardware or software. Flushing occasionally is not an option either, because operating in that manner is so SLOW.
3) To really have the base needed for the performance, quality, and durability guarantees, having a high-quality team like MS be the curators would be wonderful.
4) If SQL Server would be willing to share some techniques (maybe not), that could provide some of the latest techniques for a high-quality transaction log.
Thanks.
Here is some example low-level code that would likely be used on Windows. This is not a full WAL or ARIES yet, but I imagine it would use a lot of the same constructs. You can see how many low-level OS constructs are required, which is one reason why I think it would be best to have this in CoreFX.
This code allows fast sequential writes to a file while making sure every write is "committed" and durable all the way through all buffers, including hardware, without the need for constant flushing. It also does a lot of work to create page-aligned memory buffers (i.e. the VirtualAlloc buffer) and to figure out the sector size of the disk. All of these things are needed for optimal write performance.
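For readers without the collapsed code, here is a minimal managed-only sketch of the sector-alignment and write-through idea described above. It is not the code originally attached to this issue. `WriteThroughLog`, `AlignUp`, and the 4096-byte default sector size are hypothetical names and values chosen for illustration. It uses only `FileOptions.WriteThrough` (which corresponds to `FILE_FLAG_WRITE_THROUGH` on Windows) and does not do unbuffered or overlapped I/O, since `FileOptions` has no public member for that.

```csharp
using System;
using System.IO;

// Hypothetical sketch: append zero-padded, sector-sized blocks through a
// write-through file handle. The sector size is an assumption; real code
// would query the volume for its physical sector size.
public sealed class WriteThroughLog : IDisposable
{
    private readonly FileStream _stream;
    private readonly int _sectorSize;

    public WriteThroughLog(string path, int sectorSize = 4096)
    {
        _sectorSize = sectorSize;
        // A buffer size of 1 keeps FileStream from adding its own buffering;
        // WriteThrough asks the OS to push writes past its cache.
        _stream = new FileStream(path, FileMode.Append, FileAccess.Write,
                                 FileShare.Read, 1, FileOptions.WriteThrough);
    }

    // Round length up to the next multiple of a power-of-two alignment.
    public static int AlignUp(int length, int alignment)
    {
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0)
            throw new ArgumentException("Alignment must be a positive power of two.", nameof(alignment));
        return checked(length + alignment - 1) & ~(alignment - 1);
    }

    // Appends one record padded to whole sectors and returns its file offset.
    // A real log would also store the record length, which padding hides.
    public long Append(ReadOnlySpan<byte> record)
    {
        byte[] block = new byte[AlignUp(record.Length, _sectorSize)];
        record.CopyTo(block);

        long offset = _stream.Position;
        _stream.Write(block, 0, block.Length);
        _stream.Flush(flushToDisk: true);
        return offset;
    }

    public void Dispose() => _stream.Dispose();
}
```

With a 512-byte sector size, a 3-byte record occupies one 512-byte block and a 600-byte record occupies two, so every record starts on a sector boundary. The buffer here is an ordinary managed array, so it is not page-aligned in memory; that part still needs something like VirtualAlloc as described above.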
Without a better abstraction layer for library writers, like the one Mono provided, writing new cross-platform libraries that need to get into the guts of the different OSs involved is much harder with CoreFX. See https://github.com/dotnet/coreclr/issues/930 for examples of the kind of abstractions I'm talking about.
Also, if FileStream is somehow much better in .NET Core than in the full .NET Framework and allows fast sequential writes with no buffers in between all the way to the hardware, page-aligned writes, and figuring out the sector size, then I would love to understand it better. Thanks.
Here is some more info on why page and sector alignment are so important:
http://programmingaddicted.blogspot.com/2011/05/unbuffered-overlapped-io-in-net.html
https://msdn.microsoft.com/en-us/library/windows/desktop/cc644950%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
https://arxiv.org/ftp/cs/papers/0502/0502012.pdf
http://www.installsetupconfig.com/win32programming/windowsfileapis4_13.html
[EDIT] Moved code into details, added C# syntax highlighting by @karelz
Here's my view on your reasons to add the library into CoreFX:
I would recommend to split this issue into 2 separate items:
I'm voting for a Write Ahead Log as a part of .NET because I have an implementation of the Raft Consensus Algorithm for ASP.NET Core, which requires a persistent replication log. At the moment the log is represented by an interface and its actual implementation is delegated to the consumer.
@davidfowl what are your thoughts from the ASP.NET perspective?
I would like to share an implementation of a Write Ahead Log that is suitable for general-purpose use, as a proof of concept. Sources are here and licensed under MIT. What was done:
What was not done (and probably won't):
Originally, it was developed for log replication as described by the Raft consensus algorithm, so it has some Raft-specific API such as the Raft node state. However, you can just ignore this part of the API. I can say that developing a fully general-purpose WAL is not so easy, because the underlying database engine may require some specific features. Also, I don't have benchmarks, because the various C# implementations of OSS databases have very specific log engines, so their performance is not comparable.
@AceHack @sakno Please check out FasterLog from here: https://github.com/microsoft/FASTER
FASTER is a really fast cross-platform embedded key-value store, and its internal log implementation was upgraded into a public API in the last few weeks. Super impressive performance characteristics imho. Try out the FasterLogSample csproj on a machine with an NVMe SSD and hopefully you'll be impressed with what you see!
cc @badrishc
(disclaimer - I've been using it for a work project and actively collaborating with the main author)
@hiteshmadan This seems pretty amazing, thanks for pointing this out.
@hiteshmadan you might consider a PR to add it to this.
https://github.com/microsoft/dotnet/blob/master/dotnet-developer-projects.md
Submitted a PR as suggested, here. To expand a bit, FasterLog (docs) is a latch-free concurrent persistent log library for .NET. It supports concurrent appends, group commit, multiple persistent iterators (each of which may itself be concurrently used), random reads, and log truncation from head. It can run over sharded and tiered storage backend devices (we have out-of-the-box device implementations for local storage and Azure Page Blobs). Our benchmarks are able to easily saturate 2+ GB/sec on a single local NVMe SSD, and 4+ GB/sec when run over two (sharded) SSDs.
Closing as being addressed by an existing library in the ecosystem.
I'm looking for a very high performance .NET Standard/Core namespace along the lines of Algorithms for Recovery and Isolation Exploiting Semantics (ARIES). It does not need to be exactly ARIES, but it would need to be sufficient to be the start of a number of different data-oriented projects that choose .NET. I would use this, and I think others would use it to start a new ecosystem of databases, NoSQL, messaging, streaming, and data-oriented .NET Core open source projects.