CoreyKaylor / Lightning.NET

.NET library for LMDB key-value store
Other
397 stars 82 forks source link

Consider Supporting Incremental mapping on Windows. #131

Closed AlgorithmsAreCool closed 3 years ago

AlgorithmsAreCool commented 3 years ago

@CoreyKaylor

Today on windows, if you declare a MapSize of 1GB, the DB immediately allocates that space. This means you have to guess the max size of your DB up front.

If you guess too low, you hit a wall at some point and need to reallocate the entire DB, which is hella expensive in both time and space.

If you guess too high, you waste a ton of space on disk.

To support this, we would need to compile LMDB manually for windows and unfortunately this would impact performance by ~30% (according to the linked thread).

Ultimately, this might not be a good idea, but I figured I would bring it up.

See this thread for some background : https://www.openldap.org/lists/openldap-bugs/201902/msg00009.html

CoreyKaylor commented 3 years ago

It used to already be doing that. My guess is that the build process for the Makefile changed up a bit.

Here are the instructions I previously wrote up, but guessing another flag or something is missing.

https://github.com/CoreyKaylor/Lightning.NET/wiki/Compiling-the-lmdb-binaries-on-Windows

AlgorithmsAreCool commented 3 years ago

I'll take a look and do some investigation into performance impact. Might take a few days.

leesei commented 3 years ago

Related (solved) upstream issue https://bugs.openldap.org/show_bug.cgi?id=8324

AlgorithmsAreCool commented 3 years ago

Oh dear, sorry it had been a crazy past few months. I'll try to resolve this in the next few days

leesei commented 3 years ago

Nevermind, it has been crazy the whole year :rofl:. Just sharing our finding as we've encountered this issue recently.

klightspeed commented 3 years ago

It looks like you need to keep an eye on env.Info.LastPageNumber, and when it gets too close to the env.MapSize / 4096 (or when tx.Put returns MDBResultCode.MapFull) you need to commit any outstanding transactions and then increase the map size (you can't adjust the map size while a write transaction is open).

Edit: this is complicated by the fact that env.Info.LastPageNumber isn't updated until a transaction is committed.

AlgorithmsAreCool commented 3 years ago

@klightspeed Well, sorta.

So this issue isn't about increasing map size dynamically. This issue is about letting LMDB do that automatically. LMDB introduces support for "thin provisioning" mapped files on windows several years ago, this allows a DB to start out at a few kb instead of the full map size (which will typically be set to many MB or more).

If you build LMDB with a certain set of flags it will support this on windows. The C# wrapper should not have to manage this complexity directly.

That being said, if the performance loss is too much, then this library could potentially offer a feature to allow map size increases.

chris-skuvault commented 3 years ago

@klightspeed Something to note as well as when trying to resize the map size you also cannot have any read transactions reading from lmdb at the same time that the map is being resized or you could see the following exception: "System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt." So any reads would need to be completed and closed before adjusting the map size as well as writes to ensure that this issue doesn't occur.

@AlgorithmsAreCool We are on an old version of lmdb that uses the spare files on windows. We are looking to upgrade to latest version but are having trouble figuring out how to get a build with the right flag do you know which flags are used to build it using the sparse file for windows? Or where documentation might be that has this information?

slav commented 3 years ago

Also, from looking at the history of LMDB, it never supported this in the release branch. It did (and still does by the look of it) support it in the master branch. @CoreyKaylor - did you previously build from the master branch, not release branch?

CoreyKaylor commented 3 years ago

@slav unfortunately I don't remember. I generally only compiled these binaries once or twice a year and until recently the process was very cumbersome given the setup I had.

@AlgorithmsAreCool I don't think this library should support resizing the map directly. I would rather keep it slim and close to the core lib. That said, I think a possible solution would be to include a separate binary (not the default location) and the msbuild file supporting a property to copy the correct lib over if you want the sparse file for windows. So default is fast and using the recommended path from the library authors, but the "I don't want to have to think about my storage needs up front" is still available to those that may want that. Make sense?

slav commented 3 years ago

Just to update on this:

p.s. You can also change map size dynamically, but you have to guarantee there're no open transactions as virtual memory will be remapped and if you have any transactions going during remapping you'll probably get access violation exception (since address space changed, but your transaction doesn't know about it).

valeriob commented 3 years ago

Hi, is there any update on this feature ? Is it possible that since I upgraded from 0.10 to 0.13 my file sizes exploded ? :D (i'm using many independent environments) Thanks

slav commented 3 years ago

This isn't feature of Lightning.NET, but of LMDB. You can just build LMDB from master and drop resulting .dll over the one provided with Lightning.NET and you will get this feature. This is what we did.

valeriob commented 3 years ago

Thanks @slav but if it's not automated with a package looks like a maintenance nightmare, how do you even override a native dll provided by a nuget package, msbuild copy build step ? I'm trying the workaround, but without much luck, the EnvironmentInfo.LastPageNumber throws System.OverflowException :( image

valeriob commented 3 years ago

I guess it was easier to just use version 0.11 for now. What help would be needed to fix this issue ? Just build this for windows https://github.com/LMDB/lmdb/tree/mdb.master/libraries/liblmdb ? If you point me to the right branch I can try to make a pull request.

Thanks

slav commented 3 years ago

Yes, we use msbuild to replace custom dll built from master branch you referenced.

    <ItemGroup>
        <None Include="..\runtimes\win-x64\native\lmdb.dll">
            <Link>runtimes\win-x64\native\lmdb.dll</Link>
            <CopyToOutputDirectory>Always</CopyToOutputDirectory>
        </None>

        <None Include="..\runtimes\win-x64\native\lmdb.dll">
            <Link>lmdb.dll</Link>
            <CopyToOutputDirectory>Always</CopyToOutputDirectory>
        </None>

        <None Include="..\runtimes\linux-x64\native\lmdb.so">
            <Link>runtimes\linux-x64\native\lmdb.so</Link>
            <CopyToOutputDirectory>Always</CopyToOutputDirectory>
        </None>

        <None Include="..\runtimes\linux-x64\native\lmdb.so">
            <Link>lmdb.so</Link>
            <CopyToOutputDirectory>Always</CopyToOutputDirectory>
        </None>
    </ItemGroup>

To get consumed bytes we use (info.me_last_pgno.ToInt64() + 1) * PageSize. Not sure why you're getting overflow exception.

CoreyKaylor commented 3 years ago

If we get a PR that does the following, I would be happy to accept.

  1. Default stays with what's there.
  2. Additional binary is included in nuget package
  3. If MSBuild property "RequiresIncrementallySize" -> true, then copy the the alternate binary. LigntningDB.targets
  4. Document in wiki or in the PR how the binary was produced, preferably relying on WSL2 and not the Windows VS-centric toolchain.
  5. Test to prevent regressions in the future. May require a separate test step in the github actions.
klightspeed commented 3 years ago

Thanks @slav but if it's not automated with a package looks like a maintenance nightmare, how do you even override a native dll provided by a nuget package, msbuild copy build step ? I'm trying the workaround, but without much luck, the EnvironmentInfo.LastPageNumber throws System.OverflowException :(

To get consumed bytes we use (info.me_last_pgno.ToInt64() + 1) * PageSize. Not sure why you're getting overflow exception.

It looks like 2eb29f2 fixes your overflow - I'm guessing this will be in v0.14

valeriob commented 3 years ago

Thank you all, i'm looking into it, I have a question : I've seen the msbuild step to copy the native dll into the output folder, and I guess it to allow project references to use the correct native dependencies ? I did not find any docs on how to allow runtimes dependencies to flow project references :(

I do not think that just copy dll around will solve the issue, because the runtime will look for dlls in the runtimes folder : image

So the other two way to make it work maybe are : 1) adding another binary, and via C# config parameter use a level of indirection to the native methods in the class Lmdb.cs 2) crate a different nuget package with the different lmdb buid

What do you think ?

slav commented 3 years ago

I believe for native libraries it's hard-coded. You have to use runtimes//native, etc. You'd have to copy, during build, correct dll into that folder.

valeriob commented 3 years ago

I believe for native libraries it's hard-coded. You have to use runtimes//native, etc. You'd have to copy, during build, correct dll into that folder.

Thanks @slav unfortunately I do not think that's wise, when consuming the nuget package, everything is managed by nuget and you do not have a build step, that's why i'm suggesting to ad another file, and reference that instead.

slav commented 3 years ago

Is it possible to include lmdb with different name and reference different by different name from code? For example you can include version in the file name.

valeriob commented 3 years ago

Exactly what I'm proposing, since DllImport it's an attribute it requires a constant, we need a switch to use the correct method/dll.

CoreyKaylor commented 3 years ago

With the native libs, they are essentially a runtime dependency. Copying the correct lib over AfterBuild should continue to work as expected. I would prefer to prove that this route wouldn't work before employing other options. There are performance implications to making the native lib conditional. It used to be that way actually, so I would prefer to not go back to that.

Regardless of where the runtime by default copies the files from, you can overwrite the file. So it could be in runtimes/native/windows-incremental and copied from there.

valeriob commented 3 years ago

Thanks @CoreyKaylor, so you suggest to try an AfterBuild step, ok. What I do not like is that something that should be considered immutable is switched by some hidden code at last moment, I have bad scars 😄 As an alternative given that all the native methods ad very well factored out in this class LightningDB.Native.Lmdb it would be quite easy to create an instance of that class (having two specific versions) into LightningEnvironment where the config parameter would be, and then make it flow to transaction, cursor, db etc.

CoreyKaylor commented 3 years ago

Let me see what you come up with then.

valeriob commented 3 years ago

Thank you @CoreyKaylor here is what I had in mind, what do you think, too invasive? https://github.com/CoreyKaylor/Lightning.NET/pull/136

slav commented 3 years ago

Yes, that what I was thinking of doing as well. Use different instance depending on situation (although I wouldn't call it lmdb2, maybe lmdb.master or something like that)

CoreyKaylor commented 3 years ago

Let me mull it over. It's not too intrusive. It irks me a little to have something that applies to one platform in runtime code. The swap of a native dll requires no config. I can appreciate the comment above regarding "hidden code", but it would shift this to an explicit compile-time / deployment-time concern.

valeriob commented 3 years ago

Thanks, pls think about it :D I deliberately named it "2" to defer the naming to ppl that know better 😄

CoreyKaylor commented 3 years ago

I think I'm going to opt for a conditional compilation for this feature and leverage the techniques described here to accomplish this.

https://docs.microsoft.com/en-us/dotnet/standard/native-interop/cross-platform

This means the incremental mapping would only be supported on .netcore 3.1 or higher version. I'll preserve the netstandard and it won't change from the existing behavior due to the conditional compilation.

valeriob commented 3 years ago

Cool, i've never found Custom import resolver !

CoreyKaylor commented 3 years ago

Me either, got lucky today and stumbled upon it.

CoreyKaylor commented 3 years ago

@valeriob I stole your assemblies and copied the test you wrote. Could you share how you compiled the assemblies and which version you compiled against either here, or in the wiki? The list of variations is growing and coming up with something scripted might reduce some headache down the road I think.

https://github.com/CoreyKaylor/Lightning.NET/wiki/Compiling-the-lmdb-binaries-on-Windows

valeriob commented 3 years ago

Hi, I followed that guide for WSL, and I used this commit 48a7fed59a8aae623deff415dda27097198ca0c1

Valerio

chester89 commented 1 year ago

I tried to find the answer in the mailing list but wasn't able to. This is a Windows-specific issue, right?

AlgorithmsAreCool commented 1 year ago

@chester89 This issue is already implemented and closed. But yes this is a windows specific request