azist / azos

A to Z Sky Operating System / Microservice Chassis Framework
MIT License

Reading data from MMFPile #866

Closed pantonis closed 1 year ago

pantonis commented 1 year ago

I have the following code

public class Cache
{
    private readonly LocalCache cache;
    private readonly MMFPile mmfPile;
    private readonly string TABLE_NAME = "testData";

    public Cache()
    {
        cache = new LocalCache(NOPApplication.Instance);

        mmfPile = new MMFPile(NOPApplication.Instance);
        mmfPile.DataDirectoryRoot = AppDomain.CurrentDomain.BaseDirectory;
        mmfPile.Start();

        cache.Pile = mmfPile;
        cache.DefaultTableOptions = new TableOptions("*")
        {
            CollisionMode = CollisionMode.Durable,
            DefaultMaxAgeSec = int.MaxValue,
            MaximumCapacity = 0, 
        };
        cache.Start();

        LoadAll();
    }

    public void Write()
    {
        var data = new TestData
        {
            Id = 1,
            Name = "test",
        };

        var table = cache.GetOrCreateTable<int>(TABLE_NAME);

        var putResult = table.Put(1, data);

        Console.Write($"{putResult}");
    }

    public void Read()
    {
        var table = cache.GetOrCreateTable<int>(TABLE_NAME);

        var data2 = table.Get(1) as TestData;

        Console.Write($"{data2?.Name}");
    }

    private void LoadAll()
    {
        var table = cache.GetOrCreateTable<int>(TABLE_NAME);

        foreach (PileEntry entry in mmfPile)
        {
            var buf = mmfPile.Get(entry.Pointer);
            var putResult = table.Put(1, buf);

            Console.WriteLine($"{putResult}");
        }
    }
}

public class TestData
{
    public int Id { get; set; }
    public string Name { get; set; }
}

I can see that the MMFPile data is created in the root of my app as a folder containing two files. When I run the app and call Write() and then Read(), the object is inserted and read successfully. When I close the app, run it again, and call only Read(), the data is null.

Can anyone advise why it is not saving to disk?

zhabis commented 1 year ago

The cache is in-memory only and is not saved to a file. After startup you can reload all items from the pile into the cache, e.g. with a foreach over pile.AllItems

pantonis commented 1 year ago

I updated the code above with the LoadAll() method and I get the following exception:

Exception in SlimSerializer.Deserialize():  [Azos.Serialization.Slim.SlimInvalidTypeHandleException] TypeRegistry[handle] is invalid: 54"

Stacktrace:

   at Azos.Serialization.Slim.TypeRegistry.get_Item(VarIntStr handle)
   at Azos.Serialization.Slim.TypeSchema.DeserializeRootOrInner(SlimReader reader, TypeRegistry registry, RefPool refs, StreamingContext streamingContext, Boolean root, Type valueType)
   at Azos.Serialization.Slim.SlimSerializer.deserialize(SlimReader reader, RefPool pool)
   at Azos.Serialization.Slim.SlimSerializer.Deserialize(Stream stream)

itadapter commented 1 year ago

Do you have code on Git?

pantonis commented 1 year ago

What do you mean?

itadapter commented 1 year ago

We need to see your code to try to help you with the issue. It looks like you serialized something and then changed the type. If you need to store data for a long time, don't use Slim; use another format such as Bixon and store your data as byte[] in the pile, which will still be blazingly fast.

Without knowing what you are trying to store in the pile/cache, it is very hard to help.

pantonis commented 1 year ago

@itadapter The above is my code. I don't have any other code. I added the code in the first post of this issue

pantonis commented 1 year ago

@itadapter Any update on this?

itadapter commented 1 year ago

I do not see deterministic finalization with Dispose. Because you are using complex OS-related unmanaged resources, you must finalize all services deterministically by calling "pile.Dispose()" etc.

It looks like you are corrupting your data by tearing it when the process terminates.
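A minimal sketch of deterministic finalization, reusing only the types and calls from the code in the first post plus the `Dispose()` call named above. The `Azos.Apps` and `Azos.Pile` namespaces, the `PersistentCache` name, and the choice to dispose the cache before the pile it sits on are assumptions for illustration. Whether an explicit stop call is also needed before `Dispose()` is not shown in this thread.

```csharp
using System;
using Azos.Apps;  // assumed namespace of NOPApplication
using Azos.Pile;  // assumed namespace of LocalCache and MMFPile

// Hypothetical wrapper: owns the pile and the cache and releases both deterministically.
public sealed class PersistentCache : IDisposable
{
    private readonly MMFPile mmfPile;
    private readonly LocalCache cache;

    public PersistentCache()
    {
        // Same startup sequence as the constructor in the first post.
        mmfPile = new MMFPile(NOPApplication.Instance);
        mmfPile.DataDirectoryRoot = AppDomain.CurrentDomain.BaseDirectory;
        mmfPile.Start();

        cache = new LocalCache(NOPApplication.Instance);
        cache.Pile = mmfPile;
        cache.Start();
    }

    // Tear down in reverse order of startup: the cache first, then the pile under it.
    public void Dispose()
    {
        cache.Dispose();
        mmfPile.Dispose();
    }
}
```

Wrapping the instance in a `using` statement (`using (var c = new PersistentCache()) { ... }`) runs `Dispose()` when the block exits normally or by exception, so the memory-mapped files are released before the process ends rather than being torn on termination.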

itadapter commented 1 year ago

OK, I looked at your code. There are 2 issues.

1 - Deterministic dispose

You must deterministically stop the cache and then the pile (and dispose them). You might have files which are not fully written (because you terminated the process).

2 - Type registry corruption

Because you are using a custom type, the Slim serializer does not know what type it is on manual read-back from the pile after a process restart.

There are many things you can do, complex and simple. The simplest would be to store not TestData but a byte[] of testData. I am not able to advise on the easiest method because I don't know what you need to use the cache for.

Can you please explain what this cache is for (actual objects), so I can suggest an approach? It is basically a one-liner, but I don't know which serializer to suggest.

The fastest way to do this would be to use the Bix formatter, Bixon, or Json. All of them are built in.
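A minimal sketch of the "store byte[] instead of TestData" idea. It assumes `System.Text.Json` from the .NET base library as a stand-in, because the exact calls of the built-in Bix, Bixon, and Json serializers are not shown in this thread. `TestDataCodec` is a hypothetical helper name.

```csharp
using System;
using System.Text.Json;

public class TestData
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper: converts TestData to and from a plain byte[] payload.
public static class TestDataCodec
{
    // UTF-8 JSON carries its own field names, so no Slim type handle is needed on read-back.
    public static byte[] ToBytes(TestData value) =>
        JsonSerializer.SerializeToUtf8Bytes(value);

    public static TestData FromBytes(byte[] payload) =>
        JsonSerializer.Deserialize<TestData>(payload);
}

public static class Program
{
    public static void Main()
    {
        var bytes = TestDataCodec.ToBytes(new TestData { Id = 1, Name = "test" });

        // In the Cache class from the first post, these bytes would be what gets stored,
        // e.g. table.Put(1, bytes), and table.Get(1) would be cast to byte[] on read.
        var copy = TestDataCodec.FromBytes(bytes);

        Console.WriteLine($"{copy.Id} {copy.Name}");
    }
}
```

With this approach the pile only ever holds byte[], so a process restart does not depend on the serializer remembering the custom type.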

pantonis commented 1 year ago

First of all I want to thank you for your reply. I see a great potential in this library and I really don't understand why this library is not so popular. I mean it is unique as it is the only lib I found that truly unlocks the .NET potential as a high perf language. MS could use this library for sure to improve lots of their missing features.

Now, regarding my scenario: I'm using the library as a cache for a data warehouse project where ETL jobs run all the time. Instead of saving data to a database and retrieving it every minute or so, I need an ultra-fast way of saving and deleting millions of objects with persistence (which is why I tried the Memory Mapped File Pile).

I couldn't find a complete working example, only several snippets.

Thank you

itadapter commented 1 year ago

Ok, I think I can help you a lot because we are actually dealing with ETL warehousing as we speak.

1. Facts

Facts are analytics/warehouse primitives purpose-built for analysis. Azos provides a full solution, including storage. I think I am going to mention this for you. It will be easier to show you than to type all of this here. Long story short: look at the "Azos.Log.Fact" class for an ultra-optimized format for storing hundreds of millions of "facts". They have ad hoc "dims" and "metrics". If you use Facts, you can also use archiving, which is built for storage and streaming of millions of records a second. Look in "Azos.IO.Archiving" - this is like Apache Parquet, only faster and better.

Also look at "Bixon" - a binary serializer akin to JSON which uses the BIX format (see in code).

2. Custom type roundtrip

You do not need to use the facts described above. If you do, you get the benefits. If you don't, you can still store a PILE of strings or byte[]. If your objects are very different, you can use Bixon with polymorphism.

I need to get on a call with you, as there is a lot to type. But all of this code is in production and handling millions of rows.

Look me up on Skype. @itadapter