NEventStore / NEventStore.Persistence.MongoDB

Mongo Persistence Engine for NEventStore

Is there any way to compress the data while using mongo persistence with NEventStore? #62

Open · josevfrancis opened this issue 2 years ago

josevfrancis commented 2 years ago

I'm working with C#, .NET Core, and NEventStore (version 9.0.1), trying to evaluate the various persistence options it supports out of the box.

More specifically, when using the Mongo persistence, the payload gets stored without any compression applied.

Note: payload compression works perfectly when using the SQL persistence of NEventStore, but not with the Mongo persistence.

I'm using the code below to create and initialize the event store:

```csharp
private IStoreEvents CreateEventStore(string connectionString)
{
    var store = Wireup.Init()
        .UsingMongoPersistence(connectionString, new NEventStore.Serialization.DocumentObjectSerializer())
        .InitializeStorageEngine()
        .UsingBsonSerialization()
        .Compress()
        .HookIntoPipelineUsing()
        .Build();
    return store;
}
```

And I'm using the code below to store the events:

```csharp
public async Task AddMessageTostore(Command command)
{
    using (var stream = _eventStore.CreateStream(command.Id))
    {
        stream.Add(new EventMessage { Body = command });
        stream.CommitChanges(Guid.NewGuid());
    }
}
```

The workaround we used: implementing the PreCommit(CommitAttempt attempt) and Select methods of IPipelineHook and applying gzip compression logic there; with that, compression of the events in MongoDB was achieved.
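A minimal sketch of the kind of hook used for that workaround (type names such as GzipPipelineHook are placeholders; it assumes NEventStore's PipelineHookBase is available, that EventMessage.Body can be reassigned, and it uses a JSON round-trip purely for illustration):

```csharp
// Rough sketch only, not the exact workaround implementation.
using System.IO;
using System.IO.Compression;
using System.Text;
using NEventStore;
using Newtonsoft.Json;

public class GzipPipelineHook : PipelineHookBase
{
    // Embed type names so event bodies deserialize back to their original types.
    private static readonly JsonSerializerSettings Settings =
        new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.All };

    public override bool PreCommit(CommitAttempt attempt)
    {
        // Replace each event body with its gzipped, serialized form before it is persisted.
        foreach (var message in attempt.Events)
        {
            message.Body = Compress(Encoding.UTF8.GetBytes(
                JsonConvert.SerializeObject(message.Body, Settings)));
        }
        return true; // let the commit proceed
    }

    public override ICommit Select(ICommit committed)
    {
        // Restore the original bodies when commits are read back from the store.
        foreach (var message in committed.Events)
        {
            message.Body = JsonConvert.DeserializeObject(
                Encoding.UTF8.GetString(Decompress((byte[])message.Body)), Settings);
        }
        return committed;
    }

    private static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    private static byte[] Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

The hook would then be registered in the wireup, e.g. HookIntoPipelineUsing(new GzipPipelineHook()).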

Attaching data store screenshots of both the SQL and Mongo persistence (attachments: MicrosoftTeams-image (12), MicrosoftTeams-image (11)).

So, the questions are:

  • Is there some other option or setting I'm missing so that the events get compressed while saving (a fluent way of calling the compress method)?
  • Is the workaround mentioned above sensible, or is it a performance overhead?

josevfrancis commented 2 years ago

@AGiorgetti @andreabalducci @Iridio Can you guys at least share your initial thoughts? We're really blocked on this. Thanks.

AGiorgetti commented 2 years ago

Hi @josevfrancis , you can try this:

  • replace DocumentObjectSerializer with ByteStreamDocumentSerializer: an IDocumentSerializer allows you to apply custom serialization to each event payload (DocumentObjectSerializer is a no-op serializer);
  • pass an instance of GzipSerializer to the ByteStreamDocumentSerializer (take a look at the NEventStore serialization tests); a rough wireup sketch follows below.

Let me know if it solved your problem.
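For illustration only (not taken from the docs; the exact namespaces and constructor signatures should be verified against the NEventStore packages in use), the suggested wireup might look roughly like this:

```csharp
private IStoreEvents CreateEventStore(string connectionString)
{
    // ByteStreamDocumentSerializer stores each event payload as the byte[] produced by the
    // inner ISerialize chain; GzipSerializer gzips the output of the serializer it wraps.
    return Wireup.Init()
        .UsingMongoPersistence(
            connectionString,
            new NEventStore.Serialization.ByteStreamDocumentSerializer(
                new NEventStore.Serialization.GzipSerializer(
                    new NEventStore.Serialization.BinarySerializer())))
        .InitializeStorageEngine()
        .Build();
}
```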

andreabalducci commented 2 years ago

imho it's useless unless you want to encrypt. Just enable compression for Mongo (https://www.mongodb.com/blog/post/new-compression-options-mongodb-30).
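For context, MongoDB's WiredTiger block compression can be enabled server-wide (storage.wiredTiger.collectionConfig.blockCompressor in mongod.conf) or per collection when it is created. A C# driver sketch, where the database/collection names and the zlib compressor are only examples rather than NEventStore defaults:

```csharp
// Illustration only: create the commits collection with WiredTiger block compression enabled.
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
var database = client.GetDatabase("EventStore");

await database.CreateCollectionAsync("Commits", new CreateCollectionOptions
{
    StorageEngine = BsonDocument.Parse("{ wiredTiger: { configString: 'block_compressor=zlib' } }")
});
```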

josevfrancis commented 2 years ago

> Hi @josevfrancis , you can try this:
>
>   • replace DocumentObjectSerializer with ByteStreamDocumentSerializer [...]
>   • pass an instance of GzipSerializer to the ByteStreamDocumentSerializer [...]

Hi @AGiorgetti

Thanks for the quick response. We tried to replace DocumentObjectSerializer with ByteStreamDocumentSerializer while passing in a new GzipSerializer(new BinarySerializer()). This resulted in a "BinaryFormatter serialization and deserialization are disabled within this application" error.

Please let me know if I'm doing something wrong.

josevfrancis commented 2 years ago

> imho it's useless unless you want to encrypt. Just enable compression for Mongo (https://www.mongodb.com/blog/post/new-compression-options-mongodb-30).

Hi @andreabalducci

Thanks for the quick response. I'm just being loud and stupid with my questions here.

I understand that we can use the MongoDB compression options, but can't we save storage space if we save data that is already compressed? BTW, I'm trying to compress the data by adding my compression logic within the PreCommit() method of IPipelineHook.

Apart from that, when trying out SQL as the commits persistence, with the SerializationWireup.Compress() method we were able to reduce the size of the payload (it dropped to 50% of the uncompressed size). We are trying to replicate the same with MongoDB as the persistence. Note: our commits table may grow significantly, so we are thinking of optimizing the size as much as possible.

Are there any best practices around the same?

AGiorgetti commented 2 years ago

> We tried to replace DocumentObjectSerializer with ByteStreamDocumentSerializer while passing in a new GzipSerializer(new BinarySerializer()). This resulted in a "BinaryFormatter serialization and deserialization are disabled within this application" error.

Hi @josevfrancis , the BinaryFormatter was deprecated a long time ago by the .NET team. The current BinarySerializer implementation still uses the old BinaryFormatter (it's still there for testing purposes and because I'm too lazy to replace it). You should be able to implement your own ISerialize interface that reads and writes bytes to a Stream; it's pretty straightforward.
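A minimal sketch of such a serializer, assuming ISerialize still exposes Serialize<T>(Stream, T) and Deserialize<T>(Stream) in the version in use (the JsonStreamSerializer name and the Newtonsoft.Json approach are just one possible choice):

```csharp
// Sketch only: an ISerialize that writes/reads UTF-8 JSON, suitable for wrapping in GzipSerializer.
using System.IO;
using System.Text;
using NEventStore.Serialization;
using Newtonsoft.Json;

public class JsonStreamSerializer : ISerialize
{
    // TypeNameHandling.All embeds type names so payloads deserialize back to their original types.
    private static readonly JsonSerializerSettings Settings =
        new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.All };

    public void Serialize<T>(Stream output, T graph)
    {
        var bytes = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(graph, Settings));
        output.Write(bytes, 0, bytes.Length);
    }

    public T Deserialize<T>(Stream input)
    {
        using (var reader = new StreamReader(input, Encoding.UTF8))
        {
            return JsonConvert.DeserializeObject<T>(reader.ReadToEnd(), Settings);
        }
    }
}
```

It could then be plugged in as new ByteStreamDocumentSerializer(new GzipSerializer(new JsonStreamSerializer())).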

josevfrancis commented 2 years ago

> reads and writes bytes to a Stream

Hi @AGiorgetti,

I have now replaced the BinarySerializer with a custom serializer that has the methods below:

```csharp
public virtual void Serialize<T>(Stream output, T graph)
{
    using (StreamWriter streamWriter = new StreamWriter(output, Encoding.UTF8))
        this.Serialize((JsonWriter)new JsonTextWriter((TextWriter)streamWriter), (object)graph);
}

protected virtual void Serialize(JsonWriter writer, object graph)
{
    using (writer)
        _serializer.Serialize(writer, graph);
}
```

And I used it like ByteStreamDocumentSerializer(new GzipSerializer(new CustomSerializer())).

But, while adding the events to the stream, I am getting an "Unable to cast object of type 'System.Byte[]' to type 'NEventStore.EventMessage'" error.

Please suggest if something is wrong.

josevfrancis commented 2 years ago

> I have now replaced the BinarySerializer with a custom serializer [...] But, while adding the events to the stream, I am getting an "Unable to cast object of type 'System.Byte[]' to type 'NEventStore.EventMessage'" error.

@AGiorgetti Sorry for disturbing you with back-to-back questions. Can you please let us know your thoughts whenever you have some time?

josevfrancis commented 2 years ago

> I understand that we can use the MongoDB compression options, but can't we save storage space if we save data that is already compressed? [...] Are there any best practices around this?

@andreabalducci @Iridio Do you have any thoughts around the quoted message above?

andreabalducci commented 2 years ago

Double compression is a waste of CPU and adds little or no benefit (it could even make things worse). Mongo has its own strategies for space allocation. Enabling compression on Mongo will simplify all your data maintenance, querying, diagnostics, and management. my2c.

josevfrancis commented 2 years ago

@andreabalducci Thanks for your comments. Your 2 cents 💯 will add a lot of value to people with similar questions.