Use of nvarchar on AggregateId in EventStores.MsSql could impact indexing

JC008 commented 9 years ago

Is unicode needed in the aggregate id and should it be 255 long?

Using unicode in an indexed field does have an impact on the available key space in a data page in sql server which could result in performance issues when handling huge volumes of event data in the event store.

rasmus commented 9 years ago

@JC008 Valid point. Any suggestions on how to create a SQL migrate script that could be executed on a running system? We use the MSSQL store in production and thus it would be critical that the operation could be performed without any significant downtime.

JC008 commented 9 years ago

I would consider calculating a new AggregateIdHash field for the aggregate id and using it as the base for IX_EventFlow_AggregateId_AggregateSequenceNumber thereby leaving AggregateId intact and allowing for a graceful switch-over.

JC008 commented 9 years ago

@rasmus FYI http://stackoverflow.com/questions/32271518/append-only-to-table-in-sql-server-to-record-immutable-events-and-improve-overal

rasmus commented 9 years ago

I read the article which basically sums up to this

Ensure ALLOW_ROW_LOCKS is ON
Ensure ALLOW_PAGE_LOCKS is ON
Write SQL for inserting and selecting based on multiple factors including CPU, memory, network

For 1. and 2. ON is the default for the database. As for using varchar instead of nvarchar I'll keep it as it is. I understand the concern for wanting to limit the amount of storage space and memory used, but I don't think limiting the character set for aggregate IDs is a good idea for the default database schema. As you wrote, migrating a live event source is problematic and limiting the character set that you can use for aggregate root IDs isn't worth it. Developers might e.g. at some point want to use e-mail addresses as aggregate root IDs.

I'll keep the nvarchar for the aggregate root IDs, but I do think you have a point regarding performance optimizations in EventFlow. The default behavior of EventFlow should be "good enough", but it should be possible to do performance tweaking. Regarding the SQL schema, you control when, and if, you want to use it and thus you could create your own schema or even your own MSSQL store implementation as its a very few lines of code (see MssqlEventStore). Its important for me that EventFlow "just works" and handles CQRS+ES task very well, without too many surprises whan you start using it.

Again, your point is valid and I suggest to create a "performance tips" guide to EventFlow with specifics on what you could do to improve performance by e.g. imposing limitations on what characters you can put in an aggregate root ID.

JC008 commented 9 years ago

I asked the question on SO on Friday in the hope that someone could provide some guidance beyond the obvious but is seems there is none as far as SQL Server is concerned. Postgresql (GreenPlum) supports append only storage and in general, append only structured storage will become more prevalent in the future.

More importantly is awareness of the impact of using the nvarchar but more important than that is how flexible the EventFlow approach and implementation is regarding storage. I will add it to the list of articles I am writing for the EventFlow Guide.

rasmus commented 9 years ago

@JC008 Thanks for investigating it and I appreciate the effort, I'll run it by our SO team as well just to be sure.

eventflow / EventFlow

Use of nvarchar on AggregateId in EventStores.MsSql could impact indexing #107