fsprojects / FSharp.Azure.Storage

F# API for using Microsoft Azure Table Storage service
MIT License

Partition key and row key are stored in addition to record fields #3

Closed by Bananas-Are-Yellow 9 years ago

Bananas-Are-Yellow commented 9 years ago

I tried your example:

type Game = 
    { [<RowKey>] Name : string
      [<PartitionKey>] Developer : string
      HasMultiplayer : bool
      Notes : string }

let game = 
    { Name = "Halo 4"
      Developer = "343 Industries"
      HasMultiplayer = true
      Notes = "Finished the game in Legendary difficulty." }

let result = game |> Insert |> inGameTable

I was expecting the Name and Developer fields to be stored only as the RowKey and PartitionKey properties in the table. Instead, they are also stored as separate Name and Developer properties, which means each value is stored twice.

Is there some way to avoid this?

daniel-chambers commented 9 years ago

Currently, no. It was a design decision to keep the properties used as the PK and RK as actual properties too, in order to make the row as descriptive as possible (e.g. so you can look at the row in isolation and see what the Name and Developer are; if they were stored only as the PK and RK, you would lose the ability to tell which is which just by looking at column names).

Another consideration is that beyond trivial examples, the PK and RK are often derived from a subset or combination of other properties (using IEntityIdentifiable), usually in such a way as to allow you to search the table efficiently. In these cases, you still need Name and Developer persisted explicitly. When using the attributes, from the code's perspective the PK and RK just incidentally happen to be the same as Developer and Name.

One workaround with the current build would be to rename Developer and Name to PartitionKey and RowKey in the record type, so that the names line up; admittedly this is a crap workaround. :)

In a future release, I want to add the ability to mark properties as ignorable; this would effectively give you what you want, as you could mark Name and Developer as ignored and then they wouldn't be separately persisted.

Bananas-Are-Yellow commented 9 years ago

Naturally I am hoping that my application will be a runaway success, and millions of people will be creating tons of data in Azure. Therefore the storage cost of duplicating data unnecessarily is a concern. The readability of column names in my storage explorer for me as a developer is not so important.

You said "beyond trivial examples". Actually, I'd say that beyond trivial examples, the idea that your application's in-memory F# record types can be persisted in Azure table storage rapidly breaks down for two reasons:

  1. Fields can only be of types supported by Azure. In practice, in-memory fields are likely to be more complex types than that.
  2. The simplicity of Azure table storage means that in-memory data structures need to be represented in a different way in table storage. For example, my in-memory TypeA might have one-to-many references to objects of TypeB, which I might store in an array of references. In table storage, if I know this array will always be quite short (say, length < 20), I could pack keys of TypeB entries into a string property with separators, but even this is now different from my array of references. Then if the array could be arbitrarily long (say, length > 1000), then I will have to store TypeA keys inside TypeB entries instead, and then query for all the TypeB objects related to my TypeA object.

Therefore, I accept that when I want to persist my in-memory data structures, I will first have to convert them to Azure-compatible types, and then persist those types instead.
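A minimal sketch of that conversion step, for illustration only. The names `TypeA`, `TypeAEntity`, `toEntity`, `ofEntity` and the `|` separator are hypothetical and not part of the library; the sketch assumes the separator never occurs inside a key and that the list of TypeB keys stays short enough to fit in one string property.

```fsharp
// Hypothetical in-memory type: TypeA refers to TypeB objects by key.
type TypeA =
    { Id : string
      Owner : string
      BKeys : string list }

// Hypothetical storage-shaped record: flat fields of table-friendly types only.
type TypeAEntity =
    { PartitionKey : string
      RowKey : string
      PackedBKeys : string }

// Assumption: this character never appears inside a TypeB key.
let separator = '|'

// Convert the in-memory value to its storage shape by packing the keys
// into a single separator-delimited string.
let toEntity (a : TypeA) : TypeAEntity =
    { PartitionKey = a.Owner
      RowKey = a.Id
      PackedBKeys = String.concat (string separator) a.BKeys }

// Convert back, treating an empty string as an empty key list.
let ofEntity (e : TypeAEntity) : TypeA =
    { Id = e.RowKey
      Owner = e.PartitionKey
      BKeys =
        if e.PackedBKeys = "" then []
        else e.PackedBKeys.Split([| separator |]) |> List.ofArray }
```

For the arbitrarily long case, the packing step would be dropped and each TypeB entity would carry the key of its TypeA instead, as described in point 2.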

Renaming Name and Developer to be PartitionKey and RowKey does not work. It produces an exception: "The type XXX does not contain a property with PartitionKeyAttribute".

I just tried using DynamicTableEntity, and this did seem to work. Are there drawbacks to this approach, apart from the possibility of coding errors due to properties not being type safe? Will all your modify and query functions work fine?

When do you think you might support ignorable properties? That sounds like an elegant solution.

Another thought that occurred to me is that IEntityIdentifiable could support unpacking the keys back into fields too. Then the IEntityIdentifiable approach could allow properties to be ignored too.

daniel-chambers commented 9 years ago

I think if you're going to have record types that directly represent data that has been transformed into a table-storage-specific format, then you should use the PartitionKey and RowKey named properties on the record type. With respect to the exception you're seeing, you still need to apply the PK and RK attributes to those properties, even after the name change.
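As a rough sketch, the Game record from the original post reshaped this way might look like the following. The type name `GameEntity` is hypothetical, and the `open` line assumes the namespace used by recent versions of the library; older builds may expose the attributes under a different namespace.

```fsharp
// Assumption: namespace of the PartitionKey/RowKey attributes in recent library versions.
open FSharp.Azure.Storage.Table

// Storage-shaped record: the key fields are named after the table's own
// key columns and still carry the attributes.
type GameEntity =
    { [<PartitionKey>] PartitionKey : string   // the developer
      [<RowKey>] RowKey : string               // the game name
      HasMultiplayer : bool
      Notes : string }

let game =
    { PartitionKey = "343 Industries"
      RowKey = "Halo 4"
      HasMultiplayer = true
      Notes = "Finished the game in Legendary difficulty." }
```

The trade-off is the one discussed above: the row no longer says which value is the developer and which is the game name, so that mapping lives only in the code.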

ITableEntity types like DynamicTableEntity should work fine, though I wouldn't recommend them simply because they are necessarily mutable types, and when programming functionally I prefer to eliminate as much mutability as possible, hence using F# record types. Also, ITableEntity-implementing classes don't get the other benefits of F# record types, such as automatic equatability and comparability.

Bananas-Are-Yellow commented 9 years ago

For some reason, when I tried applying the attributes to the PartitionKey and RowKey fields yesterday, I got another exception, which said something about a "Conflict". Perhaps there was an existing entry with those keys. Anyway I tried it again just now and it works, so that's good.

Thanks, that's what I shall do.

By the way, I really appreciate that you have created this library. It's well designed and useful.

daniel-chambers commented 9 years ago

No worries, glad I could help. :)

It's good to see people presenting real use cases like you have here; it helps me see where new features (like ignore) might be useful for people.