elastic / elasticsearch-net

This strongly-typed, client library enables working with Elasticsearch. It is the official client maintained and supported by Elastic.
https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/index.html
Apache License 2.0
13 stars 1.15k forks source link

Decimal serialisation #3497

Closed RevoluPowered closed 5 years ago

RevoluPowered commented 5 years ago

In C# we use decimal to represent money, I am just wondering if decimal is stored in elastic as two integers or in a decimal format which is okay for use with currency?

Or alternatively, how is the decimal type stored inside ES?

What's the error rate? etc

Henr1k80 commented 5 years ago

AFAIK, decimals are stored as doubles in the index. When they are deserialized, they are of course decimal again, so you only need to worry about storage precision (unless you have business logic in elastic update scripts, then I guess the precision is double)

RevoluPowered commented 5 years ago

Thanks for your reply, we need more precision in our storage than double could provide for our monetary calculations, I am curious is there a good way of implementing a way of storing decimal to two ints perhaps, instead of to the double type in es?

Henr1k80 commented 5 years ago

I would probably make two properties for the integers with a getter that gets the integer part on serialization. The setters, (only called on deserialization) just sets a backing field. I would mark them [Obsolete("Only for serialization/deserialization", error: true)] to ensure that they are only used to serialize and not from code.

The decimal getter should have some logic to tell whether it should be initialized from the integers (only on first use after deserialization). I would set [Ignore] to avoid any confusion by it being stored and indexed, unless it is needed for sorting where I guess double precision is ok?

RevoluPowered commented 5 years ago

Something similar to this?


public class Currency
{
    public Currency(decimal value)
    {
        Value = value;
    }

    // stored in elastic
    public int pence;
    public int pounds;

    // could be lazy loaded too, to prevent having to do math when returning the value 
    [Ignore]
    public decimal Value{   
    get 
    { 
        // add fractions of a pence to pounds amount in decimal types.
        return ((decimal)pence/100m)+ (decimal)pounds; 
    }
    set { 
        // note: not tested this yet
        pounds = Math.Truncate(value); // pounds / dollars
        pence = value - Math.Truncate(value); // pence / cents value
    }
    }
}

Then I could maybe do some further work on this so instead of having to declare 3 vars, I could have a Currency type? which serializes into elastic safely with the two int's?

So in future it uses something nested like this:,

public class Car
{
     public Currency price = new Currency( 2500.50m );
}

Shame there isn't a tidier way of doing this, I guess though it's better knowing things are all POCO and no special cases exist, then it can be planned for appropriately.

Henr1k80 commented 5 years ago

If you only are storing two decimals (pence), then I am pretty sure that double precision for storage is enough. At least for int pounds. Eg.:

public class Car
{
    [Ignore] // do not store without rounding
    public decimal Price { get; set; }

    [Obsolete("Only for serialization/deserialization, use Price", error: true)]
    [Number("Price")] // named Price in elastic for transparency
    public decimal StoredPrice {
        get { return Math.Round(Price, 2); }
        set { Price = Math.Round(value, 2); }
    }
}

Even simpler, only store the price in pence, if the floating point representation isn't needed in elastic

public class Car
{
    [Ignore] // do not store without rounding
    public decimal Price { get; set; }

    [Obsolete("Only for serialization/deserialization, use Price", error: true)]
    public long PencePrice {
        get { return Price*100L; }
        set { Price = value/100m; }
    }
}
Mpdreamz commented 5 years ago

Love the solutions on this thread, we inherit our decimal serialization behaviour from Json.NET here which has no lossless support:

https://github.com/JamesNK/Newtonsoft.Json/issues/1726

Another solution would be to create a custom json.net serializer that reads and writes longs while also mapping the Price field as scaled_float in Elasticsearch:

https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html

This could be something we oughta be considering shipping out of the box support for, thoughts @codebrain @russcam ?

codebrain commented 5 years ago

Currency storage is a little bit of a minefield. In most domains you very quickly get into the situation where you need to store both the currency representation and the ISO code.

https://github.com/dgg/nmoneys is a pretty decent resource on how to do this in .NET - I believe the value type used is decimal (https://github.com/dgg/nmoneys/blob/b9da020206cbdff51ff68bc53f599f68ee1e13df/src/NMoneys/Serialization/Data.net.cs#L39)

I terms of storing in ES... difficult if the data store does not support precise decimal values (128 bit).

I think if we try to work around this in the client we'd very quickly get into situations like:

I think we run the risk of being a little over-opinionated on our default implementation if we place this inside the client code - however, its a perfect candidate for an example serialiser.

RevoluPowered commented 5 years ago

We store huge fractions of a pence unfortunately for what we're doing so 2dp is not enough. we use carbon per kWh, so fractions of a pence is extremely common in the smallest calculation, plus we do the calculation across literally millions of docs. @Henr1k80

I'd love to see elastic fully support currency, obviously it's a headache but probably worth it if you develop a good solution as the code examples above have some overhead which across 1m+ docs would be slow, especially if you're using multiple fields.

@codebrain Perhaps the ISO code wouldn't matter in some systems, could disable it by default and only enable it if someone specifies that the field requires it?

for example in my system we only use GBP for now, anything in USD etc would just be using a currency conversion lookup.

codebrain commented 5 years ago

I would envisage currency support in two stages:

  1. Implement decimal types in ES core.
  2. Build a currency type based on an amalgam of string and decimal.

I'll ping our ES core developers and see where (and if) decimal support is on the roadmap.

@RevoluPowered your best option at the moment, if you want exact precision per individual value, is to store as a long or int where something like 1000000 = $1.000000 and convert accordingly.

jpountz commented 5 years ago

This issue has some context: https://github.com/elastic/elasticsearch/issues/17006. In short aggregations use doubles internally, which makes supporting any data type more accurate than double probably deceptive in practice since values would be casted to the nearest double when aggregating. So we decided against supporting a decimal type.

RevoluPowered commented 5 years ago

@codebrain yeah I will probably go with the long solution as a workaround for now, it would be nice to see something official though, this wrapper we have above is definitely not following the KISS principle.

@jpountz yeah, I get what you're saying but it might be the case that it needs a re-think, I'm fairly sure clients are storing calculations in C# using decimal, and blindly overlooking that its storing as a double in ES core, I think that could potentially go quite wrong.

Overall though, it's a lot of work so I'll leave it to you guys! Thanks for the feedback on this.

russcam commented 5 years ago

There is a section in the mapping documentation about mapping of .NET types to Elasticsearch field datatypes, with an important admonition about decimal type. This came about from discussion on https://github.com/elastic/elasticsearch-net/issues/2463

RevoluPowered commented 5 years ago

Just had a double precision issue in our aggregations and one of our configurations for our devices.

Is ES Core going to put decimal on the roadmap? if not I might have to make something down the custom serializer route for decimal.

If that is the case is there any way to handle anything other than double in aggregations, I assume it handles long, int but would a custom serializer function in aggs?

russcam commented 5 years ago

@RevoluPowered there isn't a way to plug your own serialization into aggregations in NEST; you can control the complete serialization/deserialization with the low level client, Elasticsearch.Net. Here's an example

var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var settings = new ConnectionConfiguration(pool);
var client = new ElasticLowLevelClient(settings);
var response = client.Search<StringResponse>(PostData.Serializable(
    new {
        size = 0,
        aggs = new 
        {
            genres = new
            {
                terms = new
                {
                    field = "genre"
                }
            }
        }
    }
));

var jObject = JsonConvert.DeserializeObject<JObject>(response.Body);

This returns the response from Elasticsearch as a string, then deserializes to a Json.NET JObject.

I'm going to close this issue for now as there isn't anything further actionable in the client that we can do for decimal serialization.