boto / boto3

AWS SDK for Python
https://aws.amazon.com/sdk-for-python/
Apache License 2.0

How can I store 30.40 in DynamoDb? #665

Open jonapich opened 8 years ago

jonapich commented 8 years ago

It's the end of the day, so maybe I'm not seeing this clearly... but I am unable to store simple float values without adding some arcane magic to the mix.

Using .put_item on a Table resource that contains a float:

item = {'name': 'testing_row', 'foo': 30.40}
table.put_item(Item=item)
>> TypeError: Float types are not supported. Use Decimal types instead.

The same thing, using Decimal:

item = {'name': 'testing_row', 'foo': Decimal(30.40)}
table.put_item(Item=item)
>> Inexact: None

This last one should have gone through, no? The stack trace is this:

>       table.put_item(Item=item)

E:\Projects\Repos\tests\test_dynamodb.py:9: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
C:\Python27\lib\site-packages\boto3\resources\factory.py:518: in do_action
    response = action(self, *args, **kwargs)
C:\Python27\lib\site-packages\boto3\resources\action.py:83: in __call__
    response = getattr(parent.meta.client, operation_name)(**params)
C:\Python27\lib\site-packages\botocore\client.py:258: in _api_call
    return self._make_api_call(operation_name, kwargs)
C:\Python27\lib\site-packages\botocore\client.py:524: in _make_api_call
    api_params, operation_model, context=request_context)
C:\Python27\lib\site-packages\botocore\client.py:574: in _convert_to_request_dict
    params=api_params, model=operation_model, context=context)
C:\Python27\lib\site-packages\botocore\hooks.py:227: in emit
    return self._emit(event_name, kwargs)
C:\Python27\lib\site-packages\botocore\hooks.py:210: in _emit
    response = handler(**kwargs)
C:\Python27\lib\site-packages\boto3\dynamodb\transform.py:197: in inject_attribute_value_input
    'AttributeValue')
C:\Python27\lib\site-packages\boto3\dynamodb\transform.py:252: in transform
    model, params, transformation, target_shape)
C:\Python27\lib\site-packages\boto3\dynamodb\transform.py:259: in _transform_parameters
    model, params, transformation, target_shape)
C:\Python27\lib\site-packages\boto3\dynamodb\transform.py:274: in _transform_structure
    target_shape)
C:\Python27\lib\site-packages\boto3\dynamodb\transform.py:259: in _transform_parameters
    model, params, transformation, target_shape)
C:\Python27\lib\site-packages\boto3\dynamodb\transform.py:283: in _transform_map
    params[key] = transformation(value)
C:\Python27\lib\site-packages\boto3\dynamodb\types.py:103: in serialize
    return {dynamodb_type: serializer(value)}
C:\Python27\lib\site-packages\boto3\dynamodb\types.py:204: in _serialize_n
    number = str(DYNAMODB_CONTEXT.create_decimal(value))
C:\Python27\lib\decimal.py:3938: in create_decimal
    return d._fix(self)
C:\Python27\lib\decimal.py:1712: in _fix
    context._raise_error(Inexact)

I can make it go through if I use the string trick in the Decimal constructor:

item = {'name': 'testing_row', 'foo': Decimal('30.40')}
table.put_item(Item=item)

Having done that, and looking into the table with AWS's dashboard, I can see a number type with the value 30.4, but then I cannot assert its value in my tests:

item = table.get_item(Key={'name': 'testing_row'})['Item']['foo']
assert item == 30.4
>> False

This seems to work:

assert float(item) == 30.4

So how is this supposed to work, exactly? Is it expected that I must 1) provide a string to Decimal and 2) convert back to float myself for equality to work properly?

jonathanwcrane commented 8 years ago

Yeah, I've gotten this weird "inexact" error before. I think what I did was cast a number to Decimal earlier on in the calculation, then round it to two decimal places AFTER I had cast it, and it worked.

¯\_(ツ)_/¯

jonapich commented 8 years ago

@jonathanwcrane But in my case I'm not doing any calculations. Boto3 forces me to convert my floats to Decimal, which I do, and then it is unable to push it... If you take a look at the stack trace, you'll notice boto tries to do some funky stuff with my Decimal... This sounds like a bug.

kyleknap commented 8 years ago

@jonapich You just need to enclose the float value with quotes. So something like this:

item = {'name': 'testing_row', 'foo': Decimal('30.40')}
table.put_item(Item=item)

Passing it as a string is the typical usage for Decimal in Python: https://docs.python.org/2/library/decimal.html.

Let me know if that helps. I think an example should be added for this.
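For reference, a minimal round-trip might look like the sketch below (the table name 'my-table' and the hash key 'name' are just assumptions based on the snippets above):

from decimal import Decimal

import boto3

table = boto3.resource('dynamodb').Table('my-table')  # placeholder table name

# Writing: build the Decimal from a string so the serializer's context accepts it exactly.
table.put_item(Item={'name': 'testing_row', 'foo': Decimal('30.40')})

# Reading: numbers come back as Decimal, so convert before comparing to a float.
item = table.get_item(Key={'name': 'testing_row'})['Item']
assert float(item['foo']) == 30.4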

jonathanwcrane commented 8 years ago

So let's say it's a value that's calculated on the fly. Do we cast it as a string, thusly:

our_value = some_other_value / yet_another_value
item = {'name': 'testing_row', 'foo': Decimal(str(our_value))}
table.put_item(Item=item)

?

jonapich commented 8 years ago

@kyleknap did you look at the example I provided after the stack trace, in my original post?

I don't agree with this behavior being normal. Having to wrap these calls with home-made recursive conversion functions goes against the principle that you can usually just pass a boto resource around. It should be usable as-is.

I understand that Python floats are annoying and imprecise, but boto really should handle these conversions for the user. I agree that it shouldn't allow Inexact conversions, but it would be better to let the user attach their own conversion function for when Inexact is raised.
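For what it's worth, the kind of home-made recursive converter I'm talking about looks roughly like this (just a sketch; the helper name is made up):

from decimal import Decimal

def floats_to_decimals(obj):
    # Recursively replace every float with a Decimal built from its string form.
    if isinstance(obj, float):
        return Decimal(str(obj))
    if isinstance(obj, dict):
        return {k: floats_to_decimals(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [floats_to_decimals(v) for v in obj]
    return obj

item = floats_to_decimals({'name': 'testing_row', 'foo': 30.40})
table.put_item(Item=item)  # table is the Table resource from the original example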

jonapich commented 8 years ago

@jonathanwcrane the problem with this approach is that it also has to be converted back if you pull it:

>>> val1 = Decimal(30.40)
>>> val2 = Decimal('30.40')
>>> val1 == 30.40
True
>>> val2 == 30.40
False
>>> float(val1) == 30.40
True
>>> float(val2) == 30.40
True
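The asymmetry comes from how each Decimal is built: Decimal(30.40) captures the exact binary value of the float (so it compares equal to 30.40), while Decimal('30.40') is the exact decimal value, which the binary float can never equal. For example:

>>> from decimal import Decimal
>>> Decimal(30.40) == Decimal('30.40')   # the binary float and the exact decimal differ
False
>>> float(Decimal('30.40')) == 30.40     # rounding back to a float restores equality
True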
csimmons0 commented 8 years ago

Hi. I was hitting the same error and came up with this solution. If you're okay with rounding, you may find it helpful.

import decimal
import logging

import boto3.dynamodb.types

g_logger = logging.getLogger(__name__)  # module-level logger used below


def round_float_to_decimal(float_value):
    """
    Convert a floating point value to a decimal that DynamoDB can store,
    and allow rounding.
    """

    # Perform the conversion using a copy of the decimal context that boto3
    # uses. Doing so causes this routine to preserve as much precision as
    # boto3 will allow.
    with decimal.localcontext(boto3.dynamodb.types.DYNAMODB_CONTEXT) as \
         decimalcontext:

        # Allow rounding.
        decimalcontext.traps[decimal.Inexact] = 0
        decimalcontext.traps[decimal.Rounded] = 0
        decimal_value = decimalcontext.create_decimal_from_float(float_value)
        g_logger.debug("float: {}, decimal: {}".format(float_value,
                                                       decimal_value))

        return decimal_value
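Usage then looks like the earlier put_item examples, e.g.:

item = {'name': 'testing_row', 'foo': round_float_to_decimal(30.40)}
table.put_item(Item=item)  # table is a boto3 Table resource, as above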
robolivable commented 7 years ago

@jonapich +1, I totally agree with this. I shouldn't have to be yelled at for using floats in my values. And if they must be used, the library should be responsible for handling the conversions, not me.

What seems off here as well is that this isn't an issue when using the DynamoDB.Client approach for saving items.

bittlingmayer commented 7 years ago

We store a nested object and don't want to write code that assumes a specific nesting or recursively searches for floats deep down in the tree. So for us the lazy workaround is to use json.dumps and store the entire dict as a string.
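A rough sketch of that workaround (the table name 'my-table' and the 'payload' attribute name are just examples):

import json

import boto3

table = boto3.resource('dynamodb').Table('my-table')  # placeholder table name

nested = {'foo': 30.40, 'bar': {'baz': [1.5, 2.5]}}

# Write: serialize the whole nested object into a single string attribute.
table.put_item(Item={'name': 'testing_row', 'payload': json.dumps(nested)})

# Read: parse the string back into a dict, floats and all.
item = table.get_item(Key={'name': 'testing_row'})['Item']
nested_back = json.loads(item['payload'])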

Alonreznik commented 7 years ago

I've solved this issue by creating a json.loads object hook, which handles the conversion from Decimal to float.

# Python 2 code: long, maxint and iteritems are used below.
import json
import uuid
from datetime import datetime
from decimal import Decimal
from sys import maxint

from dateutil import parser  # python-dateutil
from boto3.dynamodb.types import TypeDeserializer, TypeSerializer

def encode_object_hook(dct):
    try:
        return parser.parse(TypeDeserializer().deserialize(dct))
    except (ValueError, AttributeError, TypeError):
        try:
            val = TypeDeserializer().deserialize(dct)
            if isinstance(val, Decimal):
                if val % 1 > 0:
                    return float(val)
                elif val < maxint:
                    return int(val)
                else:
                    return long(val)
            else:
                return val
        except:
            return dct

For "dumps" your dict into DynamoDB put_item valid value, you can use this serializer:

def decode_object_hook(dct):
    for key, val in dct.iteritems():
        if isinstance(val, float):
            dct[key] = Decimal(str(val))
        try:
            dct[key] = TypeSerializer().serialize(val)
        except:
            dct[key] = val
    return dct

def json_serial(val):
    if isinstance(val, datetime):
        serial = val.strftime('%Y-%m-%dT%H:%M:%S.%f')
        return serial
    elif isinstance(val, set):
        serial = list(val)
        return serial
    elif isinstance(val, uuid.UUID):
        serial = str(val.hex)
        return serial

def dumps(dct, *args, **kwargs):
    kwargs['object_hook'] = decode_object_hook
    return json.loads(json.dumps(dct, default=json_serial), *args, **kwargs)

item = dumps({'name': 'testing_row', 'foo': 30.40})
table.put_item(Item=item)
agarwalvipin commented 6 years ago

Is there any update on this issue/feature request?

APIZone commented 6 years ago

What worked for us is wrapping the floating point value in str and casting to Decimal; no loss of precision!

transaction_amount = 100.03

item = { 'subject_bank_xref': ext_reference, 'transaction_amount': Decimal(str(transaction_amount)) }

tarak1992 commented 5 years ago

The implementation below will solve the issue. Here each_item is the JSON object:

import json
from decimal import Decimal

each_item_dump = json.dumps(each_item)
each_item = json.loads(each_item_dump, parse_float=Decimal)
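Going the other way (turning the Decimals that get_item returns back into plain floats) can be done with a default handler; decimal_default here is just an illustrative helper:

import json
from decimal import Decimal

def decimal_default(obj):
    # json.dumps calls this for types it cannot serialize natively, such as Decimal.
    if isinstance(obj, Decimal):
        return float(obj)
    raise TypeError("Unserializable type: {}".format(type(obj)))

item = table.get_item(Key={'name': 'testing_row'})['Item']  # table as in the earlier examples
plain_item = json.loads(json.dumps(item, default=decimal_default))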

frankleonrose commented 4 years ago

@Alonreznik The following condition in your code results in negative floating point values not being recognized as floats.

if val % 1 > 0:
    return float(val)

You need to use something along the lines of:

if val != val.to_integral_value():
    return float(val)
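A quick check shows why: Decimal's % keeps the sign of the dividend, so the remainder of a negative value is never greater than zero, while to_integral_value() still catches it:

>>> from decimal import Decimal
>>> Decimal('-1.5') % 1
Decimal('-0.5')
>>> Decimal('-1.5') % 1 > 0
False
>>> Decimal('-1.5') != Decimal('-1.5').to_integral_value()
True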
yehorb commented 4 years ago

Most of the proposed solutions around the internet are either json.loads(json.dumps(value), parse_float=Decimal) or writing your own recursive type caster. json comes with an enormous performance overhead. Writing your own recursive type caster seems doable, but you need to WRITE your type caster, and the dynamodb resource walks the object recursively on write anyway.

Instead, you can patch the existing TypeSerializer to do what you need it to do:

import decimal

from boto3.dynamodb import types
class FloatSerializer(types.TypeSerializer):
    # This one here raised a TypeError on floats.
    # By removing this error, we can work with floats in the first place.
    #
    # Original code uses six.integer_types here.
    # I work with 3.6 on this project, so I can omit six altogether.
    def _is_number(self, value):
        if isinstance(value, (int, decimal.Decimal, float)):
            return True
        return False

    # Add float-specific serialization code
    def _serialize_n(self, value):
        if isinstance(value, float):
            with decimal.localcontext(types.DYNAMODB_CONTEXT) as context:
                context.traps[decimal.Inexact] = 0
                context.traps[decimal.Rounded] = 0
                number = str(context.create_decimal_from_float(value))
                return number

        number = super(FloatSerializer, self)._serialize_n(value)
        return number

    # By the way, you can not write dictionaries with int/float/whatever keys as is,
    # boto3 does not convert them to strings automatically.
    #
    # And DynamoDB does not support numerical keys anyway,
    # so this crude workaround seems reasonable.
    def _serialize_m(self, value):
        return {str(k): self.serialize(v) for k, v in value.items()}

import boto3
from unittest.mock import patch

session = boto3.Session()

# TypeSerializers are created on resource creation, so we need to patch it here.
with patch("boto3.dynamodb.types.TypeSerializer", new=FloatSerializer):
    db = session.resource("dynamodb")

Now, any Table created from db.Table() will support float values, without you writing your own recursive conversion code. This technique applies to the TypeDeserializer as well, so if you need to tune the types in a get_item() response, you can do it in the same manner.
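For example, a deserializer patched in the same manner (assuming you simply want every DynamoDB number back as a plain float) might be sketched like this:

class FloatDeserializer(types.TypeDeserializer):
    # Return plain floats instead of Decimals when numbers are read back.
    def _deserialize_n(self, value):
        return float(value)

# Same patching approach as above; reuses session, types and patch from the previous snippet.
with patch("boto3.dynamodb.types.TypeDeserializer", new=FloatDeserializer):
    db = session.resource("dynamodb")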

Hope somebody will find this approach useful.

Sidenote. I understand that DynamoDB supports 38 digits of precision for numerical types, and I understand that float is not the right tool for handling 38 digits of precision. But c'mon, I do basic division in my code and explicitly round the results to 2 decimal digits, because that's enough for my use case. Writing this much code and resorting to such engineering feats just to put something like 1.75 into the database seems a bit unreasonable. I understand that Amazon engineers need to flex their digits of precision from time to time, but sometimes it leads to overengineering and a hard time for the end user. Adding native float support, even with rounding, would help greatly.

tallesl commented 4 years ago

@yehorb

I went with the recursive type caster but patching the serializer seems better.

Have you used such code in production? Would you consider making a package of it (or let myself do it)?

yehorb commented 4 years ago

@tallesl

I believe it's pretty stable. I use this patching technique in a few projects, and it seems to work fine. I've never had an issue from the patching itself; patch is pretty reliable, even though such usage is 'unintended'.

The Serializer itself works fine too. Maybe it requires some polish and backward compatibility, but, again, I use 3.6 or 3.7 in my projects, so I went straight ahead for modern type definitions.

No, I'm not planning to turn this code into a separate package; you can do it yourself if you want. I believe this feature, neatly packaged, may be helpful for boto3 users.

What I'm planning to do is research the boto3.dynamodb codebase a bit deeper and make it possible to pass your own Serializer/Deserializer objects/constructors on resource creation. The feature should be relatively easy to implement; the TypeSerializer/Deserializer is created only once at the beginning and then reused.

Such a solution may not be optimal but will solve the common problems of getting 'officially not supported types' in and out of DynamoDB with relatively low boilerplate code.

tallesl commented 4 years ago

@kyleknap

Any progress on this?

It's mind-boggling that boto doesn't allow floats (Python's default floating-point type) out of the box. The most trivial put_item call runs into this issue, not to mention the pain of using Decimals with json.dumps and json.loads.

syzhakov commented 3 years ago

I spent a whole day trying to figure out how to solve this issue, without success. Unbelievable that it isn't solved yet; it's such a basic use case.

mojimi commented 3 years ago

boto3 has to be the least pythonic library ever created.

mdavis-xyz commented 3 years ago

This issue is annoying, but that doesn't mean it's not a pythonic library, let alone the least pythonic library.

An example of a less pythonic library is the Azure SDK. Boto returns dictionaries for most calls, but the Azure SDK returns nested custom classes, each specific to exactly that one field nested within that one API call. If you print the response you just see <SomethingResponse> instead of a dictionary, and you can't json.dumps() the response.

dacevedo12 commented 2 years ago

Any updates? This is indeed such a simple case that you would expect boto to handle it, just like the other issue where it didn't support tuples.

adityamcodes commented 2 years ago

Any updates? This is something Boto3 should just handle.

jonapich commented 2 years ago

@adityamcodes I don't know why they're keeping this issue open, but it's been more than 8 years now. I don't think it will ever be done. There are tons of workarounds explained here and in #369.

andrewjroth commented 8 months ago

Hello @tim-finnigan and @RyanFitzSimmonsAK. I see my PR #2699 was closed, but I didn't really get a response to my proposal. I don't really understand why this got a "needs major version" label if it is implemented as I explained in my comment quoted below.

As I mention, I am happy to do the PR work, but want to confirm that it will be considered before I start spending time on it.

I see the primary concern is a loss of precision by new users who may unknowingly expect better precision from float values than the data type is actually capable of. The problem with not being able to delete items only applies when a float value is used as an index key. Is this right?

For users who are aware of this limitation and are willing to accept it, would it be reasonable to add a flag for the DynamoDBHighLevelResource to allow floats? This would require the user to accept the risk of using floats when creating the resource interface.

I am thinking the user code would look something like this:

import boto3

dynamodb = boto3.resource('dynamodb', allow_floats=True)
table = dynamodb.Table('name')
response = table.put_item(
    Item={ "data1": { "data-level2": 0.938 } }
)

This might take a bit of work, so I wanted to ask whether it was acceptable before starting on it. The default behavior would, of course, still be to raise an error when floats are serialized.

Thanks for your feedback!

d3cxxxx commented 3 months ago

I am using Python 3.x and boto3 still doesn't seem to support this functionality. I would have loved to store the inputs as-is in the DB, but I am being forced to convert them to strings.

When is a fix expected for it? It's been like 7 years now since the issue was raised.

pauljones0 commented 1 month ago

Bumping this up. Same problem, same headache over the same issue. I killed 3 hours trying to figure out what was going on.