amazon-ion / ion-python

A Python implementation of Amazon Ion.
https://amazon-ion.github.io/ion-docs/
Apache License 2.0

The amazon.ion.simple_ion.dumps method output doesn't work with DynamoDB import table #363

Open MacHu-GWU opened 1 month ago

MacHu-GWU commented 1 month ago

I am trying to generate an Ion data file manually with this library so that I can use it with DynamoDB import table. This is my DynamoDB item as a Python dictionary:

{
    "id": 1,  # this is the hash key
    "name": "Alice"
}

The amazon.ion.simpleion.dumps method gives me: $ion_1_0 {Item:{id:1,name:"Alice"}}. Note that there is no dot after the number 1. The import_table API then fails.

However, if I manually add a dot after the number 1, making it $ion_1_0 {Item:{id:1.,name:"Alice"}}, then it works.

I also exported a manually created DynamoDB table and found that the exported Ion file has a dot after each integer number.

I also tried the loads method, and an integer without a dot appears to be a valid value for deserialization. However, it doesn't work with DynamoDB table import.

How do I ensure that there is a dot after every integer in the text form of my data?

rmarrowstone commented 1 month ago

This isn't really a bug with ion-python.

In the Ion text format 1 is an Integer and 1. is a Decimal, see: https://amazon-ion.github.io/ion-docs/docs/spec.html

Per the DynamoDB Import Docs: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/S3DataImport.Format.html#S3DataImport.Requesting.Formats.Ion

They import an Ion Decimal as a DynamoDB Number. I do not know why they don't map an Ion Integer to a DynamoDB Number, but they don't. That's a possible feature request for DynamoDB.

Assuming that it's faster to change your code than to get DynamoDB to change, and that your code block is your Python code: to serialize an Ion Decimal from Python you need to create a decimal.Decimal. That will be emitted in your Ion stream as an Ion Decimal.

See https://github.com/amazon-ion/ion-python/blob/master/amazon/ion/simpleion.py#L33
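As a rough illustration of that suggestion, the sketch below converts every int in an item to decimal.Decimal before serializing. The helper name ints_to_decimals is hypothetical (it is not part of ion-python), and the guarded simpleion.dumps(..., binary=False) call assumes the library is installed; treat it as a starting point rather than the recommended approach.

```python
from decimal import Decimal


def ints_to_decimals(value):
    """Recursively replace int values with decimal.Decimal.

    Hypothetical helper for illustration; not part of ion-python.
    """
    # bool is a subclass of int, so check it first and leave it unchanged.
    if isinstance(value, bool):
        return value
    if isinstance(value, int):
        return Decimal(value)
    if isinstance(value, dict):
        return {key: ints_to_decimals(val) for key, val in value.items()}
    if isinstance(value, (list, tuple)):
        return [ints_to_decimals(val) for val in value]
    return value


item = {"Item": {"id": 1, "name": "Alice"}}
converted = ints_to_decimals(item)

try:
    # Assumes ion-python is installed; skipped otherwise.
    from amazon.ion import simpleion
except ImportError:
    simpleion = None

if simpleion is not None:
    # binary=False requests the Ion text encoding instead of binary.
    print(simpleion.dumps(converted, binary=False))
```

Because id is now a Decimal rather than an int, it should be written as an Ion Decimal, which is the type the DynamoDB import maps to a Number.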

rmarrowstone commented 1 month ago

I would further advise that for your production code you serialize your Ion as Binary for the imports. Obviously the text format is great for debugging and developing, but the binary format has improved data density and will be faster to import.

Please check out the pydoc in simpleion and let us know how we can improve that if needed.

MacHu-GWU commented 1 month ago

Thanks @rmarrowstone.

I believe it is still a bug, though not in ion-python; it is actually in DynamoDB import.

The simpleion.dumps() method gives the correct value, $ion_1_0 {Item:{id:1,name:"Alice"}} (I expect the id to be an integer). However, the DynamoDB import table feature doesn't recognize it. In my TableCreationParameters I defined the attribute type as N, yet the DynamoDB import table feature raises an error for that.

@rmarrowstone another issue is that the DynamoDB import table documentation doesn't mention how to use the Ion binary format to prepare the data. The documentation says: "Items in an Ion file are delimited by newlines. Each line begins with an Ion version marker, followed by an item in Ion format." This implies that I should use the text format to encode my data. So how can I do this?

"I would further advise that for your production code you serialize your Ion as Binary for the imports."

rmarrowstone commented 1 month ago

Sadly it does look like they only support the Text format, so that was some bad advice, sorry. It would be more optimal if they supported the binary format, but alas...
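
Given that only the text format is accepted, a minimal sketch of laying out the import file in the quoted layout (one item per line, each line starting with the Ion version marker) might look like the following. The helper to_import_lines and its serialize parameter are hypothetical; in practice serialize could be something like lambda item: simpleion.dumps(item, binary=False), while the example here uses a stand-in that returns the text shown earlier in this thread.

```python
def to_import_lines(items, serialize):
    """Return newline-delimited Ion text, one item per line.

    Hypothetical helper for illustration. `serialize` is any callable that
    turns one item into single-line Ion text starting with `$ion_1_0`.
    """
    lines = []
    for item in items:
        text = serialize(item)
        # Accept bytes as well as str, in case the serializer returns bytes.
        if isinstance(text, bytes):
            text = text.decode("utf-8")
        text = text.strip()
        if "\n" in text:
            raise ValueError("each item must serialize to a single line")
        if not text.startswith("$ion_1_0"):
            raise ValueError("each line must begin with the Ion version marker")
        lines.append(text)
    return "\n".join(lines) + "\n"


def fake_serialize(item):
    # Stand-in serializer that returns the text form quoted in this thread.
    return '$ion_1_0 {Item:{id:1.,name:"Alice"}}'


payload = to_import_lines([{"id": 1}, {"id": 1}], fake_serialize)
print(payload, end="")
```

The resulting string can be written to a file and uploaded to S3 for the import.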