aws / aws-sdk-ruby-record

Official repository for the aws-record gem, an abstraction for Amazon DynamoDB.
Apache License 2.0

Support for binary types? Normal Marshal classes? #37

Open cobbr2 opened 8 years ago

cobbr2 commented 8 years ago

We have some information we need to encrypt in our DynamoDB store. It appears there are no attr methods for declaring objects that will be of type B in Dynamo.

I'm trying to build an appropriate marshaler (sic), but I'm also a bit stymied by this library's unfamiliar abstract interface for marshalers (compared to, say, Ruby's Marshal class, AR's serialization support, or DataMapper's dm-types). In what way does type_cast imply "read from database representation"? And why is it always called, even when writing to the database representation?

If it has to work this way, what class can my marshaler assume the raw data is in when read?

awood45 commented 8 years ago

You're correct that we haven't added binary attributes yet; that's an item on our backlog. If you can expand a bit on your use of binary attributes (though I can understand if you can't), I would be happy to consider that as well when implementing them.

The type_cast methods take the possible inputs and turn them into a single representative type, while the serialization step focuses only on the differences between that type and the DB representation (for example, where Date(Time) attributes are backed by strings).
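A minimal sketch of that split, assuming a date attribute backed by a string. `SketchDateMarshaler` is a hypothetical name, not the gem's actual class, and having `serialize` call `type_cast` first is an assumption made here to mirror the "always called" behavior asked about above.

```ruby
require 'date'

# Hypothetical marshaler for illustration only; not the aws-record API.
module SketchDateMarshaler
  # Normalize every accepted input to one in-memory type (Date).
  def self.type_cast(raw)
    case raw
    when nil, '' then nil
    when Date then raw                           # already the representative type
    when String then Date.parse(raw)             # e.g. a value read from the store
    when Integer then Time.at(raw).utc.to_date   # epoch seconds
    else raise ArgumentError, "cannot cast #{raw.class} to Date"
    end
  end

  # Convert the normalized value to the stored representation (a string).
  # Assumption: the cast runs first, so it also runs on the write path.
  def self.serialize(raw)
    date = type_cast(raw)
    date && date.iso8601
  end
end

stored = SketchDateMarshaler.serialize(Date.new(2016, 8, 1))
loaded = SketchDateMarshaler.type_cast(stored)
```

In this shape `type_cast` has to accept both user input and the stored form, which works for dates because the cast is idempotent. That property is what a Base64-backed binary attribute lacks.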

I'm open to feedback on cases where this implementation falls short or creates issues. Our primary source for the marshaler logic was our previous SDK major version.

cobbr2 commented 8 years ago

Sure. I want an attribute that can take a String with encoding ASCII-8BIT and store its value, roughly no matter how long it is. I'm going to provide the symbol for that attribute to a DSL which then sets and gets from it; I have little control over that DSL, and am only lucky that it is encoding the string as ASCII-8BIT consistently. In this particular case, it's going to represent a per-instance encryption key which will then be used to serialize & deserialize other attributes in the same model.

That attribute value needs to be preserved as-is, of course. One approach: type_cast changes String to encode64 strings; serialize puts them out as-is. But now when I read a value back, I get double encodes and no place to decode. Clearly the wrong answer.
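The double encoding can be reproduced with a small sketch. `EncodeOnCastMarshaler` is a hypothetical name, and the write and read paths shown (cast then serialize on write, cast again on read) are assumptions about how the library drives a marshaler. `pack('m0')` is used as the core-Ruby equivalent of strict Base64 encoding.

```ruby
# Hypothetical marshaler for the first approach; not part of aws-record.
module EncodeOnCastMarshaler
  # Base64-encode every String that passes through.
  def self.type_cast(raw)
    raw.nil? ? nil : [raw].pack('m0')
  end

  # Write the already-encoded value unchanged.
  def self.serialize(raw)
    raw
  end
end

key = "\xDE\xAD\xBE\xEF".b # sample ASCII-8BIT key

# Write path (assumed): cast, then serialize.
stored = EncodeOnCastMarshaler.serialize(EncodeOnCastMarshaler.type_cast(key))

# Read path (assumed): the getter casts the stored value again.
read_back = EncodeOnCastMarshaler.type_cast(stored)

double_encoded = (read_back != stored)            # the value was encoded twice
recovered = read_back.unpack1('m0').unpack1('m0') # two decodes are needed
```

Nothing in this shape ever decodes, so the original bytes are only reachable by undoing the encoding by hand.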

Second approach: type_cast requires a different type for encoding (say class BinaryString < String). Now I can vaguely make it work; type_cast runs decode64 on Strings and doesn't touch BinaryStrings, and serialize runs the encode64 on BinaryStrings. But this has two problems:

  1. After bar = BinaryString.new('bar'); model.foo = bar, the comparison model.foo == bar is false, since type_cast is called by the getter (so model.foo will run decode64 on 'bar').
  2. I have to know to cast the value in my assignment.
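A rough sketch of the second approach, showing the second problem: a plain String assigned by a caller who does not know to wrap it is treated as Base64 and decoded by the getter. `SketchBinaryMarshaler` and `SketchModel` are hypothetical stand-ins, and the getter calling `type_cast` is assumed from the description above. `unpack1('m')` is the lenient decode that `Base64.decode64` uses.

```ruby
# Marker subclass: "these bytes are already decoded".
class BinaryString < String; end

# Hypothetical marshaler; not the aws-record API.
module SketchBinaryMarshaler
  def self.type_cast(raw)
    case raw
    when nil then nil
    when BinaryString then raw                               # leave untouched
    when String then BinaryString.new(raw.unpack1('m'))      # assume Base64 from the store
    else raise ArgumentError, "cannot cast #{raw.class}"
    end
  end

  def self.serialize(raw)
    value = type_cast(raw)
    value && [value].pack('m0')
  end
end

# Stand-in for a model whose getter always runs type_cast.
class SketchModel
  def foo=(value)
    @foo = value
  end

  def foo
    SketchBinaryMarshaler.type_cast(@foo)
  end
end

key = "\x00\xFFkey".b
stored = SketchBinaryMarshaler.serialize(BinaryString.new(key))
round_trip_ok = (SketchBinaryMarshaler.type_cast(stored) == key)

model = SketchModel.new
model.foo = 'blah'                  # caller forgot to wrap the value
plain_ok = (model.foo == 'blah')    # the getter decoded 'blah' as Base64
```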

Working approach: add another layer of accessors (foo= and its matching getter foo). The setter wraps the value in BinaryString and then calls the setter generated by attr (say binary_foo=). The binary_foo accessors work as in the second approach; the foo getter just delegates to the binary_foo getter.
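The layering might look roughly like this. Everything here is a stand-in: `LayeredExample`, `WrappedBinary`, `stored_value`, and `load_stored` are hypothetical names, and the `binary_foo` accessors are written by hand where the real model would get them from the attr DSL.

```ruby
# Marker subclass: "these bytes are already decoded".
class WrappedBinary < String; end

class LayeredExample
  # Inner layer: stand-ins for the accessors the attr DSL would generate.
  def binary_foo=(value)
    @binary_foo = value
  end

  def binary_foo
    cast(@binary_foo)
  end

  # Outer layer: wrap eagerly on assignment, delegate on read.
  def foo=(value)
    wrapped = value.nil? ? nil : WrappedBinary.new(value.dup.force_encoding(Encoding::BINARY))
    self.binary_foo = wrapped
  end

  def foo
    binary_foo
  end

  # Stand-in for the write path: the Base64 text that would be stored.
  def stored_value
    value = binary_foo
    value && [value].pack('m0')
  end

  # Stand-in for the read path: the raw stored text goes into the inner slot.
  def load_stored(encoded)
    @binary_foo = encoded
    self
  end

  private

  # Same cast as the second approach: decode plain Strings, keep wrapped ones.
  def cast(raw)
    case raw
    when nil then nil
    when WrappedBinary then raw
    when String then WrappedBinary.new(raw.unpack1('m'))
    end
  end
end

wack = LayeredExample.new
wack.foo = 'blah'
amole = LayeredExample.new.load_stored(wack.stored_value)
symmetric = (wack.foo == 'blah' && amole.foo == 'blah')
```

The copy and `force_encoding` in `foo=` happen on every assignment, which is the cost this approach pays for symmetry.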

Now foo= and foo work symmetrically:

    wack = Example.new(hash_key_field: 'x')
    wack.foo = 'blah'
    assert(wack.foo == 'blah') # => true
    wack.save(force: true)
    amole = Example.find(hash_key_field: 'x')
    assert(amole.foo == 'blah')  # => true
    assert(wack.foo == amole.foo) # => true

But I'm copying and force_encoding strings, and even then we're only getting dirty tracking because I'm eager (and encode as soon as the assignment is made).

Probable next approach if I can take the time to reopen the code: write the case statements in terms of the encoding instead of the value's class. That breaks the pattern established by Aws::Record::Marshaler subclasses, but might allow removal of the second layer of accessors.
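A sketch of what branching on encoding could look like. `EncodingBasedMarshaler` is a hypothetical name, and the sketch assumes that values read back from the store are never tagged ASCII-8BIT, which is not confirmed anywhere in this thread.

```ruby
# Hypothetical marshaler that decides by encoding instead of by class.
module EncodingBasedMarshaler
  def self.type_cast(raw)
    if raw.nil?
      nil
    elsif !raw.is_a?(String)
      raise ArgumentError, "cannot cast #{raw.class}"
    elsif raw.encoding == Encoding::BINARY
      raw                  # ASCII-8BIT: treat as already-decoded bytes
    else
      raw.unpack1('m')     # any other encoding: assume Base64 text from the store
    end
  end

  def self.serialize(raw)
    value = type_cast(raw)
    value && [value].pack('m0')
  end
end

key = "\x00\xFFkey".b
stored = EncodingBasedMarshaler.serialize(key)
loaded = EncodingBasedMarshaler.type_cast(stored)
stable = EncodingBasedMarshaler.type_cast(loaded).equal?(loaded) # casting again is a no-op
```

This only holds while callers consistently assign ASCII-8BIT strings, as the DSL described earlier happens to do; a UTF-8 string assigned directly would still be mistaken for stored Base64.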

Other ORMs like ActiveRecord and DataMapper just re-use the pattern from Ruby's Marshal module: two conceptually decoupled methods called #dump and #load. The only assumption is that random_marshaller.load(random_marshaller.dump(x)) == x (and even that's a bit loose; it wouldn't be unusual for a symbol to be turned to a string). The attribute handling just lets x be whatever value it was before it was assigned.
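For comparison, a minimal sketch of the dump/load shape applied to the binary case. `Base64Coder` and `CoderBackedAttribute` are hypothetical names, not classes from any of the libraries mentioned; only `Marshal.dump` and `Marshal.load` are real Ruby APIs here.

```ruby
# Hypothetical coder: two decoupled methods, one per direction.
module Base64Coder
  # In-memory value -> stored representation.
  def self.dump(value)
    value.nil? ? nil : [value].pack('m0')
  end

  # Stored representation -> in-memory value.
  def self.load(stored)
    stored.nil? ? nil : stored.unpack1('m0')
  end
end

# Hypothetical attribute holder: the assigned value is kept exactly as given;
# the coder runs only at the storage boundary.
class CoderBackedAttribute
  attr_accessor :value

  def initialize(coder)
    @coder = coder
  end

  def to_store
    @coder.dump(@value)
  end

  def from_store(stored)
    @value = @coder.load(stored)
    self
  end
end

key = "\x00\xFFsecret".b
attribute = CoderBackedAttribute.new(Base64Coder)
attribute.value = key
untouched = attribute.value.equal?(key) # the getter returns the same object

restored = CoderBackedAttribute.new(Base64Coder).from_store(attribute.to_store)
law_holds = (restored.value == key)                      # load(dump(x)) == x
marshal_law = (Marshal.load(Marshal.dump(key)) == key)   # same law in core Ruby
```

Because the getter never runs the coder, there is no need for a marker subclass or a second layer of accessors.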

Thanks for your attention.

awood45 commented 8 years ago

I had a rough draft of binary types that appeared to work for basic cases. I'll take a look at your use case once we pull this up. Definitely happy to prioritize a PR review as well.